{"id":5272,"date":"2021-01-05T15:24:27","date_gmt":"2021-01-05T09:54:27","guid":{"rendered":"https:\/\/opstree.com\/blog\/\/?p=5272"},"modified":"2021-01-12T15:39:18","modified_gmt":"2021-01-12T10:09:18","slug":"elasticsearch-backup-and-restore-in-production","status":"publish","type":"post","link":"https:\/\/opstree.com\/blog\/2021\/01\/05\/elasticsearch-backup-and-restore-in-production\/","title":{"rendered":"Elasticsearch Backup and Restore in Production"},"content":{"rendered":"\r\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" class=\"wp-image-5297\" src=\"https:\/\/opstree.com\/blog\/\/wp-content\/uploads\/2021\/01\/elasticsearch-backup-and-restore-in-production.png\" alt=\"\" \/>\r\n<figcaption>ES backup and restore using AWS S3<br \/><br \/><\/figcaption>\r\n<\/figure>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">We were\u00a0fortunate\u00a0enough to get an opportunity to do an\u00a0Elasticsearch\u00a0cluster\u00a0snapshot and restore on a production highly active cluster. The indices\u00a0we needed to restore were around 2 \u2013\u00a03 TB in size.<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">Our task was to take a snapshot from an old cluster (v 6.4.2) which had several huge indices and restore a few of them to a new cluster (v7.9.2). This endeavour was supposed to bring the load down from the old cluster.\u00a0\u00a0<\/p>\r\n<p><!--more--><\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">The old cluster was facing a lot of performance issues. Read\/Write operations were too much to handle. Also, CPU and memory utilization were high most of the time. Segment merging was a lot slower than expected.\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">For this\u00a0reason,\u00a0it became necessary to move some of the indices to a new cluster. 
We were anticipating that this activity would bring speed and stability to the applications using the clusters. Before starting the activity, naturally, we scoured the internet for everything related to Elasticsearch backup and restore.<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">While searching, we found a lot of blogs, videos, and documents, which we went through word by word. The research was helpful, but some things can be learned only by doing. Therefore, we decided to write a blog based on our own experiences and add to the already existing pool of resources on the topic.<\/p>\r\n\r\n\r\n\r\n<p>According to the documentation, there are three ways we could have migrated our indices:<\/p>\r\n\r\n\r\n\r\n<ol>\r\n<li>Index<\/li>\r\n<li>Reindex<\/li>\r\n<li>Snapshot and restore<\/li>\r\n<\/ol>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">This called for an elimination meeting where we decided which way to go. The first option is the simplest but not very efficient, as it involves using another tool just for log harvesting. The second option was also not attractive, because reindexing can be quite a resource-intensive process and would pose a risk we were not in a position to afford. Consequently, we went with the third one, snapshot and restore. Snapshot and restore, in Elasticsearch, is divided into three different processes:<\/p>\r\n\r\n\r\n\r\n<ol>\r\n<li>Register a snapshot repository<\/li>\r\n<li>Create your first snapshot and subsequent incremental snapshots<\/li>\r\n<li>Restore (or incremental restore) to a new location<\/li>\r\n<\/ol>\r\n\r\n\r\n\r\n<p class=\"has-medium-font-size\">REGISTER A SNAPSHOT REPOSITORY<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">A snapshot repository, as the name suggests, is a location that stores all our indices and related metadata. 
It could be anything from a local filesystem to remote cloud object storage. There are multiple repository types available, such as fs, url, and s3, as stated in the <a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/current\/snapshots-register-repository.html\" target=\"_blank\" rel=\"noreferrer noopener\">official docs<\/a>.<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">We went with S3 because it was a convenient option for us. Using S3 as a repository is quite simple: we need to install an Elasticsearch plugin called repository-s3 <strong>on each node<\/strong>, as it is a node-level setting, and then use the _snapshot API to register the repository in a bucket.<\/p>\r\n\r\n\r\n\r\n<p>Let&#8217;s install the plugin:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>cd ~<\/code><\/pre>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>wget https:\/\/artifacts.elastic.co\/downloads\/elasticsearch-plugins\/repository-s3\/repository-s3-&lt;version&gt;.zip<\/code><\/pre>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>sudo \/usr\/share\/elasticsearch\/bin\/elasticsearch-plugin install file:\/\/\/home\/&lt;user&gt;\/repository-s3-&lt;version&gt;.zip<\/code><\/pre>\r\n\r\n\r\n\r\n<p>We can confirm that the plugin is installed with the below command:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>sudo \/usr\/share\/elasticsearch\/bin\/elasticsearch-plugin list<\/code><\/pre>\r\n\r\n\r\n\r\n<p>It\u2019s time for the last pair of commands that we need to execute on all nodes:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>sudo \/usr\/share\/elasticsearch\/bin\/elasticsearch-keystore add s3.client.default.access_key<\/code><\/pre>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>sudo \/usr\/share\/elasticsearch\/bin\/elasticsearch-keystore add s3.client.default.secret_key<\/code><\/pre>\r\n\r\n\r\n\r\n<p 
class=\"has-text-align-justify\"><strong>Note<\/strong>: \u2018default\u2019 in s3.client.default.access_key is the name of the repository being registered. It is denoted by the setting \u201cclient\u201d in the API request. We can give any name we want which we will see further in this blog.\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">Here, we\u2019ll have to enter our AWS ACCESS_KEY and SECRET_KEY created via IAM for S3 access. Required permission for the IAM role can be\u00a0found in the <a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/plugins\/current\/repository-s3-repository.html#repository-s3-permissions\" target=\"_blank\" rel=\"noreferrer noopener\">official docs<\/a>.\u00a0<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">Now, that\u00a0we have all the prerequisites settled, let\u2019s register our repository.\u00a0For this, we need to make a curl request\u00a0to\u00a0Elasticsearch\u00a0_snapshot API as shown below:\u00a0<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>curl -X PUT \"&lt;hostname\/IP&gt;:9200\/_snapshot\/repo_name\" -H 'Content-Type: application\/json' -d' \r\n{ \r\n\"type\": \"s3\", \r\n\"settings\": { \r\n    \"bucket\": \"my-S3-bucket\", \r\n    \"region\": \"ap-south-1\", \r\n    \"base_path\": \"path\/to\/respective\/directory\/\" \r\n  } \r\n } \r\n' <\/code><\/pre>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">If we need to add more repositories, we can add them by changing their \u201cclient\u201d name and respective secrets in\u00a0elasticsearch-keystore. 
Keystore settings are secure and reloadable, so we can add\/update them without restarting the service or cluster.<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>curl -X PUT \"&lt;hostname\/IP&gt;:9200\/_snapshot\/repo_name\" -H 'Content-Type: application\/json' -d'\r\n{\r\n  \"type\": \"s3\",\r\n  \"settings\": {\r\n    \"client\": \"new-repo\",\r\n    \"bucket\": \"my-s3-bucket\",\r\n    \"region\": \"ap-south-1\",\r\n    \"base_path\": \"path\/to\/respective\/directory\/\"\r\n  }\r\n}\r\n'<\/code><\/pre>\r\n\r\n\r\n\r\n<p>The above steps need to be done on both the source cluster and the destination cluster to register the snapshot repository. To view our registered repositories, we can use the below API request:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>curl -XGET &lt;hostname\/IP&gt;:9200\/_cat\/repositories<\/code><\/pre>\r\n\r\n\r\n\r\n<p class=\"has-medium-font-size\">CREATE SNAPSHOTS<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">Having registered the repository, we can proceed with taking our incremental snapshots:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>curl -XPUT \"&lt;hostname\/IP&gt;:9200\/_snapshot\/repo_name\/my_snapshot_2020-12-30?wait_for_completion=true&amp;pretty\" -H 'Content-Type: application\/json' -d'\r\n{\r\n  \"indices\": \"comma,separated,indices\",\r\n  \"ignore_unavailable\": true,\r\n  \"include_global_state\": false\r\n}\r\n'<\/code><\/pre>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">To view all the snapshots of a repository, we can use the below API request:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>curl -XGET &lt;hostname\/IP&gt;:9200\/_cat\/snapshots\/repo_name<\/code><\/pre>\r\n\r\n\r\n\r\n<p class=\"has-medium-font-size\">RESTORE SNAPSHOTS<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">We can view our snapshots in the destination cluster with the same request as above since both 
clusters have their repositories at the same location. Now that our incremental snapshots have been taken, it\u2019s time to restore them to the new cluster. The first restore is quite simple. We need to make a POST request like below:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>curl -X POST \"&lt;hostname\/IP&gt;:9200\/_snapshot\/repo_name\/my_snapshot_2020-12-30\/_restore?pretty\" -H 'Content-Type: application\/json' -d'\r\n{\r\n    \"indices\": \"comma,separated,indices\",\r\n    \"ignore_unavailable\": true,\r\n    \"include_global_state\": false,\r\n    \"index_settings\": {\r\n        \"index.number_of_replicas\": 1\r\n    }\r\n}\r\n'<\/code><\/pre>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">For incremental restores, we need to ensure that the target indices are not open, to avoid data conflicts or inconsistency. Elasticsearch ensures safety here and does not allow restore operations on open indices. Therefore, to restore into pre-existing indices, we need to close them first:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>curl -X POST &lt;hostname\/IP&gt;:9200\/comma,separated,indices\/_close<\/code><\/pre>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">We can then restore into these indices:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>curl -X POST \"&lt;hostname\/IP&gt;:9200\/_snapshot\/repo_name\/my_snapshot_2020-12-30\/_restore?pretty\" -H 'Content-Type: application\/json' -d'\r\n{\r\n    \"indices\": \"comma,separated,indices\",\r\n    \"ignore_unavailable\": true,\r\n    \"include_global_state\": false\r\n}\r\n'<\/code><\/pre>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">Don&#8217;t worry about opening the indices again. 
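<\/p>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">While a restore is in progress, its per-shard status can be tracked. A minimal sketch using the _cat\/recovery API (the hostname is a placeholder):<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>curl -XGET \"&lt;hostname\/IP&gt;:9200\/_cat\/recovery?v&amp;active_only=true\"<\/code><\/pre>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">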
The _restore API will open the closed indices after a successful incremental restore.<br \/>For detailed information on all the Elasticsearch API settings mentioned here, refer to the official documentation:<\/p>\r\n\r\n\r\n\r\n<ul>\r\n<li><a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/current\/elasticsearch-keystore.html\" target=\"_blank\" rel=\"noreferrer noopener\">ES keystore<\/a><\/li>\r\n<li><a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/current\/cluster-nodes-reload-secure-settings.html\" target=\"_blank\" rel=\"noreferrer noopener\">ES reloadable settings<\/a><\/li>\r\n<li><a href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/current\/snapshot-restore.html\" target=\"_blank\" rel=\"noopener\">Snapshot and Restore<\/a><\/li>\r\n<\/ul>\r\n\r\n\r\n\r\n<p class=\"has-text-align-justify\">After we had snapshot and restore all figured out, all that was left was to keep taking incremental snapshots until the planned migration time. Then, during the migration activity, we switched the traffic over to the new cluster after the latest incremental restore. There was a small issue in handling the data being generated during the activity, but that was taken care of with the help of Kafka. Maybe we&#8217;ll write a new blog to talk about it in detail.<\/p>\r\n\r\n\r\n\r\n<p><strong>Co-author<\/strong>: <a href=\"https:\/\/opstree.com\/blog\/\/author\/adeel109\/\" target=\"_blank\" rel=\"noreferrer noopener\">Adeel Ahmad<\/a><\/p>\r\n\r\n\r\n\r\n<p>Opstree is an End to End DevOps solution provider<\/p>\r\n\r\n\r\n\r\n<p><a href=\"https:\/\/www.opstree.com\/contact-us\" target=\"_blank\" rel=\"noreferrer noopener\">CONTACT US<\/a><\/p>\r\n","protected":false},"excerpt":{"rendered":"<p>We were fortunate enough to get an opportunity to do an Elasticsearch cluster snapshot and restore on a highly active production cluster. 
The indices\u00a0we needed to restore were around 2 \u2013\u00a03 TB in size. Our task was to take a snapshot from an old cluster (v 6.4.2) which had several huge indices and restore a few of them to a &hellip; <a href=\"https:\/\/opstree.com\/blog\/2021\/01\/05\/elasticsearch-backup-and-restore-in-production\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Elasticsearch Backup and Restore in Production&#8221;<\/span><\/a><\/p>\n","protected":false},"author":199325085,"featured_media":29900,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_coblocks_attr":"","_coblocks_dimensions":"","_coblocks_responsive_height":"","_coblocks_accordion_ie_support":"","jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[28070474],"tags":[768739294,2513561,768739310,768739311],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/opstree.com\/blog\/wp-content\/uploads\/2025\/11\/DevSecOps-1.jpg","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pfDBOm-1n2","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/5272"}],"collection":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/users\/199325085"}],"replies":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/
wp-json\/wp\/v2\/comments?post=5272"}],"version-history":[{"count":25,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/5272\/revisions"}],"predecessor-version":[{"id":5364,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/posts\/5272\/revisions\/5364"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/media\/29900"}],"wp:attachment":[{"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/media?parent=5272"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/categories?post=5272"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/opstree.com\/blog\/wp-json\/wp\/v2\/tags?post=5272"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}