Elasticsearch in Action: Creating and Restoring Snapshots (2/3)

Dec 18, 2022


The excerpts are taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository, which also contains executable Kibana scripts so you can run the commands in Kibana straight away. All code is tested against Elasticsearch 8.4.


Creating snapshots

Now that we’ve registered a snapshot repository, the next step is to create snapshots so the data gets backed up to that repository. There are a couple of ways to create a snapshot. Let’s start with the simple manual way, using the _snapshot API. The following listing provides the code to do just that.

Listing : Creating a snapshot manually

PUT _snapshot/es_cluster_snapshot_repository/prod_snapshot_oct22

In the listing, we ask the _snapshot API to create a snapshot named prod_snapshot_oct22 under the repository es_cluster_snapshot_repository. This is a one-off manual snapshot that backs up all the data (indices, data streams, and cluster information) to that snapshot, stored on disk in the repository’s filesystem.
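If you would rather trigger the same snapshot from a script than from Kibana, the request can be sketched in Python with the standard library alone. This is a minimal sketch, not the book's code: the base URL assumes a local test cluster without security (Elasticsearch 8 enables TLS and authentication by default, so a real cluster needs HTTPS and credentials), and the helper name build_create_snapshot_request is made up for illustration. The wait_for_completion query parameter asks Elasticsearch to respond only once the snapshot has finished, rather than as soon as it has been initialized.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: a local, unsecured test cluster. Adjust for your environment.
BASE_URL = "http://localhost:9200"


def build_create_snapshot_request(repository, snapshot, wait_for_completion=True):
    """Build (but do not send) the PUT request that creates a snapshot."""
    path = f"/_snapshot/{quote(repository, safe='')}/{quote(snapshot, safe='')}"
    # wait_for_completion=true makes the call block until the snapshot is done
    # instead of returning as soon as the snapshot has been initialized.
    query = urlencode({"wait_for_completion": str(wait_for_completion).lower()})
    return Request(f"{BASE_URL}{path}?{query}", method="PUT")


request = build_create_snapshot_request(
    "es_cluster_snapshot_repository", "prod_snapshot_oct22"
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the sketch safe to run anywhere; nothing touches the cluster until you pass the request to urlopen.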

We can also take a custom snapshot of a few indices rather than the whole lot, as we just did. To do so, we attach a request body that names the set of indices to include; say, all movies and all reviews. The code in the following listing does exactly that.

Listing : Creating snapshots with specific indices

PUT _snapshot/es_cluster_snapshot_repository/custom_prod_snapshots
{
  "indices": ["*movies*","*reviews*"]
}

The indices attribute accepts a string or an array of strings representing the specific indices that we want to back up. In our example, we back up any index whose name matches the glob patterns *movies* or *reviews*. By default, all the indices and data streams are included ([*]) if we don’t specify what we want to back up. If we want to omit a few, we can prefix a pattern with a minus sign (or dash) like this: -*.old. This pattern, in our case, omits all indices ending with .old.
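To get a feel for how include and exclude patterns combine, here is a small local illustration in Python. It is only an approximation using the standard library's fnmatch module, not Elasticsearch's own index resolution, and both the function name select_indices and the sample index names are invented for the example.

```python
from fnmatch import fnmatchcase


def select_indices(index_names, patterns):
    """Approximate how include/exclude wildcard patterns pick indices."""
    selected = []
    for pattern in patterns:
        if pattern.startswith("-"):
            # A leading minus removes matches from what was selected so far.
            selected = [n for n in selected if not fnmatchcase(n, pattern[1:])]
        else:
            for name in index_names:
                if fnmatchcase(name, pattern) and name not in selected:
                    selected.append(name)
    return selected


# Hypothetical index names, purely for illustration.
names = ["movies", "new_movies", "movies.old", "reviews_2022", "books"]
print(select_indices(names, ["*movies*", "*reviews*", "-*.old"]))
```

With these sample names, movies.old is first matched by *movies* and then dropped again by -*.old, while books is never selected because no include pattern matches it.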

You can also attach a set of user-defined properties in a metadata attribute. Say, for example, we want to note the incident details of a user’s request when taking the snapshot. The following listing shows such a request.

Listing : Adding custom details to the snapshot

PUT _snapshot/es_cluster_snapshot_repository/prod_snapshots_with_metadata
{
  "indices": ["*movies*","*reviews*", "-*.old"],
  "metadata":{
    "reason":"user request",
    "incident_id":"ID12345",
    "user":"mkonda"
  }
}

As you can see, we’ve refined the list of indices by excluding the “old” indices from the snapshot. We’ve also added metadata with the user request information; you can add as many details as you need to this object. The final step in the life cycle of Elasticsearch’s snapshot and restore functionality is to restore the snapshots, which we discuss next.

Restoring snapshots

Restoring a snapshot is relatively straightforward. All we need to do is invoke _restore on the _snapshot API, as this request demonstrates.

Listing : Restoring data from a snapshot

POST _snapshot/es_cluster_snapshot_repository/custom_prod_snapshots/_restore

The _restore endpoint copies the data from the repository to the cluster. Of course, we can attach a JSON object to specify which indices or data streams we want to restore. The following request provides an example.

Listing : Restoring a few indices from a snapshot

POST _snapshot/es_cluster_snapshot_repository/custom_prod_snapshots/_restore
{
  "indices":["new_movies"]
}
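One practical wrinkle: a restore fails if an open index with the same name already exists in the cluster. A common way around this is to restore under a different name using the restore API's rename_pattern and rename_replacement options. The following Python sketch builds such a request without sending it. The base URL (a local, unsecured test cluster), the restored_ prefix, and the helper names are assumptions made for illustration; preview_renamed_name only imitates the renaming locally so you can see what the result would look like.

```python
import json
import re
from urllib.request import Request

# Assumption: a local, unsecured test cluster. Adjust for your environment.
BASE_URL = "http://localhost:9200"


def build_restore_request(repository, snapshot, indices, prefix="restored_"):
    """Build (but do not send) a restore request that renames restored indices."""
    body = {
        "indices": indices,
        # rename_pattern is a regular expression applied to each index name;
        # $1 in rename_replacement refers to its first capture group.
        "rename_pattern": "(.+)",
        "rename_replacement": f"{prefix}$1",
    }
    return Request(
        f"{BASE_URL}/_snapshot/{repository}/{snapshot}/_restore",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preview_renamed_name(index_name, rename_pattern, rename_replacement):
    """Local preview of the rename; converts $1-style groups to Python syntax."""
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return re.sub(rename_pattern, python_replacement, index_name)


request = build_restore_request(
    "es_cluster_snapshot_repository", "custom_prod_snapshots", ["new_movies"]
)
print(request.get_method(), request.full_url)
print(preview_renamed_name("new_movies", "(.+)", "restored_$1"))
```

Restoring under a new name leaves the live index untouched, which makes it easy to compare the two before deciding which one to keep.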

Deleting snapshots

We don’t need to keep snapshots on disk all the time. One strategy most organizations follow is to create snapshots for individual indices as per users’ requests. Sometimes, we may need to update the mapping or change the number of primary shards of a given index. Unfortunately, we can’t do this while the index is live.

The best approach is to create a new index with the appropriate shards and mapping, take a snapshot of the current index, restore that snapshot to the newly created index, and then delete the snapshot. The figure below demonstrates this activity pictorially.

**Figure : Snapshot lifecycle — from creation to deletion**

As you can see in the figure, we can migrate data from an old index to a new index with the snapshot and restore functionality. Once we have finished with the snapshots, we can delete them to free up storage space.

Deleting a snapshot is fairly straightforward: we use the HTTP DELETE action, providing the snapshot ID. The example in the following listing shows how to delete the snapshot that we created earlier.

Listing : Deleting a snapshot

DELETE _snapshot/es_cluster_snapshot_repository/custom_prod_snapshots

If you issue the HTTP DELETE request while the snapshot is in progress, Elasticsearch instantly halts the activity, deletes the snapshot, and removes its contents from the repository.
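The delete endpoint also accepts a comma-separated list of snapshot names, which is handy when cleaning up several snapshots in one go. Here is a minimal Python sketch that builds such a request without sending it. As before, the base URL assumes a local test cluster without security, and the helper name build_delete_snapshots_request is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: a local, unsecured test cluster. Adjust for your environment.
BASE_URL = "http://localhost:9200"


def build_delete_snapshots_request(repository, snapshots):
    """Build (but do not send) a DELETE request for one or more snapshots."""
    # Snapshot names are joined with commas in the URL path.
    names = ",".join(quote(name, safe="") for name in snapshots)
    return Request(
        f"{BASE_URL}/_snapshot/{quote(repository, safe='')}/{names}",
        method="DELETE",
    )


request = build_delete_snapshots_request(
    "es_cluster_snapshot_repository",
    ["custom_prod_snapshots", "prod_snapshots_with_metadata"],
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Because deletion is irreversible, it is worth printing and checking the generated URL before sending the request.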

In the next article, we will look at automating the snapshotting functionality.