Elasticsearch in Action: Creating and Restoring Snapshots (2/3)
The excerpts are taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository, which also contains executable Kibana scripts so you can run the commands in Kibana straight away. All code is tested against Elasticsearch version 8.4.
Mini-series of Snapshotting Feature
Creating snapshots
Now that we’ve gone through the process of registering a snapshot repository, the next step is to create snapshots so the data gets backed up to the repository that we’ve just created. There are a couple of ways to create a snapshot. Let’s start with the simple manual way, using the _snapshot API. The following listing provides the code to do just that.
Listing : Creating a snapshot manually
PUT _snapshot/es_cluster_snapshot_repository/prod_snapshot_oct22
In the listing, we ask the _snapshot API to create a snapshot named prod_snapshot_oct22 under the repository es_cluster_snapshot_repository. This is a one-off manual snapshot that backs up all the data (indices, data streams, and cluster information) to that snapshot on disk in the repository’s filesystem.
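By default, the create request returns as soon as Elasticsearch accepts it, and the snapshot continues in the background. As a minimal sketch, the requests below check on the snapshot created above. The commented-out variant uses the wait_for_completion query parameter to block until the snapshot finishes; it is an alternative to the original request, not a follow-up, because a snapshot name can be used only once per repository.

```console
# Summary of the snapshot: its state (for example IN_PROGRESS or SUCCESS) and the indices it contains
GET _snapshot/es_cluster_snapshot_repository/prod_snapshot_oct22

# Detailed, per-shard progress of the same snapshot
GET _snapshot/es_cluster_snapshot_repository/prod_snapshot_oct22/_status

# Alternative to the original request: return only once the snapshot has completed
# PUT _snapshot/es_cluster_snapshot_repository/prod_snapshot_oct22?wait_for_completion=true
```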
We can also take a custom snapshot of a few indices rather than backing up everything as we just did. To do so, we attach a request body that names the indices to include; say, all movies and all reviews. The code in the following listing does exactly that.
Listing : Creating snapshots with specific indices
PUT _snapshot/es_cluster_snapshot_repository/custom_prod_snapshots
{
  "indices": ["*movies*","*reviews*"] # Back up only the indices matching these patterns
}
The indices attribute accepts a string or an array of strings representing the specific indices that we want to back up. In our example, we back up any index whose name matches the glob pattern *movies* or *reviews*. By default, all the indices and data streams are included ([*]) if we don’t specify what we want to back up. If we want to omit a few, we can prefix a pattern with a minus sign (or dash), like this: -*.old. This pattern, in our case, omits all indices ending with .old.
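To make the string form and the exclusion pattern concrete, here is a small sketch. The snapshot name movies_only_snapshot is made up for illustration, and the ignore_unavailable and include_global_state options are additions not covered in the text above: the first skips missing or closed indices instead of failing the request, and the second leaves the cluster state out of the snapshot.

```console
PUT _snapshot/es_cluster_snapshot_repository/movies_only_snapshot
{
  "indices": "*movies*,-*.old", # Comma-separated string form: all movies indices except those ending in .old
  "ignore_unavailable": true, # Do not fail if a listed index is missing or closed
  "include_global_state": false # Back up index data only, not the cluster state
}
```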
You can also attach a set of user-defined properties in a metadata attribute. Say, for example, we want to record the incident details of a user’s request when taking the snapshot. The following listing shows such a request.
Listing : Adding custom details to the snapshot
PUT _snapshot/es_cluster_snapshot_repository/prod_snapshots_with_metadata
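{
  "indices": ["*movies*","*reviews*", "-*.old"], # Include movies and reviews, but skip indices ending in .old
  "metadata":{ # Arbitrary user-defined details stored with the snapshot
    "reason":"user request",
    "incident_id":"ID12345",
    "user":"mkonda"
  }
}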
As you can see, we’ve refined the list of indices by excluding the “old” indices from the snapshot. We’ve also added metadata describing the user request; you can add as many details as you need to this object. The final step in the life cycle of Elasticsearch’s snapshot and restore functionality is to restore the snapshots, which we discuss next.
Restoring snapshots
Restoring a snapshot is relatively straightforward. All we need to do is invoke the _restore endpoint of the _snapshot API, as this request demonstrates.
Listing : Restoring data from a snapshot
POST _snapshot/es_cluster_snapshot_repository/custom_prod_snapshots/_restore
The _restore endpoint copies the data from the repository to the cluster. Of course, we can attach a JSON object to specify which indices or data streams we want to restore. The following request provides an example.
Listing : Restoring a few indices from a snapshot
POST _snapshot/es_cluster_snapshot_repository/custom_prod_snapshots/_restore
{
  "indices":["new_movies"] # Restore only this index from the snapshot
}
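A restore fails if an open index with the same name already exists in the cluster, so the existing index must be closed or deleted first, or the restored copy must be given a different name. The following sketch takes the renaming route with the rename_pattern and rename_replacement options of the restore API; the restored_ prefix is an arbitrary choice for illustration.

```console
POST _snapshot/es_cluster_snapshot_repository/custom_prod_snapshots/_restore
{
  "indices": ["new_movies"],
  "rename_pattern": "(.+)", # Capture the whole original index name
  "rename_replacement": "restored_$1" # new_movies is restored as restored_new_movies
}
```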
Deleting snapshots
We don’t need to keep snapshots on disk all the time. One strategy most organizations follow is to create snapshots for individual indices as per users’ requests. Sometimes, we may need to update our mapping or change the number of primary shards of a given index. Unfortunately, we can’t do this as long as the index is in a live state.
The best approach is to create a new index with the appropriate shards and mapping, take a snapshot of the current index, restore the snapshot to the newly created index, and then delete the snapshot. The figure below demonstrates this activity pictorially.
As you can see in the figure, we can migrate data from an old index to a new index with the snapshot and restore functionality. Once we’ve used the snapshots, we can delete them to free up storage space.
Deleting snapshots is fairly straightforward: we use the HTTP DELETE action, providing the snapshot ID. The example in the following listing shows how to delete the snapshot that we created earlier.
Listing : Deleting a snapshot
DELETE _snapshot/es_cluster_snapshot_repository/custom_prod_snapshots
If you issue the HTTP DELETE request while the snapshot is in progress, Elasticsearch halts the activity, deletes the snapshot, and removes its contents from the repository.
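Before deleting, it can help to see what the repository holds. As a sketch, the first request below lists every snapshot in the repository, and the second deletes two of the snapshots created earlier in a single call, since the delete API accepts a comma-separated list of snapshot names.

```console
# List all snapshots stored in the repository
GET _snapshot/es_cluster_snapshot_repository/_all

# Delete several snapshots with one request
DELETE _snapshot/es_cluster_snapshot_repository/prod_snapshot_oct22,prod_snapshots_with_metadata
```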
In the next article, we will look at automating the snapshotting functionality.