Elasticsearch in Action: Shrinking an Index
Excerpts taken from my upcoming book: Elasticsearch in Action
These excerpts are taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository, including executable Kibana scripts so you can run the commands in Kibana straight away. All code is tested against Elasticsearch version 8.4.
Shrinking an index
While splitting an index expands it by adding shards for more space, shrinking is the opposite: it reduces the number of shards. Shrinking helps consolidate documents spread across many shards into a smaller number of shards. Elasticsearch exposes the _shrink API for this purpose. Let’s see it in action.
Let’s say we have an index (all_cars) distributed among 50 shards and want to resize it to a single-digit shard count, say, 5 shards. As with the split operation, the first step is to make sure our all_cars index is read-only, so we set the index.blocks.write property to true. We also need to relocate a copy of every shard to a single node. The code in the following listing demonstrates these prerequisites before shrinking the index.
PUT all_cars/_settings
{
  "settings": {
    "index.blocks.write": true,
    "index.routing.allocation.require._name": "node1"
  }
}
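Before issuing the shrink call, it can be worth confirming that a copy of every shard has indeed landed on the chosen node. One simple way to check (a sketch using the same all_cars index) is the _cat/shards API, which lists each shard along with the node it is allocated to:
GET _cat/shards/all_cars?v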
Now that the source index is all set for shrinking, we can invoke the shrink operation. The format is PUT <source_index>/_shrink/<target_index>. Let’s issue the shrink command on the all_cars index, as the following listing shows.
PUT all_cars/_shrink/all_cars_new2
{
  "settings": {
    "index.blocks.write": null,
    "index.routing.allocation.require._name": null,
    "index.number_of_shards": 1,
    "index.number_of_replicas": 5
  }
}
We need to understand a few things about the script in the listing above. The source index was set with two properties: the write block and the routing allocation node name. These settings are carried over to the new target index unless we reset them, so in the script we nullify both properties to ensure the target index is created without those restrictions. We also set the number of shards and replicas on the newly instantiated target index. Finally, hard links are created for the target index, pointing to the source index’s file segments.
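Once the shrink call returns, we can confirm that the target index came up with the expected shard and replica counts. A minimal check, assuming the all_cars_new2 target index from the listing above, is to fetch its settings:
GET all_cars_new2/_settings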
Note: Keep in mind that the target index’s number of shards must be smaller than (or equal to) the source index’s shard count (after all, we are shrinking the index!). And, of course, the target index’s shard count must be a factor of the source index’s shard count.
While we are here, we can also remove all replicas from the source index so the shrink operation is more manageable. We just need to set the index.number_of_replicas property to zero. Remember, number_of_replicas is a dynamic property, meaning it can be tweaked on a live index.
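For example, a minimal sketch of dropping the source index’s replicas before the shrink (using the same all_cars index and the same settings style as the earlier listings) could look like this:
PUT all_cars/_settings
{
  "settings": {
    "index.number_of_replicas": 0
  }
}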
There are a few prerequisites that must be satisfied prior to shrinking an index:
- The source index must be switched off (made read-only) for indexing. Although not mandatory, it is advisable to turn off the replicas too before shrinking kicks in (see the combined settings request after this list).
- The target index must not exist before the shrinking activity.
- A copy of every shard of the source index must reside on the same node. We achieve this by setting the index.routing.allocation.require._name property on the index to that node’s name.
- The target index’s shard count must be a factor of the source index’s shard count. Our all_cars index with 50 shards can only be shrunk to 25, 10, 5, 2, or 1 shard(s).
- The node that will host the target index must satisfy the memory requirements.
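Putting these prerequisites together, the preparation step on the source index could be expressed as a single settings call. This is only a sketch combining the settings we have already seen; the node name node1 is just our example value:
PUT all_cars/_settings
{
  "settings": {
    "index.blocks.write": true,
    "index.number_of_replicas": 0,
    "index.routing.allocation.require._name": "node1"
  }
}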
We can use the shrink operation when an index has a large number of shards but the data is sparsely distributed across them. As the name suggests, the idea is to reduce the number of shards.
While splitting and shrinking are handy ways to manage indices as our data grows, creating indices on a set pattern with the help of the rollover mechanism is another approach. We’ll look at rollover in another article shortly.