Elasticsearch in Action: Splitting an Index

Madhusudhan Konda
Jan 29, 2023

The excerpts are taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository. You can find executable Kibana scripts in the repository so you can run the commands in Kibana straight away. All code is tested against Elasticsearch version 8.4.

Me @ Medium || LinkedIn || Twitter || GitHub

Splitting an index

Sometimes an index becomes overloaded with data, and additional shards are needed to relieve memory pressure and distribute the data more evenly. For example, if an index (cars) with 5 primary shards is overloaded, we can split it into a new index with more primary shards, say 10. This operation of expanding an index to a larger number of shards is called splitting. Splitting is nothing more than creating a new index with more shards and copying the data from the old index into the new index.

Elasticsearch provides a _split API for splitting an index. There are rules about how many shards the new index can have, along with a few other conditions, but first let’s see how to split an index.

Let’s say our all_cars index was created with three shards and, as the data is growing exponentially, it is now overloaded. To mitigate the risk of slow queries and degrading performance, we want more room for the data. For this, we can split the index into a brand new index that has additional primary shards.

Before we invoke the split operation on the all_cars index, we must make sure the index no longer accepts writes (that is, the index is made read-only). The listing below does this by invoking the _settings API to set a write block on the index:

# Block writes on the source index before splitting it
PUT all_cars/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}

Now that the prerequisite of making the index read-only is complete, we can move to the next step of splitting it by invoking the split API. This API expects the source and target indices in the following format:

POST <source_index>/_split/<target_index>

Now, let’s split the index into a new index (all_cars_new). The listing given below shows how:

# Split all_cars into a new index with 12 primary shards
POST all_cars/_split/all_cars_new
{
  "settings": {
    "index.number_of_shards": 12
  }
}

This request kicks off the splitting process. The call returns as soon as the target index has been added to the cluster state; it does not wait for the split itself to finish, so the new shards are recovered in the background. Once the split completes, the new index (all_cars_new) holds all the data and has more room because of the additional shards.
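For readers who script this rather than use Kibana, the two calls above can be sketched in Python. This is a minimal sketch, not the book's code: the helper names `split_requests` and `send` are hypothetical, and the commented-out URL assumes an unsecured local cluster at `http://localhost:9200`, which a default Elasticsearch 8.x install is not.

```python
import json
import urllib.request


def split_requests(source, target, target_shards):
    """Return the two REST calls, in order, as (method, path, body) tuples."""
    return [
        # Step 1: block writes on the source index.
        ("PUT", f"/{source}/_settings",
         {"settings": {"index.blocks.write": True}}),
        # Step 2: split into the target with the requested number of primaries.
        ("POST", f"/{source}/_split/{target}",
         {"settings": {"index.number_of_shards": target_shards}}),
    ]


def send(base_url, method, path, body):
    """Send one call and return the decoded JSON reply (needs a live cluster)."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Print the calls; uncomment send(...) only against a cluster you control.
for method, path, body in split_requests("all_cars", "all_cars_new", 12):
    print(method, path, json.dumps(body))
    # send("http://localhost:9200", method, path, body)  # assumed URL
```

Keeping the request construction separate from the network call makes it easy to review exactly what will be sent before touching a cluster.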

As mentioned earlier, the splitting operation comes with certain rules and conditions. Let’s look at some of them in the following list:

  • The target index must not exist before this operation. Other than the configuration that you provide in the request body while splitting, an exact copy of the source index is transferred to the target index.
  • The number of shards in the target index must be a multiple of the number of shards in the source index. If the source index has 3 primary shards, the target index can be defined with a multiple of 3 (that is, 6, 9, 12, . . . ). Which multiples are accepted also depends on the source index’s index.number_of_routing_shards setting: the target shard count must be a factor of that value.
  • The target index can never have fewer primary shards than the source index. Remember, splitting allows more room for the index.
  • The node handling the split must have adequate disk space. Make sure it has enough free space to hold a second copy of the index.

During the splitting operation, all the configurations (settings, mappings, and aliases) are copied from the source index into the newly created target index. Elasticsearch then hard-links the source index’s segments into the target index (or copies them if the file system doesn’t support hard links). Finally, all the documents are rehashed because their home shards have changed, and each target shard drops the documents that now belong to a different shard.
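To see why rehashing after a split only has to drop documents rather than move them between unrelated shards, here is a small Python sketch of the routing formula given in the Elasticsearch documentation for the _routing field. Two things are assumptions: `zlib.crc32` stands in for the Murmur3 hash Elasticsearch really uses, and the routing-shard count of 768 is an illustrative value that is divisible by both shard counts.

```python
import zlib


def shard_for(doc_id, primary_shards, routing_shards):
    """Pick a shard with the documented routing formula.

    shard = (hash(routing) % routing_shards) // (routing_shards // primary_shards)
    zlib.crc32 is only a stand-in for Elasticsearch's real hash function.
    """
    routing_factor = routing_shards // primary_shards
    hashed = zlib.crc32(doc_id.encode("utf-8"))
    return (hashed % routing_shards) // routing_factor


ROUTING_SHARDS = 768          # assumed value, divisible by 3 and by 12
SOURCE_SHARDS, TARGET_SHARDS = 3, 12
SPLIT_FACTOR = TARGET_SHARDS // SOURCE_SHARDS

for n in range(1000):
    doc_id = f"car-{n}"       # hypothetical document IDs
    old = shard_for(doc_id, SOURCE_SHARDS, ROUTING_SHARDS)
    new = shard_for(doc_id, TARGET_SHARDS, ROUTING_SHARDS)
    # Every document lands in one of the target shards derived from its old shard.
    assert new // SPLIT_FACTOR == old
```

Because the target count is a multiple of the source count, each source shard maps onto a fixed group of target shards, which is what lets the split reuse the hard-linked segments.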

The target index’s primary shard count must be a multiple of the number of primary shards of the source index. If you set index.number_of_shards to a number that isn’t a multiple, the request fails. For example, the following snippet sets the number of shards to 14:

# 14 is not a multiple of the source shard count, so this request fails
POST all_cars/_split/all_cars_new
{
  "settings": {
    "index.number_of_shards": 14
  }
}

Unfortunately, this request will throw an exception. That’s because we violated the second rule in the previous list. Here’s the exception: "reason": "the number of source shards [3] must be a factor of [14]"
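The shard-count rules from the list above can be checked before sending the request. The following Python sketch is one way to do that; `split_violations` is a hypothetical helper, not part of any Elasticsearch client. The optional `routing_shards` argument reflects the documented constraint that the target count must be a factor of index.number_of_routing_shards, and the value 768 in the last call is an assumed setting for a three-shard index that is worth verifying on your own cluster.

```python
def split_violations(source_shards, target_shards, routing_shards=None):
    """Return a list of rule violations; an empty list means the counts look valid."""
    if source_shards < 1 or target_shards < 1:
        raise ValueError("shard counts must be positive integers")

    problems = []
    if target_shards < source_shards:
        problems.append(
            f"target [{target_shards}] has fewer shards than source [{source_shards}]"
        )
    elif target_shards % source_shards != 0:
        problems.append(
            f"source shards [{source_shards}] must be a factor of [{target_shards}]"
        )
    # Optional extra check against the routing-shard setting, if it is known.
    if routing_shards is not None and routing_shards % target_shards != 0:
        problems.append(
            f"target [{target_shards}] must be a factor of "
            f"routing shards [{routing_shards}]"
        )
    return problems


print(split_violations(3, 14))                      # one violation: 14 is not a multiple of 3
print(split_violations(3, 12))                      # []
print(split_violations(3, 9, routing_shards=768))   # 9 does not divide the assumed 768
```

Returning a list of messages instead of raising on the first problem makes it possible to report every broken rule at once.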

Splitting indices also helps to resize your cluster by adding primaries to the original number. The configuration is copied from the source index to the target index. That means, other than adding more shards and applying the settings you pass in the request, the split operation doesn’t change the target index’s configuration. If your requirement is just to increase the number of shards so the data is spread across the newly created ones, then splitting is the best way to go. Also, remember that the target index must not exist before invoking a splitting operation.

Madhusudhan Konda is a full-stack lead engineer, mentor, and conference speaker. He delivers live online training on Elasticsearch, Elastic Stack & Spring Cloud.
