Elasticsearch in Action: Rolling Over Indices
Excerpts taken from my upcoming book: Elasticsearch in Action
These excerpts are from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository, along with executable Kibana scripts so you can run the commands in Kibana straight away. All code was tested against Elasticsearch 8.4.
Rolling over an index alias
Our indices accumulate data over time. We could split the index to handle the additional data, but splitting only redistributes the existing data across more shards. Elasticsearch provides another mechanism, called rollover, where the current index is rolled over to a new blank index.
Unlike a split operation, a rollover does not copy documents to the new index. The old index stops receiving new writes through the alias, and any new documents are indexed into the newly created index going forward. For example, if we have an index app-000001, rolling over creates a new index app-000002. If we roll over once again, another new index, app-000003, is created, and so on.
The rollover operation is heavily used when dealing with time-series data. Time-series data (data generated for a specific period, such as daily, weekly, or monthly) is usually held in an index created for that particular period. Application logs, for example, are created per date, like logs-18-sept-21, logs-19-sept-21, and so on.
This will be easier to understand when you see it in action. Let’s say we have an index for cars: cars_2021-000001. Rolling over the cars_2021-000001 index takes two steps, which we’ll go over in the next couple of sections:
- We create an alias pointing to the index (cars_2021-000001, for example). When creating the alias, we mark the index as writable by setting is_write_index to true, because the alias must have a writable backing index.
- We invoke a rollover command on the alias using the _rollover API. This creates a new rollover index (for example, cars_2021-000002).
Note: The trailing suffix (like 000001) is a positive number that Elasticsearch expects the index name to end with, after a dash. Elasticsearch can increment only from a number; the starting value doesn’t matter. As long as the suffix is a positive integer, Elasticsearch increments it and moves forward. If we start with my-index-04 or my-index-0004, for example, the next rollover index will be my-index-000005. Elasticsearch automatically pads the suffix with zeros to six digits.
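As a quick illustration of the suffix rule in the note, the following sketch creates an index with a short numeric suffix and rolls it over. The names my-index-04 and my_alias are placeholders chosen for this example, not names from the book's repository.

```console
# Create an index whose numeric suffix is not yet six digits wide,
# and attach a write alias to it in the same request
PUT my-index-04
{
  "aliases": {
    "my_alias": { "is_write_index": true }
  }
}

# Roll over via the alias; no target name is given, so the name is derived
POST my_alias/_rollover
```

Given the padding rule described in the note, the new index should be named my-index-000005.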
Creating an alias for rollover operations
The first thing we need to do before a rollover operation is to create an alias pointing to the index we want to roll over. The rollover API works on an index alias or a data stream. For example, the following listing invokes the _aliases API to create an alias called latest_cars_a for the index cars_2021-000001 (make sure this index exists beforehand).
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "cars_2021-000001",
        "alias": "latest_cars_a",
        "is_write_index": true
      }
    }
  ]
}
As you can see, the _aliases API request body expects an add action with an index and its alias defined. The listing creates the alias latest_cars_a, pointing to an existing index, cars_2021-000001, with the POST command.
One important point to note: the alias must point to a writable index. This is why we set is_write_index to true in the listing. If the alias points to multiple indices, one of them must be the write index. The next step is to roll over the index.
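If cars_2021-000001 does not exist yet, the alias request fails, so the index has to be created first. A minimal sketch of the surrounding steps, assuming default index settings are acceptable:

```console
# Run before the _aliases request: create an empty index with default settings
PUT cars_2021-000001

# Run after the _aliases request: confirm the alias and its write index
GET _alias/latest_cars_a
```

The GET response should list cars_2021-000001 with is_write_index set to true for latest_cars_a.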
Issuing a rollover operation
Now that we have an alias, the next step is to invoke the rollover API endpoint. Elasticsearch provides the _rollover API for this purpose. The endpoint is invoked on the alias, as the following listing demonstrates.
POST latest_cars_a/_rollover
As you can see, the _rollover endpoint is invoked on the alias, not the index. Once the call succeeds, a new index, cars_2021-000002, is created (the *-000001 suffix is incremented by 1). The listing below shows the response to the call:
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "old_index" : "cars_2021-000001",
  "new_index" : "cars_2021-000002",
  "rolled_over" : true,
  "dry_run" : false,
  "conditions" : { }
}
As the response indicates, a new index (cars_2021-000002) was created as a rollover index. The old index no longer receives writes through the alias, which paves the way for indexing documents into the newly created rollover index.
Note: The rollover API is applied to the alias, but it is the index behind the alias that gets rolled over.
Behind the scenes, invoking the _rollover call on the alias does a couple of things. In the background, this call:
- Creates a new index (cars_2021-000002) with the same configuration as the old one (the name prefix stays the same, but the suffix after the dash is incremented).
- Remaps the alias so that the freshly generated index (cars_2021-000002, in this case) becomes its write index. Because we set is_write_index on the alias, the alias also keeps pointing to the old index for searches. For an alias that points to a single index without the write-index flag, Elasticsearch instead removes the alias from the old index and adds it to the new one.
Our queries are unaffected because all queries, of course, were written against the alias (not a physical index).
When we invoke a rollover command, Elasticsearch carries out a set of actions (remember, the current index must have an alias pointing to it as a prerequisite):
- Stops sending writes to the current index, so through the alias it serves only queries
- Creates a new index with the appropriate naming convention
- Repoints the alias's writes to this new index
If you re-invoke the same _rollover call, a new index, cars_2021-000003, is created, and the alias is reassigned to this new index rather than the old cars_2021-000002 index. Whenever you need to roll over the data to a new index, simply invoking _rollover on the alias will suffice.
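Before rolling over for real, it can help to preview what would happen and to check the alias afterward. The sketch below uses the rollover API's dry_run query parameter and the get-alias API; treat it as an optional sanity check rather than a required step.

```console
# Preview the rollover: reports the old and new index names without creating anything
POST latest_cars_a/_rollover?dry_run

# After an actual rollover: see which indices the alias covers
# and which one is flagged as the write index
GET _alias/latest_cars_a
```

In the dry-run response, the dry_run field should be true, and no new index is created.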
Naming conventions
Let’s look at the naming conventions we’ve used when rolling over indices. The _rollover API has two formats: one where the system deduces the new index name and another where we provide it, as shown here:
POST <index_alias>/_rollover
or
POST <index_alias>/_rollover/<target_index_name>
Specifying a target index name, as in the second option, makes the rollover API create the new index with that name. The first option, where we don’t provide an index name, relies on a special convention: <index_name>-00000N. The number after the dash is always made up of six digits, padded with zeros. If your index follows this format, rolling over creates a new index with the same prefix, and the suffix is automatically incremented to the next number, N + 1. The increment starts from wherever your original index number is; for example, my_cars-000034 will be incremented to my_cars-000035.
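To illustrate the second format, here is a sketch that rolls the alias over to an explicitly named index. The target name cars_2022-000001 is a hypothetical choice for this example.

```console
# Roll over to an explicitly named index instead of the auto-incremented one
POST latest_cars_a/_rollover/cars_2022-000001
```

An explicit target name is also the way out when the current write index name does not end with a dash and a number, since Elasticsearch then has nothing to increment.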
ILM to auto-rollover
You may be wondering when we should roll over an index. That is up to us: when we think the index is clogged or we need to (re)move older data, we can simply invoke the rollover. However, let’s first ask ourselves:
- Can we automatically roll over the index when a shard’s size has crossed a certain threshold?
- Can we instantiate a new index for everyday logs?
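The rollover API can already express the first question as a condition. The sketch below rolls over only if at least one of the conditions is met when the request runs; the thresholds are illustrative values, not recommendations.

```console
# Conditional rollover: nothing happens unless one of the conditions is satisfied
POST latest_cars_a/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 100000,
    "max_primary_shard_size": "50gb"
  }
}
```

The conditions are evaluated only at the moment of the call, so something still has to issue this request on a schedule.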
Although we have seen the mechanics of rollover in this section, we can answer both questions using the relatively new index lifecycle management (ILM) feature, which is discussed at length in another article.
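For orientation only, a minimal ILM policy with a rollover action might look like the following sketch. The policy name cars_rollover_policy and the thresholds are assumptions made for this example. A policy by itself does nothing until it is attached to an index (typically through an index template), which is outside the scope of this excerpt.

```console
# Define a policy whose hot phase rolls the index over
# once a primary shard reaches 50 GB or the index is one day old
PUT _ilm/policy/cars_rollover_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          }
        }
      }
    }
  }
}
```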