Analyzers for Indexing and Search

Madhusudhan Konda
5 min read · Jan 26, 2023
Elasticsearch in Action by M Konda

This excerpt is taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository, which includes executable Kibana scripts so you can run the commands in Kibana straight away. All code has been tested against Elasticsearch 8.4.

Analyzers can be specified at three levels: index, field, and query. Declaring an analyzer at the index level provides an index-wide default for all text fields. If a field needs further customization, we can set a different analyzer on that field. We can also use a different analyzer at search time than the one used at index time. Let’s look at these options one by one in this section.

Analyzers for indexing

At times we may need different analyzers on different fields. For example, a name field could be associated with a simple analyzer while a credit card number field uses a pattern analyzer. Fortunately, Elasticsearch lets us set different analyzers on individual fields as required. Similarly, we can set a default analyzer per index so that any text field not explicitly associated with an analyzer in the mapping inherits the index-level analyzer. Let’s check these two mechanisms in this section.

Field level analyzer

We can specify the required analyzers at the field level when creating the mapping definition of an index. For example, the code below shows how to do this during index creation:

PUT authors_with_field_level_analyzers
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text" #A Standard analyzer is being used here
      },
      "about": {
        "type": "text",
        "analyzer": "english" #B Set explicitly with an english analyzer
      },
      "description": {
        "type": "text",
        "fields": {
          "my": {
            "type": "text",
            "analyzer": "fingerprint" #C Fingerprint analyzer on a multi-field
          }
        }
      }
    }
  }
}

As the code shows, the about field and the description.my multi-field are given explicit analyzers, while the name field (like the top-level description field) implicitly inherits the standard analyzer.
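One way to check which analyzer a field actually ended up with is to run some sample text through the _analyze API using its field parameter. The requests below are a minimal sketch that assumes the authors_with_field_level_analyzers index above has already been created; the sample texts are made up for illustration.

```console
GET authors_with_field_level_analyzers/_analyze
{
  "field": "about",
  "text": "The authors are writing books"
}

GET authors_with_field_level_analyzers/_analyze
{
  "field": "description.my",
  "text": "zebra apple apple Zebra"
}
```

With the english analyzer, the first request should drop stop words such as "the" and "are" and return stemmed tokens. The fingerprint analyzer in the second request lowercases, sorts, and de-duplicates the words, then joins them into a single token.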

Index level analyzer

We can also set a default analyzer of our choice at the index level. The following code listing demonstrates this:

Listing: Creating an index with a default analyzer

PUT authors_with_default_analyzer
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { #A Setting this property sets index’s default analyzer
          "type": "keyword"
        }
      }
    }
  }
}

In this code listing, we are in effect replacing the standard analyzer, which is the default, with a keyword analyzer. You can test the analyzer by invoking the _analyze endpoint on the index, as the code listing below shows:

POST authors_with_default_analyzer/_analyze
{
  "text": "John Doe"
}

This request outputs a single token “John Doe” with no lowercasing or tokenizing, which indicates it has been analyzed by our keyword analyzer. You can try the same text with a standard analyzer and you’ll notice the difference.
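For the comparison with the standard analyzer, one option is to name the analyzer explicitly in the _analyze request, which overrides the index default for that call only. This is a small sketch reusing the same index and sample text:

```console
POST authors_with_default_analyzer/_analyze
{
  "analyzer": "standard",
  "text": "John Doe"
}
```

Here the response should contain two lowercased tokens, john and doe, instead of the single untouched token produced by the keyword analyzer.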

Analyzers set at the index level or field level take effect during the indexing process. We can, however, use a different analyzer during the querying process. Let’s see why and how in the next section.

Analyzers for searching

Elasticsearch lets us specify a different analyzer at query time from the one used during indexing. It also allows us to set a default search analyzer across the index, which can be done during index creation. Let’s look at these methods in this section, as well as the rules that Elasticsearch follows when choosing among analyzers defined at various levels.

Analyzer in a query

We haven’t covered search yet, so don’t worry if the following code looks unfamiliar:

GET authors_index_for_search_analyzer/_search
{
  "query": {
    "match": { #A A match query on the author_name field
      "author_name": {
        "query": "M Konda",
        "analyzer": "simple" #B The analyzer to use for this query
      }
    }
  }
}

As shown in the code above, we are specifying the analyzer explicitly while searching for an author (most likely the author_name field would’ve been indexed using a different analyzer).
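To see what the simple analyzer does to the query text before it is matched against the index, the same string can be run through the _analyze API on its own. This is only an illustrative check and does not need an index:

```console
GET _analyze
{
  "analyzer": "simple",
  "text": "M Konda"
}
```

The simple analyzer splits text on non-letter characters and lowercases it, so the response should list the tokens m and konda. These are the terms the match query above would look up in the author_name field.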

Setting the analyzer at a field level

The second mechanism for setting a search-specific analyzer is at the field level. Just as we set an analyzer on a field for indexing purposes, we can add an additional property called search_analyzer on a field to specify the search analyzer. The code below demonstrates this method:

PUT authors_index_with_both_analyzers_field_level
{
  "mappings": {
    "properties": {
      "author_name": {
        "type": "text",
        "analyzer": "stop",
        "search_analyzer": "simple"
      }
    }
  }
}

As the code above shows, the author_name field is set with a stop analyzer for indexing and a simple analyzer for search time.
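With this mapping in place, a query does not need to name an analyzer at all. The sketch below assumes the index above exists; the document ID and the sample document are hypothetical. The refresh=true parameter simply makes the document searchable straight away.

```console
PUT authors_index_with_both_analyzers_field_level/_doc/1?refresh=true
{
  "author_name": "M Konda"
}

GET authors_index_with_both_analyzers_field_level/_search
{
  "query": {
    "match": {
      "author_name": "M Konda"
    }
  }
}
```

The document is analyzed with the stop analyzer when it is indexed, while the query text in the match query is analyzed with the simple analyzer taken from the field's search_analyzer setting.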

Default analyzer at index level

Just as we did for indexing, we can also set a default analyzer for search queries by setting the required analyzer on the index at index creation time. The following code listing demonstrates the setting:

PUT authors_index_with_default_analyzer
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default_search": { #A Default analyzer for search
          "type": "simple"
        },
        "default": { #B Default analyzer for indexing
          "type": "standard"
        }
      }
    }
  }
}

In the above code listing, we set the default analyzer for indexing in addition to the default analyzer for search. You may be wondering whether we can set a search analyzer at the field level when creating the index rather than at runtime in the query. As we saw earlier, we can: the code below again sets different analyzers for indexing and searching on a field during index creation, this time with a standard analyzer for indexing. (It reuses the index name from the earlier field-level example, so delete that index first if you have already created it.)

PUT authors_index_with_both_analyzers_field_level
{
  "mappings": {
    "properties": {
      "author_name": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "simple"
      }
    }
  }
}

As you can see from the above code, the author_name field uses a standard analyzer for indexing and a simple analyzer during search.

Order of precedence

When analyzers are defined at several levels, the engine picks the search-time analyzer in the following order of precedence, highest first:

  • An analyzer specified in the query itself.
  • An analyzer set with the search_analyzer property on the field in the index mappings.
  • The default_search analyzer defined in the index settings.
  • If none of the above is set, the indexing analyzer set on the field or, failing that, on the index.
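The rules above can be tried out with a single index that sets analyzers at every level. This is a sketch under stated assumptions: the index name authors_precedence_demo and the particular choice of analyzers are hypothetical and are not part of the book's listings.

```console
# Hypothetical index combining the levels discussed above
PUT authors_precedence_demo
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { "type": "standard" },
        "default_search": { "type": "simple" }
      }
    }
  },
  "mappings": {
    "properties": {
      "author_name": {
        "type": "text",
        "analyzer": "stop",
        "search_analyzer": "whitespace"
      },
      "about": {
        "type": "text"
      }
    }
  }
}

# Query-level analyzer (keyword) takes precedence over the field's search_analyzer
GET authors_precedence_demo/_search
{
  "query": {
    "match": {
      "author_name": {
        "query": "M Konda",
        "analyzer": "keyword"
      }
    }
  }
}

# No query-level analyzer: the field's search_analyzer (whitespace) applies
GET authors_precedence_demo/_search
{
  "query": {
    "match": { "author_name": "M Konda" }
  }
}

# No search_analyzer on the about field: the index-level default_search (simple) applies
GET authors_precedence_demo/_search
{
  "query": {
    "match": { "about": "M Konda" }
  }
}
```

Each search differs only in where the search-time analyzer comes from, which makes it easier to see which level wins when the same query text is used.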

Written by Madhusudhan Konda

Madhusudhan Konda is a full-stack lead engineer, mentor, and conference speaker. He delivers live online training on Elasticsearch, Elastic Stack, and Spring Cloud.