Query String Queries Supporting Phrase and Fuzzy functions

Madhusudhan Konda
3 min read · Jan 29, 2023
Excerpts taken from my upcoming book: Elasticsearch in Action

Me @ Medium || LinkedIn || Twitter || GitHub

Phrase queries with query_string query

If you are wondering whether query_string supports phrase searches, it does. The only requirement is that the phrase be enclosed in double quotes. Because the query value is itself a JSON string, those inner quotes must be escaped; for example, "query": "\"Design Patterns\"". The query in the next listing searches for a phrase.

GET books/_search
{
  "query": {
    "query_string": {
      "query": "\"making the code better\"",
      "default_field": "synopsis"
    }
  }
}

As you might expect, this code searches for the phrase “making the code better” in the synopsis field and fetches the Effective Java book.

Likewise, we can use the phrase_slop parameter when the phrase is missing one or two words.

For example, the code in the following listing demonstrates how the phrase_slop parameter allows for a missing word in the phrase (the word “the” is dropped) and still returns a successful result.

GET books/_search
{
  "query": {
    "query_string": {
      "query": "\"making code better\"",
      "default_field": "synopsis",
      "phrase_slop": 1
    }
  }
}

The query omits a word, but the phrase_slop setting forgives the omission, so we still get the desired result.
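When these request bodies are built in application code rather than typed into a console, the quote escaping is usually handled by the JSON serializer. The following is a minimal Python sketch of that idea; the helper name phrase_query is hypothetical, and it assumes the phrase itself contains no double quotes.

```python
import json


def phrase_query(phrase, field, slop=0):
    """Build a query_string request body that searches `field` for a phrase.

    Hypothetical helper for illustration; assumes `phrase` contains no
    double quotes of its own.
    """
    query_string = {
        # Wrap the phrase in double quotes so it is parsed as a phrase.
        "query": f'"{phrase}"',
        "default_field": field,
    }
    if slop:
        # Only send phrase_slop when some slack is actually requested.
        query_string["phrase_slop"] = slop
    return {"query": {"query_string": query_string}}


# json.dumps escapes the inner quotes, producing \"making code better\"
print(json.dumps(phrase_query("making code better", "synopsis", slop=1), indent=2))
```

The printed body should match the listing above and can be sent to the books/_search endpoint with whichever HTTP or Elasticsearch client you prefer.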

Fuzzy queries with query_string query

We can also ask Elasticsearch to forgive spelling mistakes by using fuzzy queries with query_string queries. All we need to do is suffix the query term with the tilde (~) operator.

This is best understood through an example, as the following listing demonstrates.

GET books/_search
{
  "query": {
    "query_string": {
      "query": "Pattenrs~",
      "default_field": "title"
    }
  }
}

By suffixing the term with the ~ operator, we ask the engine to treat the query as a fuzzy query. By default, an edit distance of 2 is used for fuzzy queries.

The edit distance is the number of mutations required to transform one string into another. For example, transforming “CAT” into “CAP” takes an edit distance of 1 (a single substitution of T with P).

Two related algorithms define edit distance: the Levenshtein distance and the Damerau–Levenshtein distance. Fuzzy queries use the Damerau–Levenshtein distance, which counts insertions, deletions, and substitutions of single characters as well as transpositions of adjacent characters, up to a maximum of two edits.

> The Levenshtein distance algorithm defines the minimal number of mutations required to transform one string into another. These mutations include insertions, deletions, and substitutions. The Damerau–Levenshtein distance algorithm goes one step further. In addition to all the mutations defined by Levenshtein, it counts the transposition of two adjacent characters as a single mutation (for example, TB -> BT is one transposition, after which BT -> BAT is one insertion).
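To make the difference between the two distances concrete, here is a small Python sketch of both. It is only an illustration and not the code Elasticsearch runs; the second function implements the restricted "optimal string alignment" variant of Damerau–Levenshtein, which is a simplification chosen here for brevity.

```python
def levenshtein(a, b):
    """Minimal number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def osa_distance(a, b):
    """Levenshtein plus adjacent transpositions (optimal string alignment)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Two adjacent characters swapped count as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


print(levenshtein("CAT", "CAP"))             # 1: one substitution
print(levenshtein("Pattenrs", "Patterns"))   # 2: the swap costs two substitutions
print(osa_distance("Pattenrs", "Patterns"))  # 1: the swap is one transposition
```

The misspelling used in the earlier listing, Pattenrs, is a swap of two adjacent letters, so it is a single edit away from Patterns once transpositions are counted.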

By default, the edit distance in a query_string query is 2, but we can reduce it if needed by appending 1 after the tilde, like so: "Patterns~1".
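As a rough sketch, the tilde suffix can also be assembled in code. The helper name fuzzy_query below is hypothetical, and the range check reflects the maximum of two edits described above.

```python
def fuzzy_query(term, field, max_edits=None):
    """Build a query_string body for a fuzzy search on a single term.

    Hypothetical helper for illustration. With max_edits=None the bare
    tilde is used, which leaves the edit distance at its default.
    """
    if max_edits is not None and max_edits not in (0, 1, 2):
        raise ValueError("max_edits must be 0, 1, or 2")
    suffix = "~" if max_edits is None else f"~{max_edits}"
    return {
        "query": {
            "query_string": {
                "query": term + suffix,
                "default_field": field,
            }
        }
    }


print(fuzzy_query("Pattenrs", "title"))               # query is "Pattenrs~"
print(fuzzy_query("Pattenrs", "title", max_edits=1))  # query is "Pattenrs~1"
```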

These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.

Written by Madhusudhan Konda

Madhusudhan Konda is a full-stack lead engineer, mentor, and conference speaker. He delivers live online training on Elasticsearch, Elastic Stack & Spring Cloud.