Phrase queries with query_string query
If you are wondering whether query_string supports phrase search, it does. The only thing to note is that a phrase must be enclosed in double quotes. Because the query value is itself a JSON string, the quotes that surround the phrase must be escaped; for example, "query": "\"Design Patterns\"". The query in the next listing searches for a phrase.
GET books/_search
{
  "query": {
    "query_string": {
      "query": "\"making the code better\"",
      "default_field": "synopsis"
    }
  }
}
As you might expect, this query searches for the phrase “making the code better” in the synopsis field and returns the Effective Java book.
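The escaping rule is easiest to see when the request body is built programmatically. The following is a minimal Python sketch, not part of the original listing: the helper name phrase_query_body is hypothetical, and only the standard json module is used. It wraps the phrase in literal double quotes and lets the JSON serializer add the backslashes.

```python
import json


def phrase_query_body(phrase, field):
    """Build a query_string request body that searches for an exact phrase.

    Hypothetical helper for illustration. The phrase is wrapped in literal
    double quotes so that query_string treats it as a phrase.
    """
    return {
        "query": {
            "query_string": {
                "query": f'"{phrase}"',
                "default_field": field,
            }
        }
    }


body = phrase_query_body("making the code better", "synopsis")

# json.dumps escapes the inner quotes, producing \"...\" in the JSON text,
# which is the form that appears in the console listing.
print(json.dumps(body, indent=2))
```

In Python the inner quotes are ordinary characters of the string; the backslashes only appear once the body is serialized to JSON.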
Continuing in the same vein, we can use the phrase_slop parameter if the phrase is missing one or two words. For example, the following listing shows how phrase_slop allows for a missing word (“the” is dropped from the phrase) while still returning a successful result.
GET books/_search
{
  "query": {
    "query_string": {
      "query": "\"making code better\"",
      "default_field": "synopsis",
      "phrase_slop": 1
    }
  }
}
The query misses a word, but the phrase_slop setting forgives the omission, so we still get the desired result.
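To build intuition for why a slop of 1 is enough here, the sketch below models slop as the amount by which term positions are allowed to drift relative to the phrase. This is a simplified illustration in Python, not the engine's actual scoring code; the function name sloppy_phrase_match, the whitespace tokenization, and the sample synopsis text are all assumptions made for the example.

```python
from itertools import product


def sloppy_phrase_match(text, phrase, slop=0):
    """Return True if the phrase terms occur in text within the given slop.

    Simplified model: for each phrase term, compare its position in the
    document with its position in the phrase. The phrase matches when the
    spread of those shifts is no larger than the slop.
    """
    doc = text.lower().split()
    terms = phrase.lower().split()

    # All document positions at which each phrase term occurs.
    positions = [[i for i, tok in enumerate(doc) if tok == term] for term in terms]
    if any(not p for p in positions):
        return False  # a phrase term is absent from the document

    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # each phrase term needs its own document position
        shifts = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


# Hypothetical synopsis text, used only for this illustration.
synopsis = "a guide to making the code better"

print(sloppy_phrase_match(synopsis, "making the code better", slop=0))
print(sloppy_phrase_match(synopsis, "making code better", slop=0))
print(sloppy_phrase_match(synopsis, "making code better", slop=1))
```

Under this model, dropping “the” leaves “code” and “better” one position further right than the phrase expects, so the exact phrase match fails but a slop of 1 succeeds.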
Fuzzy queries with query_string query
We can also ask Elasticsearch to forgive spelling mistakes by using fuzzy queries with query_string queries. All we need to do is suffix the search term with the tilde (~) operator. This is best understood with an example, as the following listing demonstrates.
GET books/_search
{
  "query": {
    "query_string": {
      "query": "Pattenrs~",
      "default_field": "title"
    }
  }
}
By suffixing the term with the ~ operator, we ask the engine to treat it as a fuzzy query. By default, an edit distance of 2 is used for fuzzy queries.
The edit distance is the number of single-character changes required to transform one string into another. For example, “CAT” is an edit distance of 1 from “CAP”.
Edit distance is commonly measured with the Levenshtein distance algorithm. Fuzzy queries in query_string, however, use a related algorithm, the Damerau–Levenshtein distance. It allows a maximum of two changes, where a change is the insertion, deletion, or substitution of a single character, or the transposition of two adjacent characters.
> The Levenshtein distance algorithm defines the minimal number of mutations that are required on a string to be transformed into another string. These mutations include insertions, deletions, and substitutions. The Damerau–Levenshtein distance algorithm goes one step further. In addition to having all the mutations as defined by Levenshtein, the Damerau–Levenshtein algorithm considers the transposition of adjacent characters (for example, TB -> BT -> BAT).
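The difference between the two distances can be checked with a short Python sketch. This is an illustration only, not the engine's implementation: the function name edit_distance is hypothetical, and the transposition-aware mode implements the restricted variant usually called optimal string alignment, which can give a larger distance than the unrestricted Damerau–Levenshtein distance when an edit falls between two transposed characters.

```python
def edit_distance(a, b, transpositions=True):
    """Dynamic-programming edit distance between strings a and b.

    With transpositions=False this is the plain Levenshtein distance.
    With transpositions=True a swap of two adjacent characters counts as
    one edit (restricted Damerau-Levenshtein, also known as optimal
    string alignment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (
                transpositions
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(edit_distance("cat", "cap"))
print(edit_distance("pattenrs", "patterns", transpositions=False))
print(edit_distance("pattenrs", "patterns", transpositions=True))
```

For the misspelling used in the listing above, the swapped “n” and “r” cost two substitutions under plain Levenshtein but only one edit once transpositions are allowed.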
By default, the edit distance in a query_string query is 2, but we can reduce it if needed by placing a 1 after the tilde, like so: "Patterns~1".
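When the search term comes from user input, the tilde suffix can be appended in code. The following Python sketch assumes hypothetical helper names (fuzzy_term and fuzzy_query_body) and restricts the explicit edit distance to 0, 1, or 2, in line with the maximum of two changes described above.

```python
def fuzzy_term(term, max_edits=None):
    """Return the term with the fuzzy (~) operator appended.

    Hypothetical helper. With max_edits=None the bare tilde is used, so
    the default edit distance applies; otherwise the distance is written
    after the tilde.
    """
    if max_edits is None:
        return f"{term}~"
    if max_edits not in (0, 1, 2):
        raise ValueError("max_edits must be 0, 1, or 2")
    return f"{term}~{max_edits}"


def fuzzy_query_body(term, field, max_edits=None):
    """Build a query_string request body for a fuzzy search on one field."""
    return {
        "query": {
            "query_string": {
                "query": fuzzy_term(term, max_edits),
                "default_field": field,
            }
        }
    }


print(fuzzy_term("Pattenrs"))
print(fuzzy_term("Patterns", 1))
print(fuzzy_query_body("Pattenrs", "title"))
```

The resulting dictionary has the same shape as the fuzzy listing shown earlier and can be serialized to JSON before being sent to the _search endpoint.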
These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.