Elasticsearch in Action: Match Phrase (match_phrase) Queries

Madhusudhan Konda
Jan 27, 2023
Excerpts taken from my upcoming book: Elasticsearch in Action

The excerpts are taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository, which includes executable Kibana scripts so you can run the commands in Kibana straight away. All code is tested against Elasticsearch 8.4.


In the last article, we looked at the match query in detail. In this article, we work with the match_phrase query.

The match phrase (match_phrase) query finds the documents that exactly match a given phrase. The idea is to search for the phrase (a group of words) in a given field, with the words in the same order. For example, if you are looking for the phrase book for every Java programmer in the synopsis of a book, only documents whose synopsis contains those words in that order are returned.

With a match query, the words are split and searched individually, combined with an AND/OR operator. The match_phrase query is the opposite: it returns only the results that match the search phrase exactly. The following listing illustrates the match_phrase query in action.

GET books/_search
{
  "query": {
    "match_phrase": {
      "synopsis": "book for every Java programmer"
    }
  }
}

The match_phrase query expects a phrase, as you can see in the previous listing. It returns exactly one document because only one book in our books index has that phrase in the synopsis field.
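For contrast, the following is a minimal sketch of the match query mentioned earlier, run against the same books index and synopsis field. The AND operator is chosen here purely for illustration: it asks for every word to be present, but unlike match_phrase it does not require the words to be adjacent or in order, so it may return more documents than the phrase query.

```
# Sketch for comparison only: a match query with the AND operator.
# All words must appear in synopsis, in any order and at any distance.
GET books/_search
{
  "query": {
    "match": {
      "synopsis": {
        "query": "book for every Java programmer",
        "operator": "AND"
      }
    }
  }
}
```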

Match phrase with the slop parameter

What if we drop a word or two from the middle of the phrase? Say, for example, we remove for or every (or both) from the phrase book for every Java programmer and rerun the same query. Unfortunately, the query wouldn’t return any results! The reason is that match_phrase expects the words in the query to match the phrase in the document exactly, word by word. Searching “book Java programmer” returns no results. Fortunately, there is a fix to this problem: using a parameter called slop.
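The failing case can be sketched as follows. This is the short form of the query with the words for and every removed; against the sample books data used in this article, it is expected to return no hits, since no slop is specified.

```
# Sketch of the failing search: two in-between words are missing
# and no slop is given, so the phrase must match word by word.
GET books/_search
{
  "query": {
    "match_phrase": {
      "synopsis": "book Java programmer"
    }
  }
}
```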

The slop parameter lets us tell the engine how many in-between words it may ignore when matching a phrase. We can drop words from the middle of the phrase, but we need to let the engine know how many. This is done by setting a value for the slop parameter: an integer indicating the number of words that can be missing from the phrase in a match_phrase search. For example, a slop of 1 forgives one missing word, a slop of 2 forgives two missing words, and so on. The default value of slop is 0, meaning a phrase with missing words is not forgiven.

Coming back to our example, let’s drop a word from the phrase: instead of “book for every Java programmer,” we’ll search for “book every Java programmer,” dropping the word for. Because we drop a single word, we need to set the slop parameter to 1. The query in the next listing demonstrates this. To do so, we expand the synopsis field into an object with two parameters: query and slop.

GET books/_search
{
  "query": {
    "match_phrase": {
      "synopsis": {
        "query": "book every Java programmer",
        "slop": 1
      }
    }
  }
}

If you want to use the slop parameter, the field must be given as an object containing both query and slop, as demonstrated in the previous listing (the long form of the query). Because slop is set to 1, the query matches even though one word of the phrase in the synopsis field is missing from the search phrase.

This query returns the book whose synopsis contains the entire phrase. The takeaway from this example is that a match phrase query looks for an exact phrase, but if you are not sure of the exact phrase, you can use the slop parameter to indicate how forgiving your query should be.
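As a further sketch along the same lines, dropping both for and every leaves two missing words to forgive, so a slop of 2 should be enough here. This assumes the synopsis contains the words book for every Java programmer consecutively, as in the first listing; results on your own data may differ.

```
# Sketch: both in-between words ("for" and "every") are dropped,
# so slop is raised to 2 to forgive the two missing words.
GET books/_search
{
  "query": {
    "match_phrase": {
      "synopsis": {
        "query": "book Java programmer",
        "slop": 2
      }
    }
  }
}
```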

There’s a slight variation on the match phrase query: the match phrase prefix (match_phrase_prefix) query. In addition to matching an exact phrase, it matches the last word as a prefix.

We look at the match_phrase_prefix query in the next article.
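As a brief preview, a match_phrase_prefix query might look like the sketch below. The truncated last word prog is an illustrative choice rather than an example taken from the book; the expectation is that it is treated as a prefix and so can match programmer.

```
# Sketch: the final term "prog" is an assumed, illustrative prefix;
# the preceding words must still match as a phrase, in order.
GET books/_search
{
  "query": {
    "match_phrase_prefix": {
      "synopsis": "book for every Java prog"
    }
  }
}
```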



Madhusudhan Konda

Madhusudhan Konda is a full-stack lead engineer, mentor, and conference speaker. He delivers live online training on Elasticsearch, Elastic Stack & Spring Cloud.