Elasticsearch in Action: Multi-match (multi_match) Queries

Madhusudhan Konda
Jan 27, 2023
Excerpts taken from my upcoming book: Elasticsearch in Action

The excerpts are taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository, which also includes executable Kibana scripts so you can run the commands in Kibana straight away. All code was tested against Elasticsearch 8.4.

Me @ Medium || LinkedIn || Twitter || GitHub

In the last article, we worked with the match_phrase_prefix query. In this article, we learn about the multi_match query.

The multi-match (multi_match) query, as the name suggests, runs the search criteria across multiple fields. For example, if we want to search for the word Java across the three fields title, synopsis, and tags, then the multi_match query is the answer. The following listing shows a query that searches for Java across these three fields.

GET books/_search
{
  "_source": false,
  "query": {
    "multi_match": {
      "query": "Java",
      "fields": [
        "title",
        "synopsis",
        "tags"
      ]
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "tags": {}
    }
  }
}

The multi_match query expects the search criteria along with an array of fields to search. The response combines the matches from all the individual fields into a single result set.
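If we build such requests from application code rather than typing them into Kibana, the request body is just a nested structure. The following Python snippet is a minimal sketch that assembles the same body as the listing above; the helper name build_multi_match and its parameters are our own for illustration and are not part of any Elasticsearch client library.

```python
import json


def build_multi_match(query, fields, highlight_fields=None):
    """Return a search body that runs `query` against every field in `fields`."""
    body = {
        "_source": False,
        "query": {"multi_match": {"query": query, "fields": list(fields)}},
    }
    # Highlighting is optional; each highlighted field gets an empty options object.
    if highlight_fields:
        body["highlight"] = {"fields": {name: {} for name in highlight_fields}}
    return body


body = build_multi_match("Java", ["title", "synopsis", "tags"], ["title", "tags"])
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted under a GET books/_search line in Kibana, or sent as the request body by whichever HTTP or Elasticsearch client you prefer.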

Best fields

You may be wondering how the relevancy of a document is determined when we search multiple fields. Fields where more of the search words match are scored higher. That is, if we search for Java Collections across multiple fields, a field (say synopsis) where both words match is more relevant than a field with one (or no) matches. The document with this synopsis field is then given a higher relevancy score.

The field that best matches the search criteria (ideally, all the search words) is called the best field. In the previous example, assuming synopsis holds both words, Java and Collections, we can simply say that synopsis is the best field. Multi-match uses the best_fields type under the hood when running queries. This type is the default for multi_match queries. There are, of course, other types, which we will see shortly.
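To make the idea of a best field concrete, here is a deliberately simplified Python sketch. It counts matching words per field and picks the field with the most matches. This is a toy model only: Elasticsearch scores each field with its similarity algorithm rather than a plain word count, and the sample document below is hypothetical, not taken from the books index.

```python
def matched_terms(query, text):
    """Count how many distinct query words appear in the text (case-insensitive)."""
    words = set(text.lower().split())
    return sum(1 for term in set(query.lower().split()) if term in words)


def best_field(query, document):
    """Return (field_name, matched_count) for the field matching the most query words."""
    counts = {name: matched_terms(query, text) for name, text in document.items()}
    name = max(counts, key=counts.get)
    return name, counts[name]


# Hypothetical document used only for this illustration.
doc = {
    "title": "Effective Java",
    "synopsis": "A practical guide to Java collections and generics",
    "tags": "programming",
}

print(best_field("Java Collections", doc))  # ('synopsis', 2)
```

Here the title matches only Java, while the synopsis matches both words, so synopsis is the best field for this document.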

Let’s write a query similar to the one in the previous listing, but this time instead of letting Elasticsearch use the default best_fields type, we set it explicitly with the type parameter. The following listing shows the resulting query.

GET books/_search
{
  "_source": false,
  "query": {
    "multi_match": {
      "query": "Design Patterns",
      "type": "best_fields",
      "fields": ["title", "synopsis"]
    }
  },
  "highlight": {
    "fields": {
      "tags": {},
      "title": {}
    }
  }
}

In the listing, we query for Design Patterns across the title and synopsis fields. This time, we explicitly tell the multi_match query to use the best_fields type.

Note: The default type for a multi_match query is best_fields. The best_fields algorithm ranks a field that matches more of the search words higher than fields that match fewer of them.

If you look at the response and the scores (see the following code snippet), you’ll find the Head First Design Patterns book has a score of 6.9938974 compared to the Head First Object-Oriented Analysis Design book, which has a score of 2.9220228:

"hits" : [{
"_index" : "books",
"_id" : "10",
"_score" : 6.9938974,
"highlight" : {
"title" : [
"Head First <em>Design</em> <em>Patterns</em>"
]
}
},
{
"_index" : "books",
"_id" : "8",
"_score" : 2.9220228,
"highlight" : {
"title" : [
"Head First Object-Oriented Analysis <em>Design</em>"
]
}
}
...]

There are other types of multi-match queries too, such as cross_fields, most_fields, phrase, and phrase_prefix. We use the type parameter to choose how the query matches and scores across multiple fields. We won’t delve into all these types, however, so consult Elasticsearch’s documentation for more information.

If you are wondering how Elasticsearch carries out a best_fields multi_match query, it rewrites the query behind the scenes as a disjunction max (dis_max) query. We discuss this query type in detail in the following section.

Disjunction max (dis_max) queries

In the previous section, we looked at the multi_match query, where the criteria were searched against multiple fields. How does this query type get executed behind the scenes? Elasticsearch rewrites the multi_match query using a disjunction max (dis_max) query. The rewrite creates a separate match query for each field and wraps them in a dis_max query, as the following listing shows.

GET books/_search
{
  "_source": false,
  "query": {
    "dis_max": {
      "queries": [
        {"match": {"title": "Design Patterns"}},
        {"match": {"synopsis": "Design Patterns"}}
      ]
    }
  }
}

As you can see from this listing, the two fields become two match queries under the dis_max query. The query returns the documents that match any of these queries, and each document takes its relevancy _score from its highest-scoring field.

Note: The dis_max query is classified as a compound query: a query that wraps other queries.

In some situations, the relevancy scores of the individual fields in a multi_match query can be the same, so the scores end up in a tie. To break the tie, we use a tie breaker, discussed in the next section.

Tie breakers

By default, the relevancy score is based on the single best field’s score, but if the scores are tied, we can specify a tie_breaker to break the tie. When we use tie_breaker, Elasticsearch calculates the overall score slightly differently, which we will see in action shortly. But first, let’s check out an example.

The following listing queries a couple of words against a couple of fields: title and tags. The listing also shows an additional parameter, tie_breaker.

GET books/_search
{
  "query": {
    "multi_match": {
      "query": "Design Patterns",
      "type": "best_fields",
      "fields": ["title", "tags"],
      "tie_breaker": 0.9
    }
  }
}

When we search for Design Patterns using the best_fields type and specify multiple fields (the title and tags fields in the listing above), we can provide a tie_breaker to resolve ties between equal scores. When we provide the tie breaker, the overall score is calculated as:

Overall score = _score of the best matching field + tie_breaker * (sum of the _scores of the other matching fields)
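The formula is easy to check with a few lines of Python. The sketch below applies it to hypothetical per-field scores (the numbers are made up for illustration and are not taken from the books index); the function name dis_max_score is our own.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Best field's score plus tie_breaker times the sum of the other fields' scores."""
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others


# Hypothetical scores: document A matches in title (4.0) and tags (2.0),
# document B matches in title only (4.0).
doc_a = [4.0, 2.0]
doc_b = [4.0]

print(dis_max_score(doc_a), dis_max_score(doc_b))            # 4.0 4.0 (a tie)
print(dis_max_score(doc_a, 0.5), dis_max_score(doc_b, 0.5))  # 5.0 4.0
print(dis_max_score(doc_a, 1.0))                             # 6.0
```

With a tie_breaker of 0, only the best field counts and the two documents tie. Any value above 0 lets the additional matching field lift document A above document B, and a value of 1.0 simply adds all the field scores together.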

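A few moments ago, we worked with a dis_max query. In fact, Elasticsearch converts multi_match queries of the best_fields type (the default) to dis_max queries; other types, such as most_fields, are combined differently. For example, a best_fields multi_match query for Design Patterns over the title and synopsis fields with a tie_breaker of 0.5 can be written as a dis_max query, as the next listing demonstrates.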

GET books/_search
{
  "_source": false,
  "query": {
    "dis_max": {
      "queries": [
        {"match": {"title": "Design Patterns"}},
        {"match": {"synopsis": "Design Patterns"}}
      ],
      "tie_breaker": 0.5
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "synopsis": {},
      "tags": {}
    }
  }
}

As you can see, a query that could be written as a multi_match query is expressed here as a dis_max query. That’s what Elasticsearch does behind the scenes for the best_fields type.
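The rewrite itself is mechanical, so it can be sketched in a few lines of Python. The function below is only an approximation of the idea for the best_fields type: it is not Elasticsearch’s actual implementation, the name to_dis_max is our own, and it does not handle per-field boosts or other multi_match options.

```python
def to_dis_max(multi_match):
    """Express a best_fields multi_match clause as a dis_max clause (simplified)."""
    if multi_match.get("type", "best_fields") != "best_fields":
        raise ValueError("this sketch only handles the best_fields type")
    # One match query per field, all searching for the same text.
    dis_max = {
        "queries": [
            {"match": {field: multi_match["query"]}}
            for field in multi_match["fields"]
        ]
    }
    # Carry the tie breaker over unchanged when one is given.
    if "tie_breaker" in multi_match:
        dis_max["tie_breaker"] = multi_match["tie_breaker"]
    return {"dis_max": dis_max}


clause = {
    "query": "Design Patterns",
    "type": "best_fields",
    "fields": ["title", "tags"],
    "tie_breaker": 0.9,
}
print(to_dis_max(clause))
```

Passing in the multi_match clause from the tie breaker listing produces a dis_max clause with one match query for title, one for tags, and the same tie_breaker of 0.9.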

Because we search multiple fields, at times we may want to give additional weight to a particular field (for example, finding our search words in a title is more relevant than the same search words appearing in a lengthy synopsis or tags field). How do we let Elasticsearch know that our intention is to give extra weight to the title field? That’s exactly what we do when boosting individual fields, discussed in the following section.

Individual field boosting

There’s usually a search bar provided to the users on a website or application so that they can search for something such as a product, book, or review. When users enter a few words, they don’t say that they are interested in searching for those words only in a particular field. For example, when we search for a C# book on Amazon, we don’t ask Amazon to search only in a particular category such as title or synopsis. We simply input the string in the text box and let Amazon do the job of figuring out the result. Behind such a search, we can decide which fields matter more by using individual field boosts!

In a multi_match query, we can boost (bump up) the weight of a specific field. Say, when searching for C# Guide, we decide that finding the words in the title is more important than finding them in the tags. In this case, we simply boost the importance of the title field by appending a caret and a number to the field name: title^2, for example. The following listing shows the full query for this scenario.

GET books/_search
{
  "query": {
    "multi_match": {
      "query": "C# Guide",
      "fields": ["title^2", "tags"]
    }
  }
}

In this listing, we double the title field’s importance, so, all else being equal, a document where the text C# Guide is found in the title field will score higher than a document where it is found only in the tags field.
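To see what the caret notation does to the ranking, here is a small Python sketch. It parses field specifications such as title^2 and multiplies hypothetical per-field scores by the boost. Treating the boost as a plain multiplier on the field’s score is a simplification for illustration, and the helper names and raw scores are our own.

```python
def parse_field(spec):
    """Split 'title^2' into ('title', 2.0); a bare field name gets a boost of 1.0."""
    name, sep, boost = spec.partition("^")
    return name, float(boost) if sep else 1.0


def boosted_scores(raw_scores, field_specs):
    """Multiply each field's raw score by the boost given in its field spec."""
    boosts = dict(parse_field(spec) for spec in field_specs)
    return {name: raw_scores.get(name, 0.0) * boost for name, boost in boosts.items()}


# Hypothetical unboosted scores: tags scores higher than title before boosting.
raw = {"title": 1.5, "tags": 2.0}

print(boosted_scores(raw, ["title", "tags"]))    # {'title': 1.5, 'tags': 2.0}
print(boosted_scores(raw, ["title^2", "tags"]))  # {'title': 3.0, 'tags': 2.0}
```

Without the boost, tags would be the best field for this document; with title^2, the title’s score doubles and it becomes the best field instead.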

