Elasticsearch in Action: Multi-match (multi_match) Queries
The excerpts are taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository. You can find executable Kibana scripts in the repository so you can run the commands in Kibana straight away. All code was tested against Elasticsearch version 8.4.
In the last article, we worked with the match_phrase_prefix query. In this article, we learn about the multi_match query.
The multi-match (multi_match) query, as the name suggests, searches for the query text across multiple fields. For example, if we want to search for the word Java across the three fields title, synopsis, and tags, then the multi_match query is the answer. The following listing shows a query that searches for Java across these three fields.
GET books/_search
{
  "_source": false,
  "query": {
    "multi_match": {
      "query": "Java",
      "fields": [
        "title",
        "synopsis",
        "tags"
      ]
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "tags": {}
    }
  }
}
The multi_match query expects an array of fields along with the search criteria. The response combines the matches from the individual fields into a single set of results.
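If you send requests from application code rather than from Kibana, the same request body is simply a nested dictionary. The following is a minimal sketch in plain Python; the helper name multi_match_body is hypothetical and is not part of any Elasticsearch client library.

```python
import json


def multi_match_body(query, fields):
    """Build a multi_match request body (hypothetical helper, for illustration)."""
    return {
        "_source": False,  # skip the document source, as in the listing
        "query": {
            "multi_match": {
                "query": query,
                "fields": list(fields),
            }
        },
    }


body = multi_match_body("Java", ["title", "synopsis", "tags"])

# Serialize to the JSON that would be sent to GET books/_search
print(json.dumps(body, indent=2))
```

Keeping the body in a helper like this makes it easy to vary the list of fields without retyping the whole request.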
Best fields
You may be wondering how relevancy is determined when we search multiple fields. Fields in which more of the search words match score higher. That is, if we search for Java Collections across multiple fields, a field (say synopsis) in which both words match is more relevant than a field with one match (or none). The document containing this synopsis field therefore receives a higher relevancy score.
Fields that match all the search criteria are called the best fields. In the previous example, assuming synopsis holds both words, Java and Collections, we can say that synopsis is the best field. Under the hood, multi_match uses the best_fields type when running queries; it is the default type for multi_match queries. There are, of course, other types, which we will see shortly.
Let’s rewrite the query from the previous listing, but this time, instead of letting Elasticsearch apply the default best_fields type, we state it explicitly by setting the type parameter. The following listing shows the resulting query.
GET books/_search
{
  "_source": false,
  "query": {
    "multi_match": {
      "query": "Design Patterns",
      "type": "best_fields",
      "fields": ["title", "synopsis"]
    }
  },
  "highlight": {
    "fields": {
      "tags": {},
      "title": {}
    }
  }
}
In the listing, we query for Design Patterns across the title and synopsis fields. This time, we explicitly set the best_fields type on the multi_match query.
Note: The default type for a multi_match query is best_fields. The best_fields algorithm ranks a field that matches more of the search words higher than a field that matches fewer.
If you look at the response and the scores (see the following code snippet), you’ll find the Head First Design Patterns book has a score of 6.9938974 compared to the Head First Object-Oriented Analysis Design book, which has a score of 2.9220228:
"hits" : [{
    "_index" : "books",
    "_id" : "10",
    "_score" : 6.9938974,
    "highlight" : {
      "title" : [
        "Head First <em>Design</em> <em>Patterns</em>"
      ]
    }
  },
  {
    "_index" : "books",
    "_id" : "8",
    "_score" : 2.9220228,
    "highlight" : {
      "title" : [
        "Head First Object-Oriented Analysis <em>Design</em>"
      ]
    }
  }
...]
There are other types of multi-match queries too, such as cross_fields, most_fields, phrase, phrase_prefix, and others. We use the type parameter to select the one that suits how we want to match across multiple fields. We won’t delve into all these types, however, so consult Elasticsearch’s documentation for more information.
If you are wondering how Elasticsearch carries out the multi_match query: behind the scenes, it is rewritten as a disjunction max (dis_max) query. We discuss this query type in detail in the following section.
Disjunction max (dis_max) queries
In the previous section, we looked at the multi_match query, where the criteria were searched against multiple fields. How does this query type get executed behind the scenes? Elasticsearch rewrites the multi_match query using a disjunction max (dis_max) query, with one match query per field, as the following listing shows.
GET books/_search
{
  "_source": false,
  "query": {
    "dis_max": {
      "queries": [
        {"match": {"title": "Design Patterns"}},
        {"match": {"synopsis": "Design Patterns"}}
      ]
    }
  }
}
As you can see from this listing, the two fields become two separate match queries under the dis_max query. The dis_max query returns the matching documents, scoring each one by the highest relevancy _score among its individual match queries.
Note: The dis_max query is classified as a compound query: a query that wraps other queries.
In some situations, the relevancy scores of the individual fields in a multi_match query can be the same. In that case, the scores end up in a tie. To break the tie, we use a tie breaker, discussed in the next section.
Tie breakers
The relevancy score is based on the score of the single best field, but if the scores are tied, we can specify tie_breaker to break the tie. If we use tie_breaker, Elasticsearch calculates the overall score slightly differently, which we will see in action shortly. But first, let’s check out an example.
The following listing queries a couple of words against a couple of fields: title and tags. The listing also includes an additional parameter, tie_breaker.
GET books/_search
{
  "query": {
    "multi_match": {
      "query": "Design Patterns",
      "type": "best_fields",
      "fields": ["title", "tags"],
      "tie_breaker": 0.9
    }
  }
}
When we search for Design Patterns using the best_fields type and specify multiple fields (title and tags in the listing above), we can provide a tie_breaker to resolve ties between equal scores. When we provide the tie breaker, the overall score is calculated as:
Overall score = _score of the best match field + (sum of the _scores of the other matching fields * tie_breaker)
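To make the arithmetic concrete, here is a minimal sketch of the formula in plain Python. The function name dis_max_score and the per-field scores are hypothetical values chosen for illustration; they are not taken from a real index.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Best field's score plus tie_breaker times the sum of the other scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others


# Hypothetical per-field scores (title, tags) for two documents.
# Both documents have the same best-field score, so they tie
# when no tie breaker is applied.
doc_a = [4.0, 2.0]
doc_b = [4.0, 0.0]

print(dis_max_score(doc_a), dis_max_score(doc_b))

# With a tie breaker, the second matching field of doc_a contributes
# to its overall score, so doc_a now ranks above doc_b.
print(dis_max_score(doc_a, tie_breaker=0.9), dis_max_score(doc_b, tie_breaker=0.9))
```

With a tie_breaker of 0, only the best field counts; with a tie_breaker of 1, the scores of all matching fields are simply added together.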
A few moments ago, we worked with a dis_max query. In fact, Elasticsearch converts a multi_match query of the best_fields type into a dis_max query. For example, the multi_match query from the listing above can be rewritten as a dis_max query, as the next listing demonstrates.
GET books/_search
{
  "_source": false,
  "query": {
    "dis_max": {
      "queries": [
        {"match": {"title": "Design Patterns"}},
        {"match": {"tags": "Design Patterns"}}
      ],
      "tie_breaker": 0.9
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "synopsis": {},
      "tags": {}
    }
  }
}
As you can see, the query that was written as a multi_match query is now rewritten as a dis_max query. Indeed, that’s exactly what Elasticsearch does behind the scenes.
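The shape of that rewrite can be sketched in a few lines of plain Python. This is only an illustration of the transformation on request bodies, not Elasticsearch's internal implementation; the function name to_dis_max is hypothetical, and the sketch assumes plain field names (no boosts or wildcards).

```python
def to_dis_max(multi_match):
    """Turn a best_fields multi_match clause into a dis_max clause (simplified)."""
    text = multi_match["query"]
    dis_max = {
        # One match query per field, all searching for the same text
        "queries": [{"match": {field: text}} for field in multi_match["fields"]]
    }
    # Carry the tie breaker over only if the original clause set one
    if "tie_breaker" in multi_match:
        dis_max["tie_breaker"] = multi_match["tie_breaker"]
    return {"dis_max": dis_max}


# The multi_match clause from the tie breaker listing
clause = {
    "query": "Design Patterns",
    "type": "best_fields",
    "fields": ["title", "tags"],
    "tie_breaker": 0.9,
}

rewritten = to_dis_max(clause)
print(rewritten)
```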
Because we search multiple fields, at times we may want to give additional weight to a particular field (for example, finding our search words in a title is more relevant than finding the same words in a lengthy synopsis or in the tags field). How do we let Elasticsearch know that we intend to give extra weight to the title field? That’s exactly what boosting individual fields does, as discussed in the following section.
Individual field boosting
There’s usually a search bar provided to the users on a website or application so that they can search for something such as a product, book, or review. When users enter a few words, they don’t say that they are interested in searching for those words only in a particular field. For example, when we search for a C# book on Amazon, we don’t ask Amazon to search only in a particular category such as title or synopsis. We simply input the string in the text box and let Amazon do the job of figuring out the result. That’s exactly what we can do using individual field boosts!
In a multi_match query, we can bump up (boost) the importance of a specific field. Say, when searching for C# Guide, we decide that finding the words in the title is more important than finding them in the tags. In this case, we simply boost the title field by appending a caret and a number to its name: title^2, for example. The following listing shows the full query for this scenario.
GET books/_search
{
  "query": {
    "multi_match": {
      "query": "C# Guide",
      "fields": ["title^2", "tags"]
    }
  }
}
In this listing, we double the title field’s importance, so a document in which the text C# Guide is found in the title field will score higher than a document in which it is found only in the tags field.
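The effect of the caret notation can be sketched in plain Python as a multiplier on each field's score. The function names and the raw scores below are hypothetical, and the sketch is a simplified view of boosting rather than Elasticsearch's actual scoring code.

```python
def parse_field(spec):
    """Split 'title^2' into ('title', 2.0); a bare field name gets a boost of 1.0."""
    name, sep, boost = spec.partition("^")
    return name, float(boost) if sep else 1.0


def boosted_scores(raw_scores, field_specs):
    """Multiply each field's raw score by the boost given in its field spec."""
    boosts = dict(parse_field(spec) for spec in field_specs)
    return {name: raw_scores.get(name, 0.0) * boost for name, boost in boosts.items()}


# Hypothetical raw per-field scores for one document
raw = {"title": 1.5, "tags": 2.0}

# Without the boost, tags would be the best field; with title^2, title wins
scores = boosted_scores(raw, ["title^2", "tags"])
print(scores)
print(max(scores, key=scores.get))
```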