Understanding and Working with Match Queries

Madhusudhan Konda
Jan 27, 2023
Excerpts taken from my upcoming book: Elasticsearch in Action

The excerpts are taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository, which also contains executable Kibana scripts so you can run the commands in Kibana straight away. All code is tested against Elasticsearch 8.4.


The match query is the most common and powerful query for multiple use cases. It is a full-text search query that returns the documents matching the specified criteria. The match query can be tailored with a number of options.

Format of a match query

Let’s first look at the format of the match query, as this snippet shows:

GET books/_search
{
  "query": {
    "match": {
      "FIELD": "SEARCH TEXT"
    }
  }
}

As you can see in the snippet, the match query expects the search criteria to be defined as a field and a value. The field can be any of the text fields present in a document whose values are to be matched. The value can be a word or multiple words, given in uppercase, lowercase, or mixed case.

There are a handful of additional parameters that we can pass to the match query in its full form. The form we have discussed so far is the shortened one. The following code snippet provides an example of the full form:

GET books/_search
{
  "query": {
    "match": {
      "FIELD": {
        "query": "<SEARCH TEXT>",
        "<parameter>": "<MY_PARAM>"
      }
    }
  }
}
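For readers who assemble requests in code rather than in Kibana, the difference between the two forms can be sketched with plain Python dictionaries. This is only an illustration: match_query is a hypothetical helper name, not a function from any Elasticsearch client library.

```python
import json


def match_query(field, text, **params):
    """Build a match query body (hypothetical helper, for illustration only)."""
    if not params:
        # Short form: the field maps directly to the search text.
        return {"query": {"match": {field: text}}}
    # Full form: the field maps to an object with "query" plus extra parameters.
    return {"query": {"match": {field: {"query": text, **params}}}}


short_form = match_query("title", "Java")
full_form = match_query("title", "Java Complete Guide", operator="AND")

# Print the full form as JSON, ready to paste under a GET <index>/_search line.
print(json.dumps(full_form, indent=2))
```

The resulting dictionaries mirror the JSON bodies shown above, so they can be serialized and sent to the _search endpoint with whichever HTTP client you prefer.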

We can search across multiple indices by providing comma-separated indices in the search URL as the following demonstrates:

GET new_books,classics,top_sellers,crime*/_search
{
...
}

As you can see, any number of indices can be provided when invoking the _search endpoint, including wildcards.

Note: If we omit the index (or indices) in the search request, we effectively search every index. For example, GET _search { ... } searches across all the indices in the cluster.
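As a small sketch of how the request path is put together, the following Python snippet joins index names with commas (no spaces) and falls back to the cluster-wide path when no index is given. The helper name search_path is an assumption made for this example.

```python
def search_path(indices=None):
    """Return the _search path for the given index names (hypothetical helper)."""
    if not indices:
        # No index supplied: the request targets all indices in the cluster.
        return "/_search"
    # Index names and wildcard patterns are comma-separated, without spaces.
    return "/" + ",".join(indices) + "/_search"


multi_index_path = search_path(["new_books", "classics", "top_sellers", "crime*"])
all_indices_path = search_path()

print(multi_index_path)
print(all_indices_path)
```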

Searching using a match query

Now that we know the format of a match query, let’s look at an example where we want to search for books with Java in the title field. The following listing demonstrates this, setting the title field to the word Java as the text to search.

GET books/_search
{
  "query": {
    "match": {
      "title": "Java"
    }
  }
}

As the listing shows, we are creating a match query whose search criterion is a word in the title field. As expected, Elasticsearch fetches all the documents that match the word Java in the title field.

Match query analysis

Term-level queries are not analyzed. The match queries that work on text fields, on the other hand, are analyzed. The same analyzers used during the indexing process (unless search queries were explicitly defined with different analyzers) process the search words in match queries. If a standard analyzer (default analyzer) is used during the indexing of our document, the search words are analyzed using the same standard analyzer before the search is executed.

Additionally, the standard analyzer applies the same lowercase token filter (remember, the lowercase token filter is applied during indexing) to the search words. Thus, if you provide the search keywords in uppercase, they are converted to lowercase and searched against the inverted index. For example, if we change the title value to uppercase, such as "title": "JAVA", and rerun the query, the results are the same as for the earlier query that searched for Java. If you change the title value to lowercase or any other casing (e.g., java, jaVA, etc.), the query still returns the same results.
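To see why casing does not matter, here is a rough Python stand-in for the analysis step. It is a deliberate simplification and an assumption on my part: the real standard analyzer uses a proper tokenizer, while this sketch merely splits on non-word characters and lowercases each token.

```python
import re


def simple_analyze(text):
    """Very rough approximation of analysis: tokenize, then lowercase each token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


variants = ["Java", "JAVA", "java", "jaVA"]
analyzed = [simple_analyze(variant) for variant in variants]

# Every variant is reduced to the same token, so each one
# looks up the same entry in the inverted index.
print(analyzed)
```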

Searching multiple words

In our earlier code example (previous listing), we used a single word (Java) as the search criterion against a title field. We can expand this to search for multiple words or a sentence in a single field. For example, we could search for the words Java Complete Guide in the title field or for Concurrency and Multithreading in the synopsis field, and so on. Indeed, searching for a string of words (like a broken sentence) is more common than searching for a single word. The query in the following listing does exactly that.

GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Java Complete Guide"
      }
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

Here, our intention is to search for a specific title (Java Complete Guide). That is, we want to fetch a book titled Java Complete Guide, if available, and return nothing if not. However, if we execute the query with these words, you may be surprised to see more documents than just the one that matches exactly with the search query.

The reason for this behavior is that Elasticsearch employs an OR Boolean operator by default for this query, so it fetches all the documents that match any of the words. The words are matched individually rather than as a phrase: in our example, Elasticsearch searches for Java and returns the relevant documents, followed by another search for Complete, adding those results to the list, and so on. The query returns documents containing Java, Complete, or Guide, or any combination of those words.

As you may have already guessed, Elasticsearch uses the OR operator by default when searching for a set of words. The search in the listing above can therefore be rewritten with the operator stated explicitly (though the OR operator is redundant), as the next listing shows.

GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Java Complete Guide",
        "operator": "OR"
      }
    }
  }
}

If you want to change this behavior to find the documents that have all three words in the title, you need to set the operator to AND. The following listing shows this approach.

GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Java Complete Guide",
        "operator": "AND"
      }
    }
  }
}

This query tries to find a book or all books that match all three words (the title must have Java AND Complete AND Guide). However, in our data set, we do not have a book called Java Complete Guide, so no results are returned.
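The difference between the two operators can be modeled in a few lines of Python. This is a toy model under stated assumptions: the titles are made up for the example (they are not from the book's data set), analysis is reduced to lowercasing and splitting, and scoring is ignored entirely.

```python
import re


def terms(text):
    """Reduce text to a set of lowercase terms (simplified analysis)."""
    return {token.lower() for token in re.findall(r"\w+", text)}


def title_matches(title, query, operator="OR"):
    """Toy model of a match query's operator parameter."""
    query_terms = terms(query)
    title_terms = terms(title)
    if operator.upper() == "AND":
        # AND: every query term must be present in the title.
        return query_terms <= title_terms
    # OR (the default): at least one query term must be present.
    return bool(query_terms & title_terms)


# Hypothetical sample titles, invented for this sketch.
titles = ["Java Basics", "Complete Python Guide", "Cooking for Beginners"]

or_hits = [t for t in titles if title_matches(t, "Java Complete Guide", "OR")]
and_hits = [t for t in titles if title_matches(t, "Java Complete Guide", "AND")]

print(or_hits)
print(and_hits)
```

With these sample titles, OR returns the two titles that share at least one word with the query, while AND returns nothing because no title contains all three words.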

Matching at least a few words

The OR and AND operators are opposing conditions. The OR condition fetches documents that match any of the search words, and the AND condition fetches only documents that match all of the words. What if we want to find documents that match at least a few words from the given set of words? In the previous example, suppose we want at least two words out of three to match (say, Java and Guide). This is where the minimum_should_match attribute comes in handy.

The minimum_should_match attribute indicates the minimum number of words that must match for a document to be returned. The next listing demonstrates this in action.

GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Java Complete Guide",
        "operator": "OR",
        "minimum_should_match": 2
      }
    }
  }
}

This query fetches the documents that match at least two of the three given words (the minimum_should_match attribute is set to 2). The OR operator is redundant here because it is applied by default.

Note: Setting the value to 3 in the previous listing is equivalent to changing the operator to AND: all the words must be matched.
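The counting behind minimum_should_match can be sketched in Python as follows. This is a simplified model that handles only plain integer values, and the sample titles are invented for the illustration.

```python
import re


def terms(text):
    """Reduce text to a set of lowercase terms (simplified analysis)."""
    return {token.lower() for token in re.findall(r"\w+", text)}


def matches_at_least(title, query, minimum_should_match):
    """Toy model: count how many distinct query terms appear in the title."""
    matched = terms(query) & terms(title)
    return len(matched) >= minimum_should_match


# Hypothetical sample titles, invented for this sketch.
titles = [
    "Java Programming Guide",        # matches java, guide
    "Complete Java",                 # matches complete, java
    "Java Basics",                   # matches java only
    "Java Complete Guide Handbook",  # matches all three words
]
query = "Java Complete Guide"

two_of_three = [t for t in titles if matches_at_least(t, query, 2)]
all_three = [t for t in titles if matches_at_least(t, query, 3)]

print(two_of_three)
print(all_three)
```

Raising the threshold to 3 leaves only the title that contains every query word, which is the same outcome the AND operator would give in this model.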

Fixing typos using the keyword fuzziness

When searching for things, we sometimes mistype the search criteria (we have all been there); for example, instead of searching for Java books, we might post a query with Kava as the search criteria. We know the intention is to search for Java books, and Elasticsearch can work that out too. It employs a concept called fuzziness. Simply put, fuzziness is a mechanism to correct a user’s spelling mistakes in query criteria.

Fuzziness makes character changes to string input so that it is the same as the string that may exist in the index. It employs the Levenshtein distance algorithm to fix incorrect spellings.

A match query also allows us to add a fuzziness parameter to fix spelling mistakes. We can set it as a numeric value, where the expected values are 0, 1, or 2, meaning none, one, or two character changes (insertions, deletions, modifications), respectively. In addition to these values, we can also use an AUTO setting: by setting AUTO as the fuzziness parameter, we let the engine decide how many changes to allow. The following listing shows how we use fuzziness (with a value of 1) to fix our Kava spelling typo.

GET books/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Kava",
        "fuzziness": 1
      }
    }
  }
}
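To make the idea of edit distance concrete, here is a textbook Levenshtein implementation in Python. It is a sketch for intuition only, not Elasticsearch's internal code; the helper within_fuzziness is a hypothetical name, and lowercasing both terms is my stand-in for the analysis step.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def within_fuzziness(search_term, indexed_term, fuzziness):
    """Hypothetical check: is the indexed term reachable within the allowed edits?"""
    return levenshtein(search_term.lower(), indexed_term.lower()) <= fuzziness


kava_distance = levenshtein("kava", "java")
print(kava_distance)
print(within_fuzziness("Kava", "java", 1))
```

Kava is one substitution away from java, so a fuzziness of 1 is enough to recover the intended term, whereas a term needing two edits would require a fuzziness of 2.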

Again, we will cover fuzziness and fuzzy queries in the later part of this chapter, so do not fret if this example feels overwhelming.

In the earlier example, when searching for the text string Java Complete Guide, we used a set of words to search for a book (or books), and the words were treated individually (as a set of search words). However, at times we may want to search for a phrase or a sentence. That’s when the match_phrase query comes into the picture.

We look at the match_phrase query in the next article.



Written by Madhusudhan Konda

Madhusudhan Konda is a full-stack lead engineer, mentor, and conference speaker. He delivers live online training on Elasticsearch, Elastic Stack & Spring Cloud.