Elasticsearch in Action: Introducing Query DSL

Madhusudhan Konda
Nov 8, 2022
Excerpts taken from my upcoming book: Elasticsearch in Action

I will be presenting these short, condensed articles as a mini-series on each of the topics over the next few months. The excerpts are taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository. You can find executable Kibana scripts in the repository so you can run the commands in Kibana straight away. All code is tested against Elasticsearch version 8.4.

Elasticsearch provides a search-specific, general-purpose language and syntax called Query DSL (domain-specific language). Query DSL is a powerful, expressive, JSON-based language for writing queries that range from basic to complex, including nested ones. The same language is used for both search and analytics (aggregations). The general syntax looks like this:

GET books/_search
{
  "query": {
    "match": {
      ...
    }
  }
}

We invoke the _search endpoint with a query object, passed in as the body of the request. The query object consists of the logic for creating the required criteria.

Sample query

Let’s write a multi_match query that searches for a keyword, Lord, across two fields, synopsis and title. The following listing shows this search query written in Query DSL format.

Listing: Query DSL sample query

GET movies/_search
{
  "query": {
    "multi_match": {
      "query": "Lord",
      "fields": ["synopsis", "title"]
    }
  }
}

GET movies/_search is the shorthand search request from a client to the Elasticsearch server. The full request for this is something like GET http://localhost:9200/movies/_search (my Elasticsearch server is running locally, of course). This request expects a JSON-formatted body that consists of the query.
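
As a rough illustration of what the shorthand expands to, the Python sketch below builds the full HTTP request using only the standard library. The helper name build_search_request is hypothetical, and the base URL assumes an unsecured local node as in the URL above; a default Elasticsearch 8.x installation enables TLS and authentication, so a real call would likely need https and credentials. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: a local, unsecured node, matching the example URL in the article.
BASE_URL = "http://localhost:9200"


def build_search_request(index, query, base_url=BASE_URL):
    """Build (but do not send) the HTTP request behind `GET <index>/_search`."""
    # The Query DSL criteria always sit under the top-level "query" key.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


request = build_search_request(
    "movies",
    {"multi_match": {"query": "Lord", "fields": ["synopsis", "title"]}},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would send it; that step is left out so the sketch runs without a cluster.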

Query DSL for cURL

The same query can be invoked via cURL. The following listing demonstrates this invocation.

curl -XGET "http://localhost:9200/movies/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "Lord",
      "fields": ["synopsis", "title"]
    }
  }
}'

The query is provided as the argument to the -d parameter, as you can see in the listing. Note that the Content-Type header and the JSON body are each enclosed in their own pair of single quotes when sending the request via cURL, so the body must end with a closing single quote.
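
To see why the quoting matters, the sketch below uses Python's shlex module, which follows POSIX shell quoting rules, to split a properly quoted version of the command into the arguments cURL would receive. It is only an illustration of the quoting, not a way to run the request; the variable names are my own.

```python
import json
import shlex

# The cURL command as one string, including the closing quote after the JSON body.
command = """curl -XGET "http://localhost:9200/movies/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "Lord",
      "fields": ["synopsis", "title"]
    }
  }
}'"""

# Split the string the way a POSIX shell would before handing the arguments to curl.
args = shlex.split(command)

# -d and the quoted body are adjacent, so the shell joins them into a single argument.
body = args[-1][len("-d"):]
query = json.loads(body)

print(len(args))                              # number of arguments, including "curl"
print(query["query"]["multi_match"]["query"])
```

Because the single quotes keep the multi-line JSON together, the inner double quotes survive untouched and the body still parses as valid JSON.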

Query DSL for aggregations

With Query DSL, we use a similar format for aggregations (analytics) with an aggs (short for aggregations) object instead of a query object. The following listing shows this format.

GET movies/_search
{
  "size": 0,
  "aggs": {
    "average_movie_rating": {
      "avg": {
        "field": "rating"
      }
    }
  }
}

This query fetches the average rating of all movies by utilizing a metric aggregation called avg (short for average).
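
A minimal Python sketch of the same idea follows: one hypothetical helper builds the aggregation body and another reads the result back. The helper names are assumptions for illustration, and the sample response is a made-up, trimmed stand-in for what the server returns (a single-value metric such as avg appears under aggregations, keyed by the name we chose).

```python
import json


def avg_aggregation(name, field):
    """Build a search body that returns only the average of `field`."""
    return {
        "size": 0,  # skip the matching documents; only the aggregation is needed
        "aggs": {name: {"avg": {"field": field}}},
    }


def read_metric(response, name):
    """Pull a single-value metric out of a search response."""
    return response["aggregations"][name]["value"]


body = avg_aggregation("average_movie_rating", "rating")
print(json.dumps(body, indent=2))

# Hypothetical, trimmed response for illustration only; the value is made up.
sample_response = {"aggregations": {"average_movie_rating": {"value": 8.5}}}
print(read_metric(sample_response, "average_movie_rating"))
```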

Now that we understand the overall form for Query DSL, let’s look a bit more at leaf queries and compound queries.

Leaf and compound queries

Query DSL supports leaf as well as compound queries. The body of the search query can cater to simple or complex query criteria in the form of leaf or compound queries.

We call straightforward queries with no clauses leaf queries. These queries fetch results based on a single criterion (for example, getting the top-rated movies, movies released during a particular year, the gross earnings of a movie, and so on).

With leaf queries, we can search for values in specific fields. The listing below is an example of a leaf query.

GET movies/_search
{
  "query": {
    "match_phrase": {
      "synopsis": "A meek hobbit from the shire and eight companions"
    }
  }
}

Leaf queries cannot combine multiple query clauses. They are not designed to search for “movies that match a title but must NOT match a particular actor AND were released during a specific year AND whose rating must not fall below a certain number”, and so on.

The advanced requirement of logically combining certain clauses to serve a complex query is not possible with a leaf query, which leads to the introduction of compound queries.

Compound queries allow us to create complex queries by combining leaf queries and even other compound queries using logical operators.

A Boolean query, for example, is a popular compound query that supports writing queries with clauses like must, must_not, should, and filter. We can write significantly complex queries using compound queries. The example query in the following listing demonstrates this.

GET movies/_search
{
  "query": {
    "bool": {
      "must": [{"match": {"title": "Godfather"}}],
      "must_not": [{"range": {"rating": {"lt": 9.0}}}],
      "should": [{"match": {"actors": "Pacino"}}],
      "filter": [{"term": {"actors": "Brando"}}]
    }
  }
}

The compound query in this listing combines a handful of leaf queries using Boolean clauses. It fetches movies whose title must match Godfather AND whose rating must not be less than 9. The should clause is optional here: movies with the actor Pacino score higher, but a match is not required. Finally, the filter clause keeps only the movies with the actor Brando. Well, that’s a mouthful. If you feel like it is too much to fathom, it is indeed.
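
One way to make the structure easier to digest is to build it from small pieces. The Python sketch below composes a bool query with the same four clauses out of leaf-query helpers. All helper names (match, term, range_, bool_query) are hypothetical and not part of any Elasticsearch client; the result is a plain dict that serializes to a body of the same shape as the listing.

```python
import json


def match(field, text):
    """Leaf query: full-text match on one field."""
    return {"match": {field: text}}


def term(field, value):
    """Leaf query: exact term lookup on one field."""
    return {"term": {field: value}}


def range_(field, **bounds):
    """Leaf query: range check, e.g. range_("rating", lt=9.0)."""
    return {"range": {field: bounds}}


def bool_query(must=(), must_not=(), should=(), filter_=()):
    """Compound query: combine leaf (or other compound) queries by clause."""
    clauses = {"must": must, "must_not": must_not, "should": should, "filter": filter_}
    # Leave out empty clauses so the body contains only what was asked for.
    return {"bool": {name: list(qs) for name, qs in clauses.items() if qs}}


body = {
    "query": bool_query(
        must=[match("title", "Godfather")],
        must_not=[range_("rating", lt=9.0)],
        should=[match("actors", "Pacino")],
        filter_=[term("actors", "Brando")],
    )
}
print(json.dumps(body, indent=2))
```

Because bool_query returns an ordinary dict, its result can itself be placed in another must or should list, which mirrors how compound queries nest inside other compound queries.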

Leaf queries, like compound queries, are wrapped in the query object of the search request. Apart from implementing the compound query’s logic (which can at times be complex), you should see no significant difference when writing compound queries.

These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.

Written by Madhusudhan Konda

Madhusudhan Konda is a full-stack lead engineer, mentor, and conference speaker. He delivers live online training on Elasticsearch, Elastic Stack, and Spring Cloud.
