Elasticsearch in Action: Metric Aggregations 2/2

Madhusudhan Konda
Jan 30, 2023

Excerpts taken from my upcoming book: Elasticsearch in Action

The excerpts are taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository. You can find executable Kibana scripts in the repository so you can run the commands in Kibana straight away. All code is tested against Elasticsearch version 8.4.

Me @ Medium || LinkedIn || Twitter || GitHub

In the previous article, we learned about a few metric aggregations. In this article, we continue with the remaining common metric aggregations.

There will be times when we need to find the minimum and maximum values in a data set: say, the minimum number of available speakers for a conference, or the session with the highest number of attendees. Elasticsearch exposes the min and max metrics for producing these extremes of a data set. These metrics are self-explanatory, but in the interest of completeness, let’s go over them briefly.

Minimum metric

Let’s say we want to find the cheapest TV in our stock. This is a clear candidate for the min metric. The following listing defines the min metric on the price_gbp field.

GET tv_sales/_search
{
  "size": 0,
  "aggs": {
    "cheapest_tv_price": {
      "min": {
        "field": "price_gbp"
      }
    }
  }
}

The min aggregation computes the smallest value of the price_gbp field across all the documents. Setting "size" to 0 suppresses the search hits, so the response contains only the aggregation result. Executing the query returns £999, the lowest TV price in our stock. Note that the aggregation returns the price itself, not the TV document that carries it.
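To make these semantics concrete, here is a minimal Python sketch that mimics what a min aggregation does over a handful of in-memory documents. The documents and the min_metric helper are hypothetical illustrations, not Elasticsearch code. The optional missing argument mirrors the aggregation’s missing parameter, which supplies a value for documents that lack the field. By default, such documents are simply ignored.

```python
from typing import Optional


def min_metric(docs: list[dict], field: str, missing: Optional[float] = None) -> Optional[float]:
    """Mimic a min aggregation over in-memory documents (illustration only).

    Documents that lack the field are skipped, unless a `missing` value is
    given, in which case that value is used for them instead.
    """
    values = []
    for doc in docs:
        if field in doc:
            values.append(doc[field])
        elif missing is not None:
            values.append(missing)
    # This sketch returns None when no values are left to compare.
    return min(values) if values else None


# Hypothetical documents, not the actual tv_sales data
docs = [
    {"brand": "Samsung", "price_gbp": 1500},
    {"brand": "LG", "price_gbp": 999},
    {"brand": "Panasonic"},  # no price_gbp field, so it is ignored by default
]

print(min_metric(docs, "price_gbp"))               # 999
print(min_metric(docs, "price_gbp", missing=500))  # 500
```

As with the real aggregation, the result is a single number. Nothing in it tells you which document produced it.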

Maximum metric

Similarly, we can use the max metric to find the best-selling TV, that is, the highest number of sales recorded for a single TV. The following query fetches the maximum value of the sales field.

GET tv_sales/_search
{
  "size": 0,
  "aggs": {
    "best_seller_tv_by_sales": {
      "max": {
        "field": "sales"
      }
    }
  }
}

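Once we execute the query, we get the highest sales figure in the index: 48. In our data set, that figure belongs to LG’s 8K TV, our fastest seller. Keep in mind that max returns only the value; the response does not include the matching TV document.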
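The distinction between the maximum value and the document that holds it is easy to miss. The following Python sketch uses made-up documents and only the standard library, and computes both: the number a max aggregation would report, and the TV you would find by sorting on sales in descending order. In Elasticsearch, a search with "sort": [{"sales": {"order": "desc"}}] and "size": 1, or a top_hits aggregation, is one way to retrieve that document.

```python
# Hypothetical documents with made-up figures, not the actual tv_sales data
docs = [
    {"brand": "Samsung", "model": "QLED", "sales": 17},
    {"brand": "LG", "model": "8K", "sales": 48},
    {"brand": "Philips", "model": "OLED", "sales": 11},
]

# What the max aggregation reports: only the highest value of the field
max_sales = max(doc["sales"] for doc in docs if "sales" in doc)

# Identifying *which* TV holds that value needs the document itself,
# e.g. by sorting on the field in descending order and taking the first hit
best_seller = sorted(docs, key=lambda d: d.get("sales", float("-inf")), reverse=True)[0]

print(max_sales)             # 48
print(best_seller["brand"])  # LG
```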

The common stats metric

While the previous metrics are single-value metrics (each returns one number), the stats metric is a multi-value aggregation. It computes the common statistical functions (avg, min, max, count, and sum) over a field all in one go. The query in the next listing applies the stats aggregation to the price_gbp field.

GET tv_sales/_search
{
  "size": 0,
  "aggs": {
    "common_stats": {
      "stats": {
        "field": "price_gbp"
      }
    }
  }
}

As the query demonstrates, we use the stats function on the price_gbp field to fetch the common statistics. Once executed, this query returns the following results:

"aggregations" : {
  "common_stats" : {
    "count" : 6,
    "min" : 999.0,
    "max" : 1800.0,
    "avg" : 1299.6666666666667,
    "sum" : 7798.0
  }
}

As you can see, the stats metric returns all five of these metrics in one go. This makes it handy when you want to see the basic aggregations in one place.
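The five numbers are tied together: avg is simply sum divided by count. Here is a minimal Python sketch, using hypothetical prices rather than the actual tv_sales data, that computes the same five values over an in-memory list. It then checks that the response shown above is consistent with that relationship. The stats helper is illustrative only, not how Elasticsearch computes the aggregation internally.

```python
import math


def stats(values: list[float]) -> dict:
    """Compute the five numbers a stats aggregation reports (assumes a non-empty list)."""
    count = len(values)
    total = sum(values)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": total / count,
        "sum": total,
    }


# Hypothetical prices, not the actual tv_sales data
print(stats([999.0, 1200.0, 1800.0]))
# {'count': 3, 'min': 999.0, 'max': 1800.0, 'avg': 1333.0, 'sum': 3999.0}

# The avg in the Elasticsearch response equals sum / count
response = {"count": 6, "min": 999.0, "max": 1800.0, "avg": 1299.6666666666667, "sum": 7798.0}
print(math.isclose(response["sum"] / response["count"], response["avg"]))  # True
```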

The extended stats metric

Although stats is a useful metric, it doesn’t provide advanced statistical measures such as variance and standard deviation. For these, Elasticsearch provides the extended_stats metric out of the box, an extended cousin of stats.

In addition to the standard statistics returned by stats, the extended_stats metric adds sum_of_squares, variance, and std_deviation. It also returns population and sampling variants of the variance and standard deviation, plus a std_deviation_bounds object. The following listing shows how to fetch these variance flavors and standard deviations using the extended_stats metric.

GET tv_sales/_search
{
  "size": 0,
  "aggs": {
    "additional_stats": {
      "extended_stats": {
        "field": "price_gbp"
      }
    }
  }
}

As the code demonstrates, we invoke extended_stats on the price_gbp field. This retrieves a wealth of statistical data, as the figure below illustrates.

Figure: The extended statistics on the price_gbp field

The query in the previous listing calculates a lot of advanced statistical information on the price_gbp field. Note that the result also includes the common metrics (avg, min, max, and so on) in addition to the various variances and standard deviations.
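To show how the additional numbers relate to each other, here is a minimal Python sketch that computes a subset of the extended_stats output over a small list of hypothetical values. The values were chosen so the arithmetic is easy to check by hand. The extended_stats helper and the sigma default of 2 mirror the aggregation’s documented behavior: variance and std_deviation follow the population formula, and the bounds are avg plus or minus sigma standard deviations. This is an illustration, not Elasticsearch’s implementation, and it omits the population/sampling variants of the standard deviation and bounds for brevity.

```python
import math


def extended_stats(values: list[float], sigma: float = 2.0) -> dict:
    """Compute extended_stats-style numbers over a non-empty in-memory list (illustration only)."""
    n = len(values)
    total = sum(values)
    avg = total / n
    sum_of_squares = sum(v * v for v in values)
    squared_deviations = sum((v - avg) ** 2 for v in values)
    # Population variance: mean squared deviation from the average
    variance = squared_deviations / n
    # Sample variance divides by n - 1 instead of n (needs at least two values)
    variance_sampling = squared_deviations / (n - 1) if n > 1 else float("nan")
    std_deviation = math.sqrt(variance)
    return {
        "count": n,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "variance_sampling": variance_sampling,
        "std_deviation": std_deviation,
        # avg plus or minus sigma standard deviations
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


# Hypothetical values chosen so the arithmetic is easy to follow
result = extended_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(result["variance"], result["std_deviation"])  # 4.0 2.0
print(result["std_deviation_bounds"])               # {'upper': 9.0, 'lower': 1.0}
```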

The cardinality metric

The cardinality metric returns the number of distinct values of a field across a set of documents. It is a single-value metric. For example, the query in the next listing counts the unique TV brands in our index.

GET tv_sales/_search
{
  "size": 0,
  "aggs": {
    "unique_tvs": {
      "cardinality": {
        "field": "brand.keyword"
      }
    }
  }
}

The query fetches the number of unique brands in our tv_sales index. Because we have four unique brands (Samsung, LG, Philips, and Panasonic), the unique_tvs aggregation should return 4, as the following snippet shows:

"aggregations" : {
  "unique_tvs" : {
    "value" : 4
  }
}

Because the data is distributed across shards and nodes, computing an exact cardinality can be expensive. Every distinct value would have to be collected into an in-memory set of some sort (a hash set, for example) and merged across shards. To avoid this cost, Elasticsearch approximates cardinality using the HyperLogLog++ algorithm. Hence, we should not expect exact counts for unique values, although they are usually very close. The precision_threshold setting (3000 by default) trades memory for accuracy. It defines the count below which results are expected to be close to exact.
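The trade-off between an exact hash set and an approximation can be illustrated with a deliberately simplified HyperLogLog in Python. This is the plain HyperLogLog algorithm, not Elasticsearch’s HyperLogLog++ implementation, which refines the basic algorithm. The class name, the SHA-1-based 64-bit hash, and the precision p are all assumptions made for this sketch. Each value is hashed, and the first p bits select one of 2^p registers. Each register remembers the highest position of the leftmost 1-bit it has seen in the rest of the hash. The registers then combine into an estimate, with linear counting used for small cardinalities. Memory stays fixed at 2^p small registers, however many values arrive.

```python
import hashlib
import math


def _hash64(value: str) -> int:
    """64-bit hash of a string (truncated SHA-1; an assumption for this sketch)."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HyperLogLog:
    """Plain HyperLogLog with 2**p registers (not Elasticsearch's HLL++)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        h = _hash64(value)
        index = h >> (64 - self.p)             # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction: linear counting on the empty registers
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical brand values; duplicates do not change either count
brands = ["Samsung", "LG", "Philips", "Panasonic", "LG", "Samsung"]
exact = len(set(brands))  # the exact, memory-hungry approach: 4
hll = HyperLogLog()
for brand in brands:
    hll.add(brand)
print(exact, round(hll.estimate()))
```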

Besides these, Elasticsearch exposes several other metric aggregations, and their number tends to grow with each new release. Covering all of them here isn’t practical, so I strongly recommend going through the Elasticsearch documentation to learn about the ones not covered.


These short articles are condensed excerpts taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository.

Written by Madhusudhan Konda

Madhusudhan Konda is a full-stack lead engineer, mentor, and conference speaker. He delivers live online training on Elasticsearch, Elastic Stack, and Spring Cloud.
