In Elasticsearch, paginating aggregation results is a recurring need.
By default, Elastic returns all the buckets of your aggregation. While a query filter is often enough, this is not always the desired behavior.
A first possibility is to greatly increase the size parameter and paginate on the front-end side.
It can be a good solution… for a few hundred results and a low cardinality.
But if we don’t want to crash our app, we can probably do better.
Depending on your specific use case you will have several choices:
Bucket Sort aggregation
Elasticsearch supports the bucket sort aggregation in v6.1 and later. It accepts "sort", "size" and "from" parameters applied to the buckets of the parent aggregation.
Example from the official documentation:
POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "price"
          }
        },
        "sales_bucket_sort": {
          "bucket_sort": {
            "sort": [
              { "total_sales": { "order": "desc" } }
            ],
            "size": 3
          }
        }
      }
    }
  }
}
But there is no big performance gain, because this is a pipeline aggregation: it is applied to the buckets of the parent aggregation after they have all been computed. So you still have to set a large size on the parent aggregation (and pay the cost of computing it).
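To paginate with this approach, you combine "from" and "size" inside the bucket_sort. Here is a small Python helper (hypothetical, not from the official docs) that builds the request body for an arbitrary page of the sales example above:

```python
def bucket_sort_page(page, page_size):
    """Build a request body returning one page of monthly sales buckets.

    Note: the parent date_histogram still computes every bucket;
    bucket_sort only trims the response, hence the limited performance gain.
    """
    return {
        "size": 0,
        "aggs": {
            "sales_per_month": {
                "date_histogram": {
                    "field": "date",
                    "calendar_interval": "month",
                },
                "aggs": {
                    "total_sales": {"sum": {"field": "price"}},
                    "sales_bucket_sort": {
                        "bucket_sort": {
                            "sort": [{"total_sales": {"order": "desc"}}],
                            # skip the pages already served
                            "from": page * page_size,
                            "size": page_size,
                        }
                    },
                },
            }
        },
    }
```

For instance, `bucket_sort_page(2, 3)` produces a body with `"from": 6` and `"size": 3`, i.e. the third page of three buckets.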
Easiness: 5
Performance: 2.5
Capabilities: 1
OK for low to medium cardinality, but everything still has to be computed, so there is no real performance improvement.
Partitions aggregations
Partitioning an aggregation is more interesting: it truly divides the aggregation into chunks that can be requested separately.
- Use the cardinality aggregation to estimate the total number of unique values
- Pick a value for num_partitions to break that number into more manageable chunks
- Pick a size value for the number of responses we want from each partition
- Run a test request
GET /_search
{
  "size": 0,
  "aggs": {
    "expired_sessions": {
      "terms": {
        "field": "account_id",
        "include": {
          "partition": 0,
          "num_partitions": 20
        },
        "size": 10000,
        "order": {
          "last_access": "asc"
        }
      },
      "aggs": {
        "last_access": {
          "max": {
            "field": "access_date"
          }
        }
      }
    }
  }
}
Far better: you get real server-side pagination.
But it is only applicable to the simplest aggregations.
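The steps above can be sketched in Python. This hypothetical helper (the function name and page size are assumptions) derives num_partitions from a cardinality estimate and yields one request body per partition:

```python
import math

def partition_requests(field, cardinality, page_size=10000):
    """Yield one terms-aggregation request body per partition.

    `cardinality` would come from a prior cardinality aggregation on `field`.
    num_partitions is chosen so each partition fits within `page_size` terms.
    """
    num_partitions = max(1, math.ceil(cardinality / page_size))
    for partition in range(num_partitions):
        yield {
            "size": 0,
            "aggs": {
                "expired_sessions": {
                    "terms": {
                        "field": field,
                        # each request covers a disjoint slice of the terms
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                        "size": page_size,
                    },
                    "aggs": {
                        "last_access": {"max": {"field": "access_date"}}
                    },
                }
            },
        }
```

With an estimated cardinality of 25,000 and pages of 10,000 terms, this yields three request bodies, for partitions 0, 1 and 2.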
Easiness: 3
Performance: 4
Capabilities: 3
Composite aggregation
This is the most powerful option, but it introduces more complexity and supports only a few aggregation sources (terms, histograms and geotile_grid).
It is not a pipeline aggregation but a multi-bucket aggregation, which means it also gives you real server-side pagination.
Basically, it works like search_after:
GET /_search
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 2,
        "sources": [
          { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } },
          { "product": { "terms": { "field": "product", "order": "asc" } } }
        ],
        "after": { "date": 1494288000000, "product": "mad max" }
      }
    }
  }
}
The response contains an after_key object that can be passed as-is in the after parameter of the next request, paginating over several aggregation sources at once:
"after_key" : {
"date" : 1594080000000,
"product" : "AE"
}
Easiness: 2
Performance: 4
Capabilities: 4
With recent versions of Elasticsearch, you should be able to handle all your pagination use cases.
Spoon consulting is a certified partner of Elastic
As a certified partner of the Elastic company, Spoon Consulting offers high-level consulting for all kinds of companies.
Read more about your own Elasticsearch use case in Spoon Consulting’s posts.