Aggregations are how ES does analytics. They come in three flavors — bucket (group docs), metric (compute numbers), and pipeline (operate on other aggs). This note covers bucket aggs.
In simple terms, bucket aggs are like SQL’s GROUP BY: they split docs into groups based on some criterion, and we can then run metrics on each group.
Terms aggregation — the workhorse
Group docs by the unique values of a field. Like GROUP BY category.
GET /products/_search
{
"size": 0,
"aggs": {
"by_category": {
"terms": {
"field": "category.keyword",
"size": 10
}
}
}
}
size: 0 at the top means “don’t return docs, just aggregations”, which saves bandwidth. The aggregations part of the response looks like this (other response fields omitted):
{
"aggregations": {
"by_category": {
"buckets": [
{ "key": "laptops", "doc_count": 142 },
{ "key": "phones", "doc_count": 98 },
{ "key": "tablets", "doc_count": 47 }
]
}
}
}
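To work with that result in code, we usually flatten the buckets into a plain mapping. A minimal sketch in Python, assuming the response has already been parsed into a dict; bucket_counts is a hypothetical helper, not part of any ES client:

```python
# Sample response copied from the note above (aggregations part only).
response = {
    "aggregations": {
        "by_category": {
            "buckets": [
                {"key": "laptops", "doc_count": 142},
                {"key": "phones", "doc_count": 98},
                {"key": "tablets", "doc_count": 47},
            ]
        }
    }
}


def bucket_counts(response: dict, agg_name: str) -> dict:
    """Flatten a terms-agg result into {key: doc_count} (hypothetical helper)."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


counts = bucket_counts(response, "by_category")
print(counts)
```

The agg name passed in ("by_category") is whatever name we chose in the request, which is why it shows up as a key in the response.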
The size + accuracy gotcha
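size: 10 returns the top 10 buckets. But ES is distributed: each shard computes its own list of top terms (shard_size of them, which defaults to size * 1.5 + 10, so 25 here), and the coordinating node merges those lists. A term that is common overall but falls just outside the local list on several shards can be undercounted or missed, so the global top 10 and its doc_counts are approximate, especially when values are spread unevenly across shards.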
To improve accuracy at a cost, bump shard_size:
{
"terms": {
"field": "category.keyword",
"size": 10,
"shard_size": 100
}
}
Each shard returns top 100, we keep top 10. Trade more network/CPU for accuracy.
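The effect is easy to reproduce with a toy model. The sketch below is plain Python, not ES internals: the per-shard counts are made up, and merged_top_terms is a hypothetical stand-in for "each shard reports its top shard_size terms, the coordinator sums them and keeps size".

```python
from collections import Counter

# Hypothetical per-shard term counts. "c" is never a shard's top term,
# but it is the most frequent term overall (8 + 8 + 8 = 24).
shards = [
    Counter({"a": 10, "b": 9, "c": 8}),
    Counter({"d": 10, "e": 9, "c": 8}),
    Counter({"f": 10, "g": 9, "c": 8}),
]


def merged_top_terms(shards, size, shard_size):
    """Sum each shard's top `shard_size` terms, then keep the top `size`."""
    merged = Counter()
    for shard in shards:
        merged.update(dict(shard.most_common(shard_size)))
    return merged.most_common(size)


exact = sum(shards, Counter()).most_common(1)             # ground truth
small = merged_top_terms(shards, size=1, shard_size=2)    # "c" never reported
large = merged_top_terms(shards, size=1, shard_size=3)    # "c" reported by all

print("exact:", exact)
print("shard_size=2:", small)
print("shard_size=3:", large)
```

With shard_size=2 no shard ever reports "c", so the merged winner is a term with only 10 docs. With shard_size=3 the merged result matches the exact answer.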
Date histogram — time-series bucketing
Group docs by time buckets. The bread and butter of dashboards.
GET /orders/_search
{
"size": 0,
"aggs": {
"orders_per_day": {
"date_histogram": {
"field": "created_at",
"calendar_interval": "day"
}
}
}
}
Calendar intervals — minute, hour, day, week, month, quarter, year. These respect calendar boundaries (e.g., months have variable lengths).
For fixed intervals (always the same number of milliseconds), use fixed_interval:
{ "date_histogram": { "field": "created_at", "fixed_interval": "30m" } }
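Use fixed_interval for sub-day buckets (15m, 30m, 1h) and calendar_interval for day/week/month. calendar_interval only accepts a single unit (1d or 1M, not 2d), so any multiple of a unit has to go through fixed_interval.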
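The difference between the two is easier to see with actual timestamps. This is a rough Python model, not how ES implements it: it assumes UTC and no offset, and fixed_bucket_start and month_bucket_start are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def fixed_bucket_start(ts: datetime, interval: timedelta) -> datetime:
    """Round `ts` down to a whole multiple of `interval` since the Unix epoch."""
    return EPOCH + ((ts - EPOCH) // interval) * interval


def month_bucket_start(ts: datetime) -> datetime:
    """Round `ts` down to the first instant of its calendar month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


ts = datetime(2024, 2, 15, 10, 47, tzinfo=timezone.utc)
print(fixed_bucket_start(ts, timedelta(minutes=30)))
print(month_bucket_start(ts))

# Calendar buckets vary in length; fixed buckets never do.
feb = datetime(2024, 3, 1, tzinfo=timezone.utc) - datetime(2024, 2, 1, tzinfo=timezone.utc)
mar = datetime(2024, 4, 1, tzinfo=timezone.utc) - datetime(2024, 3, 1, tzinfo=timezone.utc)
print(feb.days, mar.days)
```

A 30-minute bucket is always 1,800,000 ms wide, while the February and March buckets here differ by two days.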
Range aggregation — custom numeric buckets
GET /products/_search
{
"size": 0,
"aggs": {
"price_brackets": {
"range": {
"field": "price",
"ranges": [
{ "to": 100 },
{ "from": 100, "to": 500 },
{ "from": 500, "to": 1000 },
{ "from": 1000 }
]
}
}
}
}
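Result groups: < $100, $100-500, $500-1000, >= $1000. Each range includes its from value and excludes its to value, so a price of exactly 500 lands in the $500-1000 bucket. Perfect for e-commerce price filters.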
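The boundary behaviour (from inclusive, to exclusive) is the part that usually trips people up. A small Python sketch of the same brackets; range_bucket is a hypothetical helper and None stands for an open end:

```python
# Same brackets as the request above; None means "unbounded".
RANGES = [(None, 100), (100, 500), (500, 1000), (1000, None)]


def range_bucket(price, ranges=RANGES):
    """Return the (from, to) pair containing `price`: from inclusive, to exclusive."""
    for lo, hi in ranges:
        if (lo is None or price >= lo) and (hi is None or price < hi):
            return (lo, hi)
    return None  # no bracket matched


for price in (99.99, 100, 500, 1000):
    print(price, "->", range_bucket(price))
```

Because the brackets share their edges, every price falls into exactly one bucket, with no gaps and no double counting.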
Filters aggregation — named arbitrary buckets
When buckets don’t follow a single rule, define them as named filters:
GET /logs/_search
{
"size": 0,
"aggs": {
"by_status": {
"filters": {
"filters": {
"errors": { "range": { "status_code": { "gte": 500 } } },
"warnings": { "range": { "status_code": { "gte": 400, "lt": 500 } } },
"success": { "range": { "status_code": { "gte": 200, "lt": 300 } } }
}
}
}
}
}
This gives us 3 named buckets — errors, warnings, success — each defined by its own filter. More flexible than range/terms when buckets cross fields.
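One consequence worth seeing concretely: a doc that matches none of the filters is not counted in any bucket. The Python sketch below mimics the three filters with plain predicates; the status codes are made-up sample data and filters_agg is a hypothetical helper.

```python
# Predicates mirroring the three named filters in the request above.
FILTERS = {
    "errors": lambda code: code >= 500,
    "warnings": lambda code: 400 <= code < 500,
    "success": lambda code: 200 <= code < 300,
}


def filters_agg(status_codes, filters=FILTERS):
    """Count codes per named bucket; codes matching no filter are not counted."""
    return {
        name: sum(1 for code in status_codes if matches(code))
        for name, matches in filters.items()
    }


# Hypothetical sample of status codes, including two redirects.
codes = [200, 201, 301, 404, 500, 503, 302]
counts = filters_agg(codes)
print(counts)
print("uncounted:", len(codes) - sum(counts.values()))
```

The two 3xx codes land nowhere, so the bucket counts add up to 5 of the 7 docs.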
Histogram — numeric bucketing
Like date_histogram but for numbers. Useful for distribution charts.
{
"aggs": {
"rating_distribution": {
"histogram": {
"field": "rating",
"interval": 1
}
}
}
}
Buckets at intervals of 1: 1.0, 2.0, 3.0, 4.0, 5.0. Each value is rounded down to its bucket key, so a rating of 4.5 counts toward the 4.0 bucket. Plot it as a bar chart and we have a star-rating histogram.
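The bucket key is the value rounded down to a multiple of the interval. A short Python sketch of that rounding, assuming the default offset of 0; the ratings are made-up and histogram_key is a hypothetical helper:

```python
import math
from collections import Counter


def histogram_key(value: float, interval: float, offset: float = 0.0) -> float:
    """Round `value` down to a multiple of `interval`, shifted by `offset`."""
    return math.floor((value - offset) / interval) * interval + offset


# Hypothetical ratings.
ratings = [4.5, 4.0, 3.2, 5.0, 4.9, 1.0]
distribution = Counter(histogram_key(r, 1) for r in ratings)

for key in sorted(distribution):
    print(key, distribution[key])
```

Here 4.0, 4.5, and 4.9 all share the 4.0 bucket, while 5.0 gets a bucket of its own.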
Combining with queries
Aggs run on the query result set. So:
GET /orders/_search
{
"size": 0,
"query": {
"range": { "created_at": { "gte": "now-30d/d" } }
},
"aggs": {
"orders_per_day": {
"date_histogram": { "field": "created_at", "calendar_interval": "day" }
}
}
}
This gives us “orders per day, last 30 days”. The query filters first, the agg buckets the survivors.
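If we issue this request from code, the lookback window is the natural thing to parameterize. A minimal sketch in Python that only builds the request body; orders_per_day_body is a hypothetical helper, and sending the request is left to whichever HTTP client or ES client we use:

```python
import json


def orders_per_day_body(days: int) -> dict:
    """Build the search body above for an arbitrary lookback window."""
    return {
        "size": 0,  # aggregations only, no hits
        "query": {"range": {"created_at": {"gte": f"now-{days}d/d"}}},
        "aggs": {
            "orders_per_day": {
                "date_histogram": {"field": "created_at", "calendar_interval": "day"}
            }
        },
    }


body = orders_per_day_body(30)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET /orders/_search; changing the argument to 7 gives "orders per day, last 7 days" with the same aggregation.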
Quick rules
- size: 0 saves bandwidth when we only want aggs.
- terms on a text field requires .keyword subfield (or fielddata: true, which is memory-heavy).
- Top-N from terms is approximate across shards. Increase shard_size if accuracy matters.
- date_histogram for time, histogram for numbers, range for custom numeric brackets, filters for arbitrary named buckets.
- Aggs operate on the queried subset: combine query + agg for “stats about my filtered data”.