These are the two “advanced” aggregation topics that show up in senior interviews — pipeline aggs (computing on top of other aggs) and nested aggs (handling the nested field type).
Pipeline aggregations — “aggs of aggs”
In simple language — pipeline aggs don’t look at documents. They look at the output of other aggs and compute new values from it. Think SQL window functions.
Common pipeline aggs — avg_bucket, sum_bucket, max_bucket, min_bucket, derivative, moving_avg (deprecated → moving_fn), cumulative_sum, bucket_selector.
Example — daily revenue + moving average
GET /orders/_search
{
"size": 0,
"aggs": {
"daily": {
"date_histogram": {
"field": "created_at",
"calendar_interval": "day"
},
"aggs": {
"revenue": {
"sum": { "field": "amount" }
},
"7d_moving_avg": {
"moving_fn": {
"buckets_path": "revenue",
"window": 7,
"script": "MovingFunctions.unweightedAvg(values)"
}
}
}
}
}
}
Two nested levels here:
daily— date_histogram, one bucket per day.- Inside each day —
revenue(a metric) +7d_moving_avg(a pipeline agg that looks at the last 7revenuevalues).
The magic is buckets_path: "revenue" — that’s how pipeline aggs reference other aggs.
Cumulative sum — running totals
{
"aggs": {
"daily": {
"date_histogram": { "field": "created_at", "calendar_interval": "day" },
"aggs": {
"revenue": { "sum": { "field": "amount" } },
"cumulative": {
"cumulative_sum": { "buckets_path": "revenue" }
}
}
}
}
}
Each day’s cumulative bucket = sum of all previous days’ revenue. Classic for “total sales to date” charts.
Stats across buckets — avg_bucket, max_bucket
“What’s the average daily revenue?” — that’s avg_bucket:
{
"aggs": {
"daily": {
"date_histogram": { "field": "created_at", "calendar_interval": "day" },
"aggs": {
"revenue": { "sum": { "field": "amount" } }
}
},
"avg_daily_revenue": {
"avg_bucket": { "buckets_path": "daily>revenue" }
}
}
}
Notice daily>revenue — the > is “into the sub-agg”. This is sibling position — avg_daily_revenue is a sibling of daily, not a child.
bucket_selector — filtering buckets
Drop buckets that don’t meet a condition.
{
"aggs": {
"by_category": {
"terms": { "field": "category.keyword", "size": 50 },
"aggs": {
"revenue": { "sum": { "field": "amount" } },
"min_revenue_filter": {
"bucket_selector": {
"buckets_path": { "rev": "revenue" },
"script": "params.rev > 10000"
}
}
}
}
}
}
This keeps only categories with > $10k revenue. The bucket_selector doesn’t compute a value — it just decides whether each bucket is kept or dropped.
Nested aggregations — the nested field type
Different concept entirely. ES flattens arrays of objects by default, which causes a famous bug — fields in the same object lose their relationship.
Say we index:
{
"name": "MacBook Pro",
"reviews": [
{ "author": "alice", "rating": 5 },
{ "author": "bob", "rating": 1 }
]
}
Without nested mapping, ES stores this as reviews.author: ["alice", "bob"] and reviews.rating: [5, 1]. A query like “author is alice AND rating is 1” would match this doc (because alice exists AND rating=1 exists), even though they’re from different objects.
The fix — declare reviews as nested in the mapping. Then each review object is indexed as a hidden child document, preserving relationships.
PUT /products
{
"mappings": {
"properties": {
"reviews": {
"type": "nested",
"properties": {
"author": { "type": "keyword" },
"rating": { "type": "integer" }
}
}
}
}
}
Aggregating nested fields
To aggregate over nested objects, we need the nested aggregation as an intermediate step:
GET /products/_search
{
"size": 0,
"aggs": {
"reviews_agg": {
"nested": { "path": "reviews" },
"aggs": {
"avg_rating": {
"avg": { "field": "reviews.rating" }
},
"by_author": {
"terms": { "field": "reviews.author" }
}
}
}
}
}
The nested agg “enters” the nested context. Now reviews.rating and reviews.author work correctly — relationships preserved.
reverse_nested — climbing back up
What if inside a nested agg we want to count parent docs, not child reviews?
{
"aggs": {
"reviews_agg": {
"nested": { "path": "reviews" },
"aggs": {
"by_rating": {
"terms": { "field": "reviews.rating" },
"aggs": {
"parent_products": {
"reverse_nested": {}
}
}
}
}
}
}
}
For each rating bucket, reverse_nested tells us how many products (parent docs) contain a review at that rating — not how many review objects.
Quick rules
- Pipeline aggs reference others via
buckets_path—namefor direct,name>subfor nested. moving_fn,cumulative_sum,derivativeare parent pipelines (live inside the bucket).avg_bucket,max_bucket, etc. are sibling pipelines (next to the bucket).bucket_selectorfilters buckets post-aggregation — like SQLHAVING.- For arrays of objects with field relationships that matter — use
nestedmapping +nestedagg. reverse_nestedlets us count parent docs from inside a nested agg.