If bucket aggs are GROUP BY, metric aggs are everything inside SELECT — SUM, AVG, COUNT(DISTINCT), etc. In simple language, they compute a single number (or a few numbers) from a set of docs.
Single-value metrics
The basic family — give a field, get a number.
GET /orders/_search
{
"size": 0,
"aggs": {
"total_revenue": { "sum": { "field": "amount" } },
"avg_order": { "avg": { "field": "amount" } },
"smallest": { "min": { "field": "amount" } },
"biggest": { "max": { "field": "amount" } },
"order_count": { "value_count": { "field": "amount" } }
}
}
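value_count is like SQL's COUNT(field): it counts the values present in the field, so documents where the field is missing or null contribute nothing, and a multi-valued field contributes one count per value. Note this is not the same as the bucket's doc_count, which counts documents regardless of field presence.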
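To make the value_count vs doc_count difference concrete, here is a minimal plain-Python sketch. It only mimics the counting rule described above; it is not how Elasticsearch implements it, and the `orders` list and `value_count` helper are hypothetical.

```python
# Hypothetical order documents: two have an amount, two do not.
orders = [
    {"id": 1, "amount": 20.0},
    {"id": 2, "amount": 35.5},
    {"id": 3},                      # field missing entirely
    {"id": 4, "amount": None},      # explicit null, treated as missing
]

def value_count(docs, field):
    """Count field values, skipping docs where the field is absent or null."""
    total = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        # A multi-valued field (a list) contributes one count per value.
        total += len(value) if isinstance(value, list) else 1
    return total

doc_count = len(orders)                         # every matching document
amount_values = value_count(orders, "amount")   # only values that exist
print(doc_count, amount_values)
```

If the two numbers differ on real data, some documents are missing the field.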
Response shape:
{
"aggregations": {
"total_revenue": { "value": 145820.50 },
"avg_order": { "value": 234.50 },
"smallest": { "value": 9.99 },
"biggest": { "value": 4999.00 },
"order_count": { "value": 622 }
}
}
stats — all five at once
If we want sum, avg, min, max, count together, there’s a single agg for that:
{
"aggs": {
"order_stats": {
"stats": { "field": "amount" }
}
}
}
Returns count, min, max, avg, and sum in one response. Cheaper than running each individually because ES makes a single pass over the data.
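The single-pass idea can be sketched in plain Python. This is an illustration of the technique under simple assumptions (numeric values already loaded in a list), not Elasticsearch's actual code. The `stats` function name and the sample amounts are made up for the example.

```python
def stats(values):
    """Compute count, min, max, sum and avg in one pass over the values."""
    count, total = 0, 0.0
    lo, hi = float("inf"), float("-inf")
    for v in values:
        count += 1
        total += v
        lo = min(lo, v)
        hi = max(hi, v)
    if count == 0:
        # Nothing to aggregate: min/max/avg are undefined.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    return {"count": count, "min": lo, "max": hi, "avg": total / count, "sum": total}

amounts = [9.99, 120.0, 4999.0, 45.5]   # illustrative values, not real data
order_stats = stats(amounts)
print(order_stats)
```

Each value is touched once, and the average falls out of the running sum and count at the end.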
For variance + std deviation too, use extended_stats:
{
"aggs": {
"detailed": {
"extended_stats": { "field": "amount" }
}
}
}
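As a rough sketch of what the extra fields mean, the plain-Python function below computes the variance and standard deviation numbers by hand. The key names follow the fields extended_stats is understood to return, but treat them as an assumption and check them against your ES version. The `sigma=2.0` default for the bounds is likewise an assumption.

```python
import math

def extended_stats(values, sigma=2.0):
    """Population variance/std deviation plus mean +/- sigma bounds."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    total = sum(values)
    mean = total / n
    squared_dev = sum((v - mean) ** 2 for v in values)
    var_pop = squared_dev / n               # divide by n (population)
    std_pop = math.sqrt(var_pop)
    return {
        "count": n,
        "avg": mean,
        "sum": total,
        "sum_of_squares": sum(v * v for v in values),
        "variance": var_pop,
        # Sample variance divides by n - 1 and needs at least two values.
        "variance_sampling": squared_dev / (n - 1) if n > 1 else None,
        "std_deviation": std_pop,
        "std_deviation_bounds": {
            "upper": mean + sigma * std_pop,
            "lower": mean - sigma * std_pop,
        },
    }

detailed = extended_stats([2, 4, 4, 4, 5, 5, 7, 9])   # illustrative values
print(detailed["variance"], detailed["std_deviation"])
```

The bounds give a quick "normal range" for spotting outliers.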
Percentiles — what the average hides
Averages lie. A site with avg response time of 200ms might have 1% of users seeing 5-second timeouts. Percentiles tell the real story.
GET /requests/_search
{
"size": 0,
"aggs": {
"latency_pct": {
"percentiles": {
"field": "response_time_ms",
"percents": [50, 75, 95, 99, 99.9]
}
}
}
}
Response:
{
"latency_pct": {
"values": {
"50.0": 120,
"75.0": 180,
"95.0": 450,
"99.0": 1200,
"99.9": 4800
}
}
}
Read it like — “50% of requests under 120ms, 99% under 1.2s, 99.9% under 4.8s.” That’s the standard latency reporting in any production system.
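A small plain-Python sketch shows how an average can hide the tail. It uses an exact nearest-rank percentile, which is a stand-in chosen for clarity. Elasticsearch computes percentiles approximately, so real results can differ slightly. The latency numbers are invented for the example.

```python
import math

def percentile(sorted_values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    rank = min(rank, len(sorted_values))
    return sorted_values[rank - 1]

# Hypothetical traffic: 985 fast requests and 15 five-second timeouts.
latencies_ms = sorted([150] * 985 + [5000] * 15)

avg = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(avg, p50, p95, p99)
```

The average lands near 220ms and looks healthy, the median and p95 sit at 150ms, and only p99 exposes the timeouts.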
percentile_ranks — the inverse
If we know the SLO threshold and want to know what % of requests beat it:
{
"aggs": {
"slo": {
"percentile_ranks": {
"field": "response_time_ms",
"values": [500, 1000]
}
}
}
}
Returns “94% of requests under 500ms, 98% under 1000ms”. Use this for SLO dashboards.
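The inverse relationship is easy to see in plain Python. This sketch computes an exact "share of values at or below the threshold", assuming that is the definition you want for your SLO. ES itself returns an approximation. The `percentile_rank` helper and the sample latencies are hypothetical.

```python
def percentile_rank(values, threshold):
    """Percentage of values at or below the threshold."""
    within = sum(1 for v in values if v <= threshold)
    return 100.0 * within / len(values)

latencies_ms = [120, 180, 450, 700, 1200]   # illustrative values
slo = {t: percentile_rank(latencies_ms, t) for t in (500, 1000)}
print(slo)
```

Where percentiles answer "what latency covers X% of requests", this answers "what % of requests fit under this latency".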
Cardinality — approximate distinct count
The ES equivalent of COUNT(DISTINCT field). But here’s the catch — it’s approximate (uses HyperLogLog++).
{
"aggs": {
"unique_users": {
"cardinality": {
"field": "user_id",
"precision_threshold": 3000
}
}
}
}
In simple language, precision_threshold is the distinct-count level below which the result is expected to be close to exact. Above it, the error grows but typically stays within a few percent. Default is 3000, max is 40000. Higher precision = more memory (roughly precision_threshold * 8 bytes for each cardinality agg instance).
Why approximate? Exact distinct counts require holding every unique value in memory and merging those sets across shards. That doesn't scale. HyperLogLog++ uses a probabilistic structure that's tiny in memory and "close enough": the error is typically small at the default settings, though it depends on the data.
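To show why a probabilistic counter can be both tiny and accurate, here is a minimal sketch of the basic HyperLogLog algorithm in plain Python. It is deliberately not the HyperLogLog++ variant that ES uses (no bias correction or sparse encoding). The choice of SHA-1 as the hash and `p=14` as the register precision are assumptions for the example.

```python
import hashlib
import math

def hll_estimate(values, p=14):
    """Estimate the distinct count with basic HyperLogLog (2**p registers)."""
    m = 1 << p
    registers = [0] * m
    for v in values:
        # 64-bit hash taken from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(str(v).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                    # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)           # constant valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)         # small-range correction
    return raw

# 200,000 events from 100,000 hypothetical users (each user appears twice).
user_ids = [f"user-{i % 100_000}" for i in range(200_000)]
estimate = hll_estimate(user_ids)
print(round(estimate))
```

The whole state is 16,384 small integers no matter how many values flow through. Duplicates never change a register, which is why the structure counts distinct values rather than events.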
| Family | Aggs | Use |
| --- | --- | --- |
| Basic numeric | sum, avg, min, max | Sales, counts, ranges |
| Combined | stats, extended_stats | One-shot overview |
| Distribution | percentiles, percentile_ranks | Latency, SLOs |
| Unique counts | cardinality | DAU, unique IPs |
| Counting | value_count | Non-null counts |
top_hits — sample docs from the bucket
Technically a metric agg, even though it returns documents rather than a number. It returns the top N docs from whatever context it runs in, and is most often used as a sub-agg of a bucket agg to get a sample document per bucket.
{
"aggs": {
"by_category": {
"terms": { "field": "category.keyword" },
"aggs": {
"highest_priced": {
"top_hits": {
"size": 1,
"sort": [{ "price": "desc" }],
"_source": ["title", "price"]
}
}
}
}
}
}
Gets us “the most expensive product in each category”. Hugely useful for analytics dashboards.
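The same "top doc per bucket" logic can be sketched in plain Python: group by a field, sort each group, keep the first N, and trim the fields. This only mirrors the request above conceptually. The `products` data and the `top_hits_per_bucket` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical product documents.
products = [
    {"title": "Laptop", "category": "electronics", "price": 1200, "sku": "E1"},
    {"title": "Headphones", "category": "electronics", "price": 150, "sku": "E2"},
    {"title": "Novel", "category": "books", "price": 15, "sku": "B1"},
    {"title": "Atlas", "category": "books", "price": 60, "sku": "B2"},
]

def top_hits_per_bucket(docs, bucket_field, sort_field, size=1, source=None):
    """Group docs by bucket_field and keep the top `size` by sort_field (descending)."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc)
    result = {}
    for key, members in buckets.items():
        top = sorted(members, key=lambda d: d[sort_field], reverse=True)[:size]
        # Like _source filtering: return only the requested fields.
        result[key] = [{f: d[f] for f in source} if source else d for d in top]
    return result

highest_priced = top_hits_per_bucket(
    products, "category", "price", size=1, source=["title", "price"]
)
print(highest_priced)
```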
Filtering and metrics
Just like bucket aggs, metric aggs run on the query result. Want “average order value for premium customers”?
GET /orders/_search
{
"size": 0,
"query": {
"term": { "customer_tier": "premium" }
},
"aggs": {
"avg_order": { "avg": { "field": "amount" } }
}
}
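In plain Python terms, the query narrows the document set first and the metric then runs on whatever is left. A minimal sketch, with made-up `orders` data and a hypothetical `avg_amount` helper:

```python
# Hypothetical orders across two customer tiers.
orders = [
    {"customer_tier": "premium", "amount": 300.0},
    {"customer_tier": "basic", "amount": 40.0},
    {"customer_tier": "premium", "amount": 500.0},
]

def avg_amount(docs):
    """Average of the amount field over docs that have one."""
    amounts = [d["amount"] for d in docs if d.get("amount") is not None]
    return sum(amounts) / len(amounts) if amounts else None

# Step 1: the "query" selects the documents.
premium_orders = [d for d in orders if d["customer_tier"] == "premium"]
# Step 2: the metric runs only on the selected documents.
avg_premium = avg_amount(premium_orders)
avg_all = avg_amount(orders)
print(avg_premium, avg_all)
```

Dropping the filter changes the answer, which is the whole point: the metric has no opinion about which docs it sees.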
Quick rules
- One metric needed? Use the specific agg. Several? Use stats (one pass).
- Latency/distribution reporting? Always percentiles, never just avg.
- Distinct counts? cardinality: accept the approximation, it's the price of scale.
- Need a sample doc per bucket? top_hits sub-agg.
- Metrics respect the outer query: combine query + metric for "X about my filtered data".