Metric Aggregations - Elasticsearch

If bucket aggs are GROUP BY, metric aggs are the aggregate functions in SELECT: SUM, AVG, COUNT(DISTINCT), and so on. In simple language, they compute a single number (or a few numbers) from a set of docs.

Single-value metrics

The basic family — give a field, get a number.

GET /orders/_search
{
  "size": 0,
  "aggs": {
    "total_revenue": { "sum":   { "field": "amount" } },
    "avg_order":     { "avg":   { "field": "amount" } },
    "smallest":      { "min":   { "field": "amount" } },
    "biggest":       { "max":   { "field": "amount" } },
    "order_count":   { "value_count": { "field": "amount" } }
  }
}

value_count is like SQL’s COUNT(field): it counts the field’s values. Documents missing the field add nothing, and a multi-valued field adds one per value. That is not the same as the bucket’s doc_count, which counts documents whether or not they have the field.
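Here is the distinction in a small local Python sketch (not Elasticsearch code). It uses a few hypothetical documents: one has a single amount, two have no amount at all, and one has a multi-valued amount. It simply counts each number the way it is defined above.

```python
# Hypothetical documents: two lack "amount", one has a multi-valued "amount".
docs = [
    {"id": 1, "amount": 10.0},
    {"id": 2},
    {"id": 3},
    {"id": 4, "amount": [5.0, 7.5]},
]

def field_values(doc, field):
    """Return the list of values a document holds for `field` (empty if missing)."""
    v = doc.get(field)
    if v is None:
        return []
    return v if isinstance(v, list) else [v]

# doc_count: every document in the bucket counts, field or no field.
doc_count = len(docs)
# value_count: one per extracted value, so missing fields add 0, arrays add len().
value_count = sum(len(field_values(d, "amount")) for d in docs)

print(doc_count, value_count)  # 4 3
```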

Response shape:

{
  "aggregations": {
    "total_revenue": { "value": 145859.00 },
    "avg_order":     { "value": 234.50 },
    "smallest":      { "value": 9.99 },
    "biggest":       { "value": 4999.00 },
    "order_count":   { "value": 622 }
  }
}

stats — all five at once

If we want sum, avg, min, max, count together, there’s a single agg for that:

GET /orders/_search
{
  "size": 0,
  "aggs": {
    "order_stats": {
      "stats": { "field": "amount" }
    }
  }
}

Returns count, min, max, avg and sum in one shot. It is also a little cheaper than five separate aggs, because a single aggregator reads each value once and updates all five statistics.
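For a rough picture of what stats computes, here is a plain-Python sketch that derives all five numbers in one loop over the values. The function name and sample values are illustrative. Returning None for min/max/avg on empty input is meant to mirror the nulls the agg reports when no values match.

```python
import math

def stats(values):
    """Compute count, min, max, avg and sum in a single pass over `values`."""
    count, total = 0, 0.0
    lo, hi = math.inf, -math.inf
    for v in values:          # each value is read exactly once
        count += 1
        total += v
        lo = min(lo, v)
        hi = max(hi, v)
    if count == 0:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    return {"count": count, "min": lo, "max": hi, "avg": total / count, "sum": total}

print(stats([10.0, 20.0, 60.0]))
# {'count': 3, 'min': 10.0, 'max': 60.0, 'avg': 30.0, 'sum': 90.0}
```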

For variance and standard deviation too, use extended_stats. It also adds sum_of_squares and std_deviation_bounds:

GET /orders/_search
{
  "size": 0,
  "aggs": {
    "detailed": {
      "extended_stats": { "field": "amount" }
    }
  }
}
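The extra fields all follow from count, sum and sum_of_squares. This Python sketch makes two assumptions: population variance (dividing by n), and the default sigma of 2 for std_deviation_bounds. The function name and sample data are hypothetical.

```python
import math

def extended_stats(values, sigma=2.0):
    """Sketch of extended_stats' extra fields, assuming population variance."""
    if not values:
        raise ValueError("need at least one value")
    n = len(values)
    avg = sum(values) / n
    sum_sq = sum(v * v for v in values)
    variance = sum_sq / n - avg * avg      # E[x^2] - (E[x])^2
    std = math.sqrt(variance)
    return {
        "sum_of_squares": sum_sq,
        "variance": variance,
        "std_deviation": std,
        # bounds are avg +/- sigma standard deviations
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }

print(extended_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```

With that sample the mean is 5, the variance is 4 and the standard deviation is 2, so the default bounds are 1 and 9.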

Percentiles — what the average hides

Averages lie. A site with an average response time of 200ms might still have 1% of requests taking 5 seconds. Percentiles tell the real story.
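A quick local Python illustration with hypothetical latencies shows the effect. It uses the simple nearest-rank definition of a percentile. Elasticsearch approximates percentiles (with TDigest by default), so this is only a conceptual sketch of what the numbers mean.

```python
import math

def nearest_rank_percentile(values, p):
    """Smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies: 98 fast requests and 2 very slow ones.
latencies_ms = [150] * 98 + [5000] * 2

mean = sum(latencies_ms) / len(latencies_ms)
p50 = nearest_rank_percentile(latencies_ms, 50)
p99 = nearest_rank_percentile(latencies_ms, 99)
print(f"avg={mean}ms p50={p50}ms p99={p99}ms")
# avg=247.0ms p50=150ms p99=5000ms
```

The average looks healthy, and only p99 exposes the 5-second tail.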

GET /requests/_search
{
  "size": 0,
  "aggs": {
    "latency_pct": {
      "percentiles": {
        "field": "response_time_ms",
        "percents": [50, 75, 95, 99, 99.9]
      }
    }
  }
}

Response:

{
  "latency_pct": {
    "values": {
      "50.0":  120,
      "75.0":  180,
      "95.0":  450,
      "99.0":  1200,
      "99.9":  4800
    }
  }
}

Read it like: “50% of requests finished under 120ms, 99% under 1.2s, 99.9% under 4.8s.” That’s the standard way to report latency in production systems.

percentile_ranks — the inverse

If we know the SLO threshold and want to know what % of requests beat it:

GET /requests/_search
{
  "size": 0,
  "aggs": {
    "slo": {
      "percentile_ranks": {
        "field": "response_time_ms",
        "values": [500, 1000]
      }
    }
  }
}

Returns something like “96% of requests at or under 500ms, 98% at or under 1000ms”. Use this for SLO dashboards.
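A percentile rank is just the share of values at or below a threshold. This exact Python sketch over a handful of hypothetical latencies shows the calculation that percentile_ranks approximates at scale. The function name and data are illustrative.

```python
def percentile_rank(values, threshold):
    """Percentage of values at or below `threshold` (exact computation)."""
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)

# Hypothetical response times in milliseconds.
latencies_ms = [120, 180, 450, 480, 900, 1200, 300, 250, 95, 2100]

for slo in (500, 1000):
    print(slo, percentile_rank(latencies_ms, slo))
# 500 70.0
# 1000 80.0
```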

Cardinality — approximate distinct count

The ES equivalent of COUNT(DISTINCT field). But here’s the catch — it’s approximate (uses HyperLogLog++).

{
  "aggs": {
    "unique_users": {
      "cardinality": {
        "field": "user_id",
        "precision_threshold": 3000
      }
    }
  }
}

In simple language, precision_threshold is the count below which results are expected to be close to exact. Above it, counts get fuzzier, though the error usually stays within a few percent. Default is 3000, max is 40000. A higher threshold costs more memory (about precision_threshold × 8 bytes).

Why approximate? An exact distinct count means holding every unique value in memory and shipping those full sets from each shard so they can be merged. That doesn’t scale. HyperLogLog keeps a small, fixed-size sketch per shard instead. Sketches merge cheaply, and the resulting estimate is “close enough”, with a small relative error at the default settings.
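To show why sketches scale, here is a minimal classic HyperLogLog in Python. It is not Elasticsearch’s HyperLogLog++, which adds refinements this sketch omits. The choices here are illustrative assumptions: 4096 registers, SHA-256 as the hash, and the two “shards” of user IDs. Each shard keeps only 4096 small integers, however many users it sees, and merging two shards is a register-wise max.

```python
import hashlib
import math

class HyperLogLog:
    """Minimal classic HyperLogLog (illustrative, not ES's HLL++)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                  # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash taken from SHA-256, so results are deterministic.
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)         # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)
        # rank = position of the leftmost 1-bit in the remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Combining shard sketches is a register-wise max.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)   # small-range (linear counting) correction
        return raw

# Two hypothetical shards with overlapping users: 25,000 distinct in total.
shard_a, shard_b = HyperLogLog(), HyperLogLog()
for i in range(15_000):
    shard_a.add(f"user-{i}")
for i in range(10_000, 25_000):
    shard_b.add(f"user-{i}")
shard_a.merge(shard_b)
print(round(shard_a.count()))  # an estimate close to 25,000
```

With 4096 registers, the typical relative error of classic HyperLogLog is about 1.04 / √4096, or roughly 1.6%.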

Metric agg families at a glance

Family	Aggs	Use
Basic numeric	sum, avg, min, max	Sales, counts, ranges
Combined	stats, extended_stats	One-shot overview
Distribution	percentiles, percentile_ranks	Latency, SLOs
Unique counts	cardinality	DAU, unique IPs
Counting	value_count	Non-null counts

top_hits — sample docs from the bucket

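Technically a metric agg, though it returns documents rather than numbers: the top N hits in the current aggregation context, sorted by score unless you pass sort (N defaults to 3). It is most often used as a sub-agg of a bucket agg to get a sample document per bucket.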

{
  "aggs": {
    "by_category": {
      "terms": { "field": "category.keyword" },
      "aggs": {
        "highest_priced": {
          "top_hits": {
            "size": 1,
            "sort": [{ "price": "desc" }],
            "_source": ["title", "price"]
          }
        }
      }
    }
  }
}

Gets us “the most expensive product in each category”, for the top 10 categories (the terms agg’s default size). Hugely useful for analytics dashboards.
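Conceptually, the terms plus top_hits combination is “group, then sort each group and keep the first N”. This local Python sketch over hypothetical products mirrors that shape, including the _source filtering down to title and price. It does not model the terms agg’s own bucket ordering or its size limit.

```python
from collections import defaultdict

# Hypothetical product documents.
products = [
    {"title": "Laptop", "category": "electronics", "price": 1299.0},
    {"title": "Headphones", "category": "electronics", "price": 199.0},
    {"title": "Desk", "category": "furniture", "price": 349.0},
    {"title": "Chair", "category": "furniture", "price": 499.0},
    {"title": "Lamp", "category": "furniture", "price": 59.0},
]

# Like "terms": one bucket of documents per category value.
buckets = defaultdict(list)
for p in products:
    buckets[p["category"]].append(p)

def top_hits(docs, size=1, sort_field="price"):
    """Like top_hits with sort desc: keep the first `size` docs, only title/price."""
    ordered = sorted(docs, key=lambda d: d[sort_field], reverse=True)
    return [{"title": d["title"], "price": d["price"]} for d in ordered[:size]]

highest_priced = {cat: top_hits(docs) for cat, docs in buckets.items()}
print(highest_priced)
# {'electronics': [{'title': 'Laptop', 'price': 1299.0}], 'furniture': [{'title': 'Chair', 'price': 499.0}]}
```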

Filtering and metrics

Just like bucket aggs, metric aggs run on the query result. Want “average order value for premium customers”?

GET /orders/_search
{
  "size": 0,
  "query": {
    "term": { "customer_tier": "premium" }
  },
  "aggs": {
    "avg_order": { "avg": { "field": "amount" } }
  }
}
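The effect of the query on the metric can be seen in a few lines of local Python with hypothetical orders. The filter runs first and the average is computed only over what matched, so the number changes from the whole-index value to the premium-only value. The helper name avg is illustrative.

```python
# Hypothetical orders.
orders = [
    {"customer_tier": "premium", "amount": 300.0},
    {"customer_tier": "standard", "amount": 40.0},
    {"customer_tier": "premium", "amount": 500.0},
    {"customer_tier": "standard", "amount": 60.0},
]

def avg(docs, field):
    """Average of `field` over docs that have it; None when there are no values."""
    values = [d[field] for d in docs if field in d]
    return sum(values) / len(values) if values else None

# The "query": keep only premium customers, then aggregate over the matches.
matched = [d for d in orders if d["customer_tier"] == "premium"]
print(avg(orders, "amount"), avg(matched, "amount"))  # 225.0 400.0
```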

Quick rules

One metric needed? Use the specific agg. Several? Use stats (one pass).
Latency/distribution reporting? Always percentiles, never just avg.
Distinct counts? cardinality — accept the approximation, it’s the price of scale.
Need a sample doc per bucket? top_hits sub-agg.
Metrics respect the outer query — combine query + metric for “X about my filtered data”.