Compound Queries & Function Score

ES gives us a relevance score via BM25 out of the box. But often “most relevant” needs to factor in business signals — recency, popularity, distance, paid promotion. That’s where compound queries (especially function_score) come in.

What are compound queries?

In simple language — compound queries wrap other queries and modify their behavior. The ones worth knowing:

  • bool — combine clauses (covered separately).
  • constant_score — wrap a filter, give every match the same score.
  • dis_max — “disjunction max” — take the single best score across multiple subqueries.
  • boosting — match docs but demote ones matching a negative query.
  • function_score — modify scores using custom functions.
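
To make the dis_max bullet concrete, here is a minimal Python sketch of how its score is usually described: the best subquery score, plus tie_breaker times the scores of the other matching subqueries. The function name and the sample scores are made up for illustration; tie_breaker is the real dis_max parameter and defaults to 0.0.

```python
def dis_max_score(subquery_scores, tie_breaker=0.0):
    """Score of one doc under dis_max, given the scores of the subqueries it matched."""
    if not subquery_scores:
        return 0.0  # a doc that matches no subquery would not be returned at all
    best = max(subquery_scores)
    others = sum(subquery_scores) - best
    return best + tie_breaker * others


# Hypothetical per-field scores for one doc: title match 2.4, description match 1.1
print(dis_max_score([2.4, 1.1]))                   # best field only
print(dis_max_score([2.4, 1.1], tie_breaker=0.3))  # best field plus 30% of the rest
```

With tie_breaker left at 0.0 only the strongest field counts, which is the "single best score" behavior described above.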

constant_score — when you don’t care about score

GET /products/_search
{
  "query": {
    "constant_score": {
      "filter": { "term": { "category": "laptops" } },
      "boost": 1.5
    }
  }
}

Every matching doc gets score 1.5. Useful when we want filter-context behavior (cached, no BM25 math) but still need a fixed score for sorting/combination.

boosting — demote, don’t exclude

must_not removes matches entirely. boosting lets us demote them instead.

GET /products/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": { "title": "headphones" }
      },
      "negative": {
        "term": { "refurbished": true }
      },
      "negative_boost": 0.3
    }
  }
}

Refurbished headphones still show up, but their score is multiplied by 0.3, so they rank below comparable non-refurbished matches.
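
A small Python sketch of that arithmetic, assuming three hypothetical docs with made-up positive-query scores. It only mimics the multiplication; it is not how Elasticsearch is implemented.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Docs matching the negative query stay in the results at a reduced score."""
    return positive_score * negative_boost if matches_negative else positive_score


# Hypothetical (name, positive-query score, refurbished?) tuples
docs = [
    ("studio headphones (refurbished)", 9.0, True),
    ("wireless headphones", 4.0, False),
    ("headphone stand", 1.5, False),
]

ranked = sorted(
    ((name, boosting_score(score, refurb, 0.3)) for name, score, refurb in docs),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{score:.2f}  {name}")
```

Note that a refurbished doc with a very strong text match can still outrank a weak non-refurbished one. Demotion is relative, not absolute.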

function_score — the powerhouse

This is the one interviewers ask about. function_score wraps a query and applies one or more scoring functions on top of the BM25 score.

GET /products/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": { "title": "headphones" }
      },
      "functions": [
        {
          "filter": { "term": { "is_featured": true } },
          "weight": 2
        },
        {
          "field_value_factor": {
            "field": "rating",
            "factor": 1.2,
            "modifier": "log1p"
          }
        },
        {
          "gauss": {
            "created_at": {
              "origin": "now",
              "scale": "30d",
              "decay": 0.5
            }
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}

Let’s break it down — for each result, ES computes:

  1. The base relevance score from match.
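  2. A weight of 2 if is_featured: true. The function is skipped for docs that don’t match its filter.
  3. A rating-based value, log10(1 + 1.2 × rating). The log scale means 5★ isn’t 5x better than 1★. With score_mode: sum this value is added to the other function results, not multiplied.
  4. A gauss decay on created_at: 1.0 for brand-new products, falling to 0.5 at 30 days old.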

Then combines them — score_mode: sum sums the function results, boost_mode: multiply multiplies that with the query score.
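
The combination step can be sketched in a few lines of Python. This is a simplified model, assuming every function matched the doc and using a plain average for avg (Elasticsearch weights the score_mode average by each function's weight). The sample doc values are hypothetical.

```python
import math


def combine(values, mode):
    """Reduce a list of numbers the way score_mode / boost_mode names suggest."""
    if mode == "multiply":
        return math.prod(values)
    if mode == "sum":
        return sum(values)
    if mode == "avg":
        return sum(values) / len(values)  # plain average, a simplification
    if mode == "max":
        return max(values)
    if mode == "min":
        return min(values)
    raise ValueError(f"unsupported mode: {mode}")


def function_score(query_score, function_values, score_mode="multiply", boost_mode="multiply"):
    """Combine function results with score_mode, then merge with the query score."""
    if score_mode == "first":
        fn_score = function_values[0]
    else:
        fn_score = combine(function_values, score_mode)
    if boost_mode == "replace":
        return fn_score  # the query score is ignored entirely
    return combine([query_score, fn_score], boost_mode)


# Hypothetical doc: BM25 score 3.0, featured, rating 4.5, created 30 days ago
weight = 2.0
rating_value = math.log10(1 + 1.2 * 4.5)  # field_value_factor with log1p
recency = 0.5                             # gauss decay at exactly one scale
final = function_score(3.0, [weight, rating_value, recency], "sum", "multiply")
print(round(final, 3))
```

Swapping boost_mode to "sum" in the call shows how much smaller the effect of the functions becomes when they are added to the query score instead of multiplied.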

function_score pipeline

  query (BM25) × score_mode(fn1, fn2, fn3) = final _score

score_mode: multiply | sum | avg | first | max | min
boost_mode: multiply | sum | avg | replace | max | min

Decay functions — gauss, linear, exp

Decay functions are how we “boost recent things” or “boost nearby things.” They’re shaped like:

  • gauss — bell curve, smooth fall-off. Best for most cases.
  • linear — straight-line drop. The score equals decay at offset + scale and reaches zero at offset + scale / (1 − decay).
  • exp — sharp initial drop, long tail.

{
  "gauss": {
    "published_at": {
      "origin": "now",
      "offset": "1d",
      "scale": "7d",
      "decay": 0.5
    }
  }
}

In words: at now, score = 1.0. Within 1 day of now there is no decay (that’s the offset). After that, decay starts; at 8 days out (offset + scale), score = 0.5 (the decay target).
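
The three curves can be compared with a short Python sketch. The formulas follow the ones given in the Elasticsearch function_score documentation as I understand them; the function name and the day-based units are assumptions made for illustration.

```python
import math


def decay_score(kind, distance, scale, offset=0.0, decay=0.5):
    """Decay multiplier for a value `distance` away from the origin."""
    d = max(0.0, abs(distance) - offset)  # nothing decays inside the offset
    if kind == "gauss":
        sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
        return math.exp(-d ** 2 / (2.0 * sigma_sq))
    if kind == "exp":
        lam = math.log(decay) / scale
        return math.exp(lam * d)
    if kind == "linear":
        s = scale / (1.0 - decay)  # distance at which the line reaches zero
        return max((s - d) / s, 0.0)
    raise ValueError(f"unknown decay function: {kind}")


# Same settings as the published_at example: offset 1 day, scale 7 days, decay 0.5
for age_days in (0, 1, 4, 8, 15):
    row = [round(decay_score(kind, age_days, scale=7, offset=1), 3)
           for kind in ("gauss", "linear", "exp")]
    print(age_days, row)
```

All three curves pass through 0.5 at 8 days (offset + scale). They differ in how they get there and in what happens afterwards: linear hits zero at 15 days, while gauss and exp never quite do.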

This is the pattern for news/feed ranking. Recent posts win, older posts gradually fade.

Geo decay

Same idea for distance:

{
  "gauss": {
    "location": {
      "origin": { "lat": 12.97, "lon": 77.59 },
      "scale": "10km",
      "decay": 0.5
    }
  }
}

Restaurants 10km away score half as much as restaurants right next to us. Beyond that, they fade fast.
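
To put numbers on "fade fast", here is a rough Python sketch. It assumes a haversine distance on a spherical Earth with a 6371 km radius, which only approximates the distance Elasticsearch computes, and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def gauss_geo_decay(distance_km, scale_km=10.0, decay=0.5):
    """Gauss decay with no offset, written as decay ** ((d / scale) ** 2)."""
    return decay ** ((distance_km / scale_km) ** 2)


origin = (12.97, 77.59)
for km in (0, 5, 10, 20, 30):
    print(km, "km ->", round(gauss_geo_decay(km), 4))

# A hypothetical restaurant a little to the north-east of the origin
d = haversine_km(*origin, 13.00, 77.65)
print(round(d, 2), "km ->", round(gauss_geo_decay(d), 3))
```

At twice the scale the multiplier is 0.5 to the fourth power, so a restaurant 20 km away keeps only about 6% of its score.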

field_value_factor — boost by a number field

{
  "field_value_factor": {
    "field": "view_count",
    "factor": 1.0,
    "modifier": "log1p",
    "missing": 1
  }
}

modifier options: none, log, log1p, log2p, ln, ln1p, ln2p, sqrt, square, reciprocal. The log variants are base 10; the ln variants are natural logs. Use log1p for view counts/popularity; without it, viral content dominates everything.
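
A Python sketch of what the modifiers compute, assuming the documented form modifier(factor × field value). The table and function names are made up for illustration.

```python
import math

# Each modifier is applied to (factor * field value)
MODIFIERS = {
    "none": lambda x: x,
    "log": math.log10,
    "log1p": lambda x: math.log10(x + 1),
    "log2p": lambda x: math.log10(x + 2),
    "ln": math.log,
    "ln1p": lambda x: math.log(x + 1),
    "ln2p": lambda x: math.log(x + 2),
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1.0 / x,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Sketch of field_value_factor; `value` is None when the doc lacks the field."""
    if value is None:
        if missing is None:
            raise ValueError("field is missing and no 'missing' default was given")
        value = missing  # factor and modifier still apply to the default
    return MODIFIERS[modifier](factor * value)


# Hypothetical view counts: raw values vs. log1p-compressed values
for views in (10, 1_000, 1_000_000):
    raw = field_value_factor(views)
    squashed = field_value_factor(views, modifier="log1p")
    print(views, raw, round(squashed, 2))
```

The raw values span five orders of magnitude, while the log1p values stay within a single-digit range. That compression is what keeps one viral doc from swamping the text relevance.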

script_score — when nothing else fits

{
  "script_score": {
    "script": {
      "source": "doc['rating'].value * Math.log(2 + doc['review_count'].value)"
    }
  }
}

Powerful but slow: the script is evaluated for every matching document, which typically costs more than the built-in functions. Use it only when the built-in functions can’t express what we need.
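
To see what that script produces, here is a Python mirror of the same expression. It assumes Math.log in the script is the natural logarithm, as in Java, and the sample ratings and review counts are made up.

```python
import math


def script_score(rating, review_count):
    """Python mirror of: doc['rating'].value * Math.log(2 + doc['review_count'].value)"""
    return rating * math.log(2 + review_count)


# Hypothetical products: (rating, review_count)
for rating, reviews in ((4.8, 3), (4.2, 5000), (5.0, 0)):
    print(rating, reviews, round(script_score(rating, reviews), 2))
```

The 2 + inside the log keeps the factor at ln 2 or above, so a product with zero reviews still gets a positive score instead of zero.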

Quick rules

  • Recency boost? gauss decay on a date field.
  • Popularity boost? field_value_factor with log1p modifier.
  • Featured/sponsored items? weight function gated by a filter.
  • Don’t reach for script_score until you’ve tried the named functions.
  • Set boost_mode: multiply for proportional boosts, sum for additive.
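
As one way to apply these rules together, here is a Python sketch that assembles a function_score request body as a plain dict. The helper name, its parameters, and the fixed title field are assumptions for illustration; the JSON keys are the ones used in the examples above.

```python
import json


def build_function_score(text, recency_field=None, popularity_field=None, featured_field=None):
    """Assemble a function_score search body from the quick rules (hypothetical helper)."""
    functions = []
    if featured_field:
        # Featured/sponsored items: weight gated by a filter
        functions.append({"filter": {"term": {featured_field: True}}, "weight": 2})
    if popularity_field:
        # Popularity: field_value_factor with log1p
        functions.append({
            "field_value_factor": {"field": popularity_field, "modifier": "log1p", "missing": 1}
        })
    if recency_field:
        # Recency: gauss decay on a date field
        functions.append({
            "gauss": {recency_field: {"origin": "now", "scale": "30d", "decay": 0.5}}
        })
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},
                "functions": functions,
                "score_mode": "sum",
                "boost_mode": "multiply",
            }
        }
    }


body = build_function_score(
    "headphones", recency_field="created_at", featured_field="is_featured"
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a GET /products/_search line. The weight, scale, and decay values are placeholders to tune against real data.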