When we search for “fast laptop”, Elasticsearch returns matching docs sorted by a _score. That score is a number telling us how relevant the doc is. The higher the score, the better the match.
In simple language: scoring is “how strongly does this document match my query, given the words it contains and how common those words are across the whole index”.
TF-IDF (the old way)
Before ES 5, the default scoring used TF-IDF:
- TF (Term Frequency) — the more times a term appears in a doc, the higher the score.
- IDF (Inverse Document Frequency) — rare terms across the index count more. “laptop” matters more than “the”.
- Field length norm — shorter fields score higher (a match in a title beats a match in a long description).
The problem: TF is never capped. Every extra occurrence keeps adding to the score, so a doc that mentions “laptop” 100 times scores way higher than one that mentions it 5 times, even though both are obviously about laptops.
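To put a number on that, here is a minimal sketch of a classic-style TF-IDF term score. It assumes the square-root TF, logarithmic IDF and 1/sqrt(length) norm used by Lucene's classic similarity, and it leaves out query normalization, coord and boosts, so treat it as an illustration and not the exact score ES would return. The index size, document frequency and field length are made-up values.

```python
import math

def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw count (dampened, never capped)."""
    return math.sqrt(freq)

def classic_idf(doc_count: int, doc_freq: int) -> float:
    """Inverse document frequency: rarer terms get a larger weight."""
    return 1.0 + math.log((doc_count + 1) / (doc_freq + 1))

def field_norm(field_length: int) -> float:
    """Length norm: shorter fields get a larger weight."""
    return 1.0 / math.sqrt(field_length)

def classic_score(freq: float, doc_count: int, doc_freq: int, field_length: int) -> float:
    """Simplified per-term score: tf * idf * norm."""
    return classic_tf(freq) * classic_idf(doc_count, doc_freq) * field_norm(field_length)

# Hypothetical index: 1,000 docs, 50 contain "laptop", both docs are 200 terms long.
few = classic_score(5, 1000, 50, 200)
many = classic_score(100, 1000, 50, 200)
print(f"5 mentions: {few:.3f}  100 mentions: {many:.3f}  ratio: {many / few:.2f}")
```

With everything else equal, the ratio is sqrt(100) / sqrt(5), roughly 4.47, and it keeps climbing as the count grows.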
BM25 (the current default, since ES 5.0)
BM25 stands for “Best Matching 25”. Think of it like TF-IDF with two important fixes:
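- TF saturation: repeating a term gives diminishing returns. With the default k1, more mentions barely move the needle after roughly 5–10 occurrences.
- Length normalization is tunable: it is controlled by the parameter b.
The per-term score, in its textbook form:
score(t, d) = IDF(t) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
k1 (default 1.2): controls TF saturation. Lower values make the score flatten out sooner; higher values let repeated terms keep counting for longer.
b (default 0.75): controls length norm. 0 = ignore length, 1 = full normalization.
tf = number of times the term appears in the field, dl = doc length, avgdl = average doc length in the index.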
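To see the saturation in numbers, here is a minimal sketch of the textbook BM25 term-frequency component, freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl)). This is the part that gets multiplied by IDF. The function name and the sample lengths are assumptions for illustration, and the exact constants Lucene applies internally can differ between versions.

```python
def bm25_tf(freq: float, dl: float, avgdl: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 term-frequency component (the factor multiplied by IDF)."""
    # Length factor is 1 for an average-length doc, above 1 for longer docs.
    length_factor = 1.0 - b + b * dl / avgdl
    return freq * (k1 + 1.0) / (freq + k1 * length_factor)

# Average-length document, so only the term count changes between rows.
for freq in (1, 2, 5, 10, 100):
    print(freq, round(bm25_tf(freq, dl=100, avgdl=100), 3))
```

The value starts at 1.0 for a single occurrence, reaches about 1.96 at 10 occurrences and about 2.17 at 100. It approaches k1 + 1 = 2.2 but never gets there, which is the saturation the TF-IDF version lacked.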
Tuning BM25 per field
We can override k1 and b by defining a custom similarity in the index settings and then assigning it to individual fields in the mapping:
PUT /products
{
  "settings": {
    "index": {
      "similarity": {
        "custom_bm25": {
          "type": "BM25",
          "k1": 1.5,
          "b": 0.5
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "similarity": "custom_bm25"
      }
    }
  }
}
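To get a feel for what k1 = 1.5 and b = 0.5 change, the sketch below compares how much a long document is penalized under the defaults and under the custom values. It uses the textbook BM25 term-frequency component. The helper names, the term count of 3 and the document lengths are assumptions picked for illustration, not values from a real index.

```python
def tf_component(freq: float, dl: float, avgdl: float, k1: float, b: float) -> float:
    """Textbook BM25 term-frequency component."""
    norm = 1.0 - b + b * dl / avgdl
    return freq * (k1 + 1.0) / (freq + k1 * norm)

def long_doc_ratio(k1: float, b: float, freq: float = 3, avgdl: float = 100, long_dl: float = 400) -> float:
    """TF component of a long doc divided by that of an average-length doc."""
    long_doc = tf_component(freq, long_dl, avgdl, k1, b)
    average_doc = tf_component(freq, avgdl, avgdl, k1, b)
    return long_doc / average_doc

default_ratio = long_doc_ratio(k1=1.2, b=0.75)  # ES defaults
custom_ratio = long_doc_ratio(k1=1.5, b=0.5)    # values from custom_bm25
print(f"default: {default_ratio:.3f}  custom: {custom_ratio:.3f}")
```

A ratio closer to 1 means length matters less. Here the custom settings move the ratio from about 0.61 to about 0.67, so a description four times the average length loses less of its score.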
Debugging scores with explain
When relevance feels off, use explain to see why a doc scored what it did:
GET /products/_search
{
  "explain": true,
  "query": { "match": { "description": "fast laptop" } }
}
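Each hit in the response then carries an _explanation tree. For every query term it shows the IDF value, the TF value with the inputs behind it (term frequency, field length, average field length, k1 and b), and the final BM25 product.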
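The explanation is a nested structure where every node has a value, a description and an optional list of details, which gets hard to read as raw JSON. A minimal sketch of a helper that flattens it into an indented outline follows. The function name is made up, and the sample node uses shortened descriptions and invented numbers, not output from a real cluster.

```python
def format_explanation(node: dict, depth: int = 0) -> list[str]:
    """Turn an explanation node into indented 'value  description' lines."""
    lines = [f"{'  ' * depth}{node['value']:.4f}  {node['description']}"]
    # Child nodes explain how the parent value was computed.
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines

# Hypothetical explanation for one term; real descriptions are longer.
sample = {
    "value": 0.66,
    "description": "score for term 'laptop' (hypothetical)",
    "details": [
        {"value": 2.2, "description": "boost", "details": []},
        {"value": 0.5, "description": "idf", "details": []},
        {"value": 0.6, "description": "tf", "details": []},
    ],
}

lines = format_explanation(sample)
print("\n".join(lines))
```

Against a real response, the node to pass in would be the `_explanation` object of a hit.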
When BM25 isn’t enough
BM25 only looks at lexical matches, so it has no idea “laptop” and “notebook” mean the same thing, and it knows nothing about how recent or popular a doc is. To go beyond that, we layer on:
- Synonyms at analyzer time (cheap, fast)
- function_score / script_score queries to boost recent or popular docs
- Dense vector search (kNN) for true semantic matching
For most CRUD-y search interview questions though, “BM25 with TF saturation and length normalization” is the right answer.