Elasticsearch

All 28 notes on one page

Fundamentals

What is Elasticsearch & When to use it

beginner elasticsearch basics search

Elasticsearch is a distributed search and analytics engine. In simple language, it’s a database that’s insanely good at two things: full-text search (“find me all blog posts mentioning wireless headphones”) and aggregations (“group these 10 million log lines by status code in 200ms”).

It’s built on top of Apache Lucene, stores data as JSON documents, and talks to us over a REST API. No SQL required, no special drivers: just HTTP and JSON.

Why not just use Postgres?

Postgres is great. But try running WHERE description LIKE '%headphones%' on 50 million rows and it’s painful: a leading-wildcard LIKE can’t use an ordinary B-tree index, so Postgres ends up scanning every row. Elasticsearch pre-indexes every word in your text, using something called an inverted index (we’ll cover that next).

Think of it like this: SQL is built for transactions and exact-match queries. Elasticsearch is built for “search this text, sort by relevance, aggregate the rest.”

SQL (Postgres / MySQL)

- Rows + columns
- ACID transactions
- Joins, foreign keys
- Exact match queries
- LIKE '%word%' is slow

Elasticsearch

- JSON documents
- Eventually consistent
- No joins (denormalize)
- Full-text + relevance
- Aggregations on the fly

MongoDB / NoSQL

- JSON documents
- Flexible schema
- Good for OLTP
- Weak full-text
- Limited aggregations

When to reach for it

The three classic use cases:

Search — product search on Amazon, code search on GitHub, autocomplete suggestions. Anywhere a user types into a box.
Logs & observability — the “E” in the ELK stack (Elasticsearch + Logstash + Kibana). Apps push logs in, we slice and dice them in Kibana.
Analytics — dashboards over big event streams. “Top 10 countries by signup last hour” across billions of events.

When NOT to use it

As your primary source of truth for money/orders/users. It’s eventually consistent and not ACID. Keep transactional data in Postgres, mirror to ES for search.
For relational joins. ES doesn’t really do joins — we denormalize.
For tiny datasets. If you have 10,000 rows, Postgres with a GIN index is plenty.

The mental model

ES = Postgres + Lucene + horizontal scaling, talking JSON over HTTP.

# Index a document
curl -X POST "localhost:9200/products/_doc/1" -H "Content-Type: application/json" -d '
{
  "title": "Sony WH-1000XM5 Wireless Headphones",
  "price": 399,
  "category": "audio"
}'

# Search it
curl -X GET "localhost:9200/products/_search" -H "Content-Type: application/json" -d '
{
  "query": { "match": { "title": "wireless headphones" } }
}'

That’s the whole API surface — PUT, POST, GET, DELETE against indices. We’ll dig into how the magic works in the next notes.


Inverted Index

intermediate elasticsearch internals lucene

If there’s one thing to memorize about Elasticsearch, it’s this: it uses an inverted index. In simple language, an inverted index is a map from each word to the list of documents that contain it. That’s it. That’s the secret sauce.

A forward index (the natural way to store documents) goes “doc 1 has these words.” An inverted index flips it around: “this word lives in docs 1, 7, and 42.” Hence “inverted.”

Why does this matter?

When we search for “wireless headphones”, ES doesn’t scan every document. It looks up “wireless” in the inverted index — gets a list of doc IDs in microseconds. Same for “headphones”. Then it intersects/unions the lists, scores them, and returns the top hits.

Think of it like the index at the back of a textbook. You don’t read the whole book to find “mitochondria” — you flip to the index, see “page 247”, and jump straight there.

Let’s build one by hand

Say we index three documents:

Doc 1: { "title": "Sony wireless headphones with noise cancellation" }
Doc 2: { "title": "Bose noise cancelling headphones" }
Doc 3: { "title": "Apple wireless earbuds" }

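ES first runs each title through an analyzer. For this example, assume one that lowercases, splits on spaces, and drops stopwords such as “with” (the default standard analyzer keeps stopwords unless you configure it otherwise). Then it builds this: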

Inverted Index for the "title" field

Term	Doc IDs	Doc freq
apple	[3]	1
bose	[2]	1
cancellation	[1]	1
cancelling	[2]	1
earbuds	[3]	1
headphones	[1, 2]	2
noise	[1, 2]	2
sony	[1]	1
wireless	[1, 3]	2

Now when we search wireless headphones:

Look up wireless → [1, 3]
Look up headphones → [1, 2]
Union (OR) → [1, 2, 3]. Intersection (AND) → [1].
Score each hit using TF-IDF / BM25 (how rare is the term, how often does it appear in the doc).

Doc 1 matches both terms — it ranks first.
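
The hand-built index above fits in a few lines of Python. This is a minimal sketch, not how Lucene lays things out on disk: the `analyze` function and the tiny stopword set are assumptions made purely for illustration.

```python
from collections import defaultdict

# Toy corpus from the example above, keyed by doc ID.
DOCS = {
    1: "Sony wireless headphones with noise cancellation",
    2: "Bose noise cancelling headphones",
    3: "Apple wireless earbuds",
}

# Assumed stopword list, just enough to drop "with" as the table does.
STOPWORDS = {"with", "the", "a", "and"}


def analyze(text):
    """Toy analyzer: lowercase, split on whitespace, drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def build_inverted_index(docs):
    """Map each term to the sorted list of doc IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query, operator="or"):
    """Look up each query term, then union (OR) or intersect (AND) the lists."""
    lists = [set(index.get(term, [])) for term in analyze(query)]
    if not lists:
        return []
    combine = set.union if operator == "or" else set.intersection
    return sorted(combine(*lists))


index = build_inverted_index(DOCS)
print(index["wireless"], index["headphones"])
print(search(index, "wireless headphones", "or"))
print(search(index, "wireless headphones", "and"))
```

Nothing here scans document text at query time. Each lookup is a dictionary hit followed by a set operation, which is the whole point of the structure.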

What gets stored besides doc IDs

For each term, ES also stores:

Term frequency (TF) — how many times the term appears in that doc
Document frequency (DF) — how many docs contain the term overall
Positions — where the term appears (needed for phrase queries like "noise cancelling")
Offsets — character offsets (for highlighting)

These let ES compute a relevance score, not just a yes/no match.
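
To see how TF and DF turn into a score, here is a sketch of the textbook BM25 formula over the three analyzed titles. The `K1` and `B` values are the commonly cited defaults. Treat it as an approximation: real scores depend on per-shard statistics and Lucene's exact variant, so the numbers won't line up with what ES returns.

```python
import math

# Analyzed titles from the example above (stopwords already removed).
DOCS = {
    1: ["sony", "wireless", "headphones", "noise", "cancellation"],
    2: ["bose", "noise", "cancelling", "headphones"],
    3: ["apple", "wireless", "earbuds"],
}
K1, B = 1.2, 0.75  # commonly cited BM25 defaults


def bm25_score(query_terms, doc_id, docs=DOCS):
    """Textbook BM25: sum over query terms of idf * normalized term frequency."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    tokens = docs[doc_id]
    score = 0.0
    for term in query_terms:
        tf = tokens.count(term)                       # term frequency in this doc
        df = sum(term in t for t in docs.values())    # document frequency
        if tf == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf * (K1 + 1) / (tf + K1 * (1 - B + B * len(tokens) / avg_len))
        score += idf * norm
    return score


query = ["wireless", "headphones"]
ranked = sorted(DOCS, key=lambda d: bm25_score(query, d), reverse=True)
print(ranked)
```

Doc 1 wins because it matches both terms. Between the two single-term matches, the shorter title scores a little higher, since BM25 normalizes by field length.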

Why “keyword” fields skip this

Important nuance — only text fields get analyzed and tokenized into an inverted index of words. keyword fields are stored as a single term (the whole string). That’s why you can search “wireless headphones” inside a text field but not inside a keyword field. We’ll dig into that in the field types note.

Trade-off

Building the inverted index takes work at write time. That’s why ES isn’t great for high-velocity OLTP writes — you’re paying analysis cost on every document. But reads? Blazing fast.


Cluster, Node, Index, Document

beginner elasticsearch basics architecture

Before we can do anything useful, we need the vocabulary. ES has four main nouns that nest inside each other like Russian dolls.

CLUSTER (e.g. "prod-search")
  NODE 1
    Index: products    { doc1, doc2, doc3... }
    Index: logs-2026   { doc1, doc2... }
  NODE 2
    Index: users       { doc1, doc2... }
  NODE 3
    replicas...

Going inside-out:

Document

A document is a single JSON object — the smallest unit of data in ES. One product, one log line, one user. That’s it.

{
  "_id": "abc123",
  "_index": "products",
  "_source": {
    "title": "Sony WH-1000XM5",
    "price": 399,
    "in_stock": true
  }
}

Every doc has a unique _id (we provide it or ES generates one) and lives in exactly one index. The actual data is inside _source.

Index

An index is a collection of documents with similar shape. Think of it like a table in SQL — but it’s just a logical grouping. We’d have a products index, a users index, a logs-2026-05-26 index.

Two rules of thumb:

Group similar documents into one index (all products together).
For time-series data (logs, events), use one index per day or week. Easier to drop old data.

Node

A node is a single running Elasticsearch process — basically one machine (or one container). Nodes hold shards (pieces of indices) and do the actual searching.

Nodes have roles:

Master-eligible — can be elected cluster master (manages cluster state)
Data — stores shards, runs queries
Ingest — preprocesses documents before indexing
Coordinating — routes requests (every node does this by default)

In small setups, one node does everything. In production, we separate them out.

Cluster

A cluster is a group of nodes that work together under one name (e.g. prod-search). They gossip with each other, share cluster state, and rebalance shards when nodes join or leave.

The master-eligible nodes elect one of themselves as master. The master manages metadata (which shards live where, what mappings exist) and coordinates the rest. If the master dies, the others elect a new one.

# Check cluster health
curl "localhost:9200/_cluster/health?pretty"

{
  "cluster_name": "prod-search",
  "status": "green",
  "number_of_nodes": 3,
  "active_primary_shards": 12,
  "active_shards": 24
}

Status colors:

Green — all primaries and replicas are assigned. We’re good.
Yellow — primaries OK, some replicas missing. Still works, less safe.
Red — at least one primary is unassigned. Some data is unreachable. Bad day.
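
The three colors boil down to two checks. Here is a minimal sketch of that logic. The `(kind, assigned)` tuple shape is an assumption for illustration; the real API derives the status from cluster state.

```python
def cluster_status(shards):
    """Derive green/yellow/red from a list of (kind, assigned) shard copies.

    kind is "primary" or "replica"; assigned is True when the copy is allocated.
    """
    if any(kind == "primary" and not assigned for kind, assigned in shards):
        return "red"
    if any(kind == "replica" and not assigned for kind, assigned in shards):
        return "yellow"
    return "green"


healthy = [("primary", True), ("replica", True)]
one_node = [("primary", True), ("replica", False)]   # e.g. a single-node dev cluster
broken = [("primary", False), ("replica", False)]
print(cluster_status(healthy), cluster_status(one_node), cluster_status(broken))
```

This is also why a one-node dev cluster usually sits at yellow: the replicas have nowhere to go.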

The hierarchy in one line

cluster > node > index > shard > document

We’ll get to shards next — that’s where the distributed magic happens.


Shards & Replicas

intermediate elasticsearch shards distributed

This is where Elasticsearch’s distributed nature shows up. An index isn’t actually a single thing on disk — it’s split into pieces called shards, and each shard can have copies called replicas.

Why shard?

Imagine we have 500 million product documents. No single machine can hold all that and search it fast. So we split the index into N pieces (shards), and each piece lives on a different node. Now searches run in parallel across nodes.

In simple language, sharding = horizontal scaling for an index.

Primary vs replica

Each shard exists in two flavors:

Primary shard — the original. All writes go here first.
Replica shard — an exact copy of a primary, on a different node. Used for redundancy and read throughput.

Critical rule: a primary and its replica never live on the same node. Otherwise, if that node dies, we lose both. ES enforces this automatically.

Index "products": 3 primary shards, 1 replica each = 6 total shards across 3 nodes

NODE 1: P0 (primary), R2 (replica of P2)
NODE 2: P1 (primary), R0 (replica of P0)
NODE 3: P2 (primary), R1 (replica of P1)

If Node 2 dies → R1 on Node 3 promotes to primary. R0 on Node 2 is lost, ES reassigns it elsewhere.

How a document ends up on a specific shard

ES uses a simple formula:

shard = hash(routing) % number_of_primary_shards

Where routing defaults to the document’s _id. This is why you can’t change number_of_shards after index creation — the hash math would break and existing docs would map to wrong shards.

Replicas are not part of that formula. They just mirror their primary.
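
A quick sketch makes the “hash math would break” point concrete. It uses CRC32 as a stand-in hash, which is an assumption: ES uses its own hash function, so the actual shard numbers will differ. The effect of changing the modulus is the same either way.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards, with CRC32 as the hash."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Hypothetical doc IDs, routed once with 3 primaries and once with 4.
doc_ids = [f"doc-{i}" for i in range(1000)]
before = {d: pick_shard(d, 3) for d in doc_ids}
after = {d: pick_shard(d, 4) for d in doc_ids}
moved = sum(before[d] != after[d] for d in doc_ids)
print(f"{moved} of {len(doc_ids)} docs would map to a different shard")
```

Most documents land somewhere new when the modulus changes, so a lookup by _id would go to a shard that doesn't hold the doc. That's the reason the primary count is fixed at creation.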

Writes vs reads

Write path — request hits any node → coordinator routes to the primary → primary writes locally → forwards to all replicas → replicas ack → primary responds. Synchronous to replicas by default.
Read path — request hits any node → coordinator picks any copy (primary OR replica) for each shard → returns results. This is why more replicas = more read throughput.

PUT /products/_doc/1
{
  "title": "Sony WH-1000XM5"
}

// Goes to: hash("1") % 3. Say that comes out as 2: the write lands on P2, then gets replicated to R2

Setting shards & replicas

PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

number_of_shards — immutable after creation. Plan ahead.
number_of_replicas — can be changed anytime. Bump it up for hot indices, down for cold ones.

Sizing rules of thumb

The internet is full of bad advice here. Elastic’s official recommendations:

Aim for shards between 10 GB and 50 GB each.
Don’t over-shard. Each shard has overhead — file handles, memory, metadata. 1000 shards of 100 MB each is way worse than 10 shards of 10 GB.
For time-series (logs), one daily index with 1 primary shard is often fine for small clusters.
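Keep total shards per GB of JVM heap under ~20. (This is Elastic’s older rule of thumb; newer releases word the guidance differently, so check the sizing docs for your version.)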

For our 500M product index? Maybe 5 primary shards (~100 GB total / 20 GB each), 1 replica = 10 shards across 4-6 nodes. Solid starting point.
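
The arithmetic behind that estimate, as a small helper. The function name and parameters are made up for this sketch, and the 100 GB total is the same assumption as in the paragraph above.

```python
import math


def plan_shards(total_gb, target_shard_gb, replicas):
    """Return (primary_count, total_shard_copies) for a target shard size."""
    primaries = max(1, math.ceil(total_gb / target_shard_gb))
    return primaries, primaries * (1 + replicas)


print(plan_shards(total_gb=100, target_shard_gb=20, replicas=1))
```

It's a starting point only. Measure real shard sizes after indexing a sample before committing to a number.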

TL;DR

Shards split the data, replicas duplicate the data. Primary takes writes, replicas serve reads and exist for failover. You pick number_of_shards once and live with it.


Document Structure

beginner elasticsearch documents metadata

When we get a document back from ES, it’s not just our JSON — it’s wrapped in metadata. Knowing what each field means saves a lot of confusion.

Here’s a real response:

{
  "_index": "products",
  "_id": "abc123",
  "_version": 3,
  "_seq_no": 42,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "title": "Sony WH-1000XM5",
    "price": 399,
    "category": "audio"
  }
}

Let’s break it down.

The metadata fields (prefixed with `_`)

Field	What it is
_index	Which index this doc lives in
_id	Unique identifier within the index
_source	The actual JSON we sent in
_version	Increments on every update (for optimistic concurrency)
_seq_no	Per-shard sequence number, used for safer concurrency control
_primary_term	Counter that bumps when a new primary is elected
_score	Relevance score (only on search results)

`_id` — the document ID

We can provide our own (PUT /products/_doc/abc123) or let ES auto-generate one (POST /products/_doc). Auto-generated IDs are URL-safe Base64 strings like Z6X3kYwBq8....

The _id must be unique within the index. It’s used to route the doc to a shard via hash(_id) % shards.

`_source` — the field that matters most

This is our original JSON, stored verbatim. By default, ES stores it so we can retrieve the doc as we sent it. You CAN disable _source to save disk, but then you can’t:

Reindex into a new mapping
Use update API
Use highlighting

In simple language: don’t disable _source unless you really know what you’re doing.

`_version` and concurrency control

Every update bumps _version. To prevent lost updates, we make the write conditional on the _seq_no and _primary_term we last read:

PUT /products/_doc/abc123?if_seq_no=42&if_primary_term=1
{
  "title": "Sony WH-1000XM5 — Updated"
}

If the doc was modified by someone else in the meantime (different seq_no), this fails. That’s optimistic concurrency control — like SQL’s WHERE version = ?.

Note: ES used to use _version for this directly. The modern way is _seq_no + _primary_term because it’s safer across primary failovers.
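
Here is the compare-and-set idea as an in-memory toy. `ToyShard` and `VersionConflict` are hypothetical names for this sketch, not ES classes. The real check happens inside the primary shard, and a mismatch comes back as an HTTP 409 conflict.

```python
class VersionConflict(Exception):
    """Raised when the caller's seq_no/primary_term no longer match."""


class ToyShard:
    """In-memory stand-in for one primary shard; not the real engine."""

    def __init__(self):
        self.primary_term = 1
        self.next_seq_no = 0
        self.docs = {}  # _id -> stored doc with its metadata

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None:
            # Conditional write: refuse if the doc changed since the caller read it.
            if (current is None
                    or current["_seq_no"] != if_seq_no
                    or current["_primary_term"] != if_primary_term):
                raise VersionConflict(doc_id)
        version = current["_version"] + 1 if current else 1
        self.docs[doc_id] = {
            "_source": source,
            "_seq_no": self.next_seq_no,
            "_primary_term": self.primary_term,
            "_version": version,
        }
        self.next_seq_no += 1
        return self.docs[doc_id]


shard = ToyShard()
first = shard.index("abc123", {"title": "Sony WH-1000XM5"})
# Two clients read the same state, then both attempt a conditional write.
shard.index("abc123", {"title": "Client A"}, first["_seq_no"], first["_primary_term"])
try:
    shard.index("abc123", {"title": "Client B"}, first["_seq_no"], first["_primary_term"])
    outcome = "written"
except VersionConflict:
    outcome = "conflict"
print(outcome)
```

Client B's write is rejected rather than silently overwriting Client A's. B is expected to re-read the doc and retry.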

`_type` — the ghost of versions past

You might see old tutorials with URLs like /products/product/abc123. That product was the type, a sub-grouping within an index (think tables within a database).

Types are dead. Indices were limited to a single type in 6.x, types were deprecated in 7.x, and they were removed in 8.0. Now every index has one implicit type, accessed via _doc:

# Old (don't do this)
PUT /products/product/abc123

# New
PUT /products/_doc/abc123

Why did they kill it? Lucene stores all fields from all types in the same underlying index — so two types in the same index with a name field of different data types caused chaos. Easier to just say “one index, one schema.”

Putting it together

# Index a doc with our own ID
curl -X PUT "localhost:9200/products/_doc/sony-xm5" -H "Content-Type: application/json" -d '
{
  "title": "Sony WH-1000XM5",
  "price": 399
}'

# Get it back
curl "localhost:9200/products/_doc/sony-xm5"

{
  "_index": "products",
  "_id": "sony-xm5",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": { "title": "Sony WH-1000XM5", "price": 399 }
}

When you see found: true and your data in _source, you’ve got it. Everything else is plumbing.


Indexing & Mapping

Index Creation & Settings

intermediate elasticsearch indexing settings

You CAN just throw documents at ES and let it auto-create the index. But that’s a great way to end up with 1 shard, dynamic mapping nightmares, and 3 AM pages. Let’s do it properly.

The two halves of index config

Every index has two configuration blocks:

settings — physical/operational stuff. How many shards? How many replicas? Refresh interval? Custom analyzers?
mappings — schema. What fields exist? What types are they? How should text be analyzed?

In simple language: settings is “how the index runs”, mappings is “what the data looks like.”

settings

- number_of_shards (immutable)
- number_of_replicas (mutable)
- refresh_interval
- analysis (analyzers, tokenizers)
- max_result_window
- codec (compression)

mappings

- field names
- field data types (text, keyword, long...)
- which analyzer to use per text field
- multi-fields (text + keyword)
- dynamic mapping rules

A real index creation request

Let’s build a products index from scratch:

PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "5s",
    "analysis": {
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title":       { "type": "text", "analyzer": "product_analyzer" },
      "sku":         { "type": "keyword" },
      "price":       { "type": "scaled_float", "scaling_factor": 100 },
      "in_stock":    { "type": "boolean" },
      "created_at":  { "type": "date" },
      "category":    { "type": "keyword" },
      "description": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      }
    }
  }
}

That’s a production-ready starting point. Now let’s unpack the important settings.

`number_of_shards`

How many primary shards to split the index across. Set this at creation time; you can’t change it in place later. (You can shrink or split into a new index, with restrictions, but plan not to.)

Default in modern ES: 1. Good for small indices, bad for anything that’ll grow past 50 GB.

`number_of_replicas`

How many copies of each primary to maintain. Default: 1. Bump it up for high-read workloads:

PUT /products/_settings
{ "index": { "number_of_replicas": 2 } }

Setting it to 0 saves disk but loses fault tolerance. Useful during bulk imports — set to 0 to speed up writes, then bump back to 1 when done.

`refresh_interval`

How often new documents become searchable. Default: 1 second. That means after we PUT a doc, there’s up to a 1-second lag before it shows up in searches. ES is near real-time, not real-time.

For bulk indexing, crank it up:

PUT /products/_settings
{ "index": { "refresh_interval": "30s" } }

This trades search freshness for write throughput. Set to -1 to disable refreshing entirely during bulk loads.

`analysis`

This is where we define custom analyzers (we cover analyzers in detail in note 9). You declare them in settings, then reference them by name in mappings.

What you CAN change later

Most settings split into “static” (set once, requires close+reopen to change) vs “dynamic” (change anytime). Examples of dynamic ones:

number_of_replicas
refresh_interval
max_result_window
blocks.read_only

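Static ones are stricter. Analysis settings (custom analyzers) can only be changed while the index is closed. number_of_shards can’t be edited even then; changing it means a split, shrink, or reindex into a new index.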

A common pattern: close → update → reopen

POST /products/_close
PUT /products/_settings  { ... static changes ... }
POST /products/_open

Adding a new analyzer mid-flight? You’ll need this dance. Better yet, plan analyzers upfront or reindex into a new index with the right settings.

Quick checks

# What's the current config?
GET /products/_settings
GET /products/_mapping

# Stats
GET /products/_stats

The 30-second summary: create your index explicitly with the right number of shards, mappings, and analyzers from day one. Letting ES auto-create everything almost always bites later.


Mapping: Dynamic vs Explicit

intermediate elasticsearch mapping schema

Mapping is ES’s word for schema. Which fields exist, what types they are, how they get analyzed. There are two flavors: dynamic (ES guesses) and explicit (we declare).

Dynamic mapping — the prototype mode

If we index a document into a non-existent index, ES will:

Create the index.
Look at each field in the JSON.
Guess the type based on the value.
Save that mapping forever.

POST /products/_doc
{
  "title": "Sony WH-1000XM5",
  "price": 399,
  "in_stock": true,
  "created_at": "2026-05-26T10:30:00Z",
  "tags": ["audio", "wireless"]
}

ES infers:

title → text (with a .keyword sub-field)
price → long
in_stock → boolean
created_at → date
tags → text (with .keyword)
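
The guessing rules can be approximated in Python. This is a rough sketch of the idea, not ES's detection code: `infer_field_type` is a made-up helper, and it only recognizes ISO-style date strings.

```python
from datetime import datetime


def infer_field_type(value):
    """Guess a field type from a JSON value, roughly like dynamic mapping."""
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, list):
        # Arrays take the type of their first element (assumes a non-empty array).
        return infer_field_type(value[0])
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return "date"
        except ValueError:
            return "text + .keyword"
    return "object"


doc = {
    "title": "Sony WH-1000XM5",
    "price": 399,
    "in_stock": True,
    "created_at": "2026-05-26T10:30:00Z",
    "tags": ["audio", "wireless"],
}
mapping = {field: infer_field_type(value) for field, value in doc.items()}
print(mapping)
```

The guess is made from the first value seen and then sticks, which is exactly where the trouble starts.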

Sounds convenient, right? Until things break.

Where dynamic mapping bites you

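- First doc has price: 399 → mapped as long. Next doc has price: 399.99 → accepted, but by default coerced to fit the long field, so it’s indexed as 399 (rejected only if coerce is turned off).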
- First doc has id: "abc123" → mapped as text. Now you can't sort or aggregate on it without pain.
- Field explosion: someone sends { "metadata": { "click_1": ..., "click_2": ... } } dynamically. Your mapping grows by one field per request.
- Dates: "2026-05-26" mapped as date. Then someone sends "May 26 2026" → rejected.

Explicit mapping — the production mode

For anything serious, declare your schema upfront:

PUT /products
{
  "mappings": {
    "properties": {
      "title":      { "type": "text" },
      "sku":        { "type": "keyword" },
      "price":      { "type": "scaled_float", "scaling_factor": 100 },
      "in_stock":   { "type": "boolean" },
      "created_at": { "type": "date", "format": "strict_date_optional_time" },
      "tags":       { "type": "keyword" }
    }
  }
}

Now we have predictable types, we can’t accidentally pollute the mapping, and bad data gets rejected early.

Adding new fields later

Mappings are mostly append-only. We can add a new field, but we can’t change an existing field’s type.

PUT /products/_mapping
{
  "properties": {
    "discount_pct": { "type": "float" }
  }
}

This works. But trying to change title from text to keyword? Nope — you’d have to reindex into a new index with the corrected mapping.

Controlling dynamic behavior

We don’t have to choose all-or-nothing. The dynamic setting has three main values (a fourth, runtime, adds new fields as runtime fields instead of indexed ones):

true (default) — new fields are added to the mapping automatically
false — new fields are stored in _source but NOT indexed (invisible to search)
strict — new fields cause the document to be rejected

PUT /products
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": { "type": "text" },
      "sku":   { "type": "keyword" }
    }
  }
}

// This now FAILS:
POST /products/_doc
{ "title": "Sony XM5", "sku": "SONY-001", "random_field": "oops" }
// → strict_dynamic_mapping_exception

Strict mapping is the safest default for production. It forces you to be intentional.
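
The three behaviors side by side, as a toy model. `apply_dynamic` is a hypothetical helper that only tracks which fields end up searchable, and the "dynamic" type label is a placeholder for whatever ES would infer.

```python
def apply_dynamic(mapping, doc, dynamic="true"):
    """Return (indexed_fields, updated_mapping) for one document.

    mapping maps field name -> type; unknown fields are handled per `dynamic`.
    """
    mapping = dict(mapping)
    indexed = {}
    for field, value in doc.items():
        if field in mapping:
            indexed[field] = value
        elif dynamic == "true":
            mapping[field] = "dynamic"   # placeholder for the inferred type
            indexed[field] = value
        elif dynamic == "false":
            continue                     # kept in _source only, not searchable
        elif dynamic == "strict":
            raise ValueError(f"strict_dynamic_mapping_exception: [{field}] is not allowed")
    return indexed, mapping


base = {"title": "text", "sku": "keyword"}
doc = {"title": "Sony XM5", "sku": "SONY-001", "random_field": "oops"}

print(apply_dynamic(base, doc, "true"))
print(apply_dynamic(base, doc, "false"))
try:
    apply_dynamic(base, doc, "strict")
    strict_result = "accepted"
except ValueError as err:
    strict_result = str(err)
print(strict_result)
```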

Dynamic templates — the middle ground

Sometimes we want some flexibility — e.g. “any field that ends in _id should be a keyword.” That’s what dynamic templates are for:

PUT /events
{
  "mappings": {
    "dynamic_templates": [
      {
        "ids_as_keyword": {
          "match": "*_id",
          "mapping": { "type": "keyword" }
        }
      },
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": { "type": "keyword", "ignore_above": 1024 }
        }
      }
    ]
  }
}

Now user_id, session_id, product_id all become keywords automatically, and any other string becomes a keyword too (not text). Super useful for log indices where the shape varies.
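
Templates are tried in order and the first match wins. A sketch of that resolution, using Python's `fnmatch` as a stand-in for ES's wildcard matching (an assumption; the helper names are made up):

```python
from fnmatch import fnmatchcase

# The two templates from the request above, in declaration order.
TEMPLATES = [
    {"name": "ids_as_keyword", "match": "*_id",
     "mapping": {"type": "keyword"}},
    {"name": "strings_as_keywords", "match_mapping_type": "string",
     "mapping": {"type": "keyword", "ignore_above": 1024}},
]


def json_type(value):
    """Map a Python value to the type names used by match_mapping_type."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return "object"


def resolve_template(field, value, templates=TEMPLATES):
    """Return the name of the first template whose conditions all match, else None."""
    for t in templates:
        if "match" in t and not fnmatchcase(field, t["match"]):
            continue
        if "match_mapping_type" in t and t["match_mapping_type"] != json_type(value):
            continue
        return t["name"]
    return None


print(resolve_template("user_id", "u-42"))
print(resolve_template("message", "disk full"))
print(resolve_template("count", 3))
```

Fields that match no template, like the numeric `count` here, fall back to regular dynamic mapping.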

The TL;DR

Prototyping locally? Dynamic mapping is fine.
Production? Explicit mapping. Maybe dynamic: strict to catch typos.
Log/event data with unknown shape? Dynamic templates with sensible defaults.
Need to change a field’s type? Reindex into a new index. There’s no in-place ALTER COLUMN.


Field Data Types

intermediate elasticsearch mapping types

If an interviewer asks one ES question, it’s “what’s the difference between text and keyword?” Let’s nail that first, then sweep through the other types.

text vs keyword — THE question

In simple language:

text is for human-readable content you want to search inside. ES analyzes it (lowercase, tokenize, etc.) and builds an inverted index of the words.
keyword is for exact-match identifiers and structured tags. ES stores the whole string as one token.

"Sony WH-1000XM5 Wireless Headphones" → indexed as...

type: "text"

→ analyzer runs:

["sony", "wh", "1000xm5", "wireless", "headphones"]

Good for: full-text search, "match" queries.
Can't: sort, aggregate, exact-match by default.

type: "keyword"

→ stored as one token:

["Sony WH-1000XM5 Wireless Headphones"]

Good for: exact match, sorting, aggregations, term filters.
Can't: search inside the string.

The classic gotcha: you index a product title as keyword, then can’t figure out why match: "wireless" returns nothing. Because the index has one giant token, not the words inside it.

The multi-field pattern (use this)

You usually want both. ES supports multi-fields — index the same data two ways:

PUT /products
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      }
    }
  }
}

Now title is searchable (full-text) and title.keyword is sortable/aggregatable (exact). This is so common that dynamic mapping does it automatically for any string.

// Full-text search
GET /products/_search
{ "query": { "match": { "title": "wireless" } } }

// Exact match / sort
GET /products/_search
{
  "query": { "term": { "title.keyword": "Sony WH-1000XM5 Wireless Headphones" } },
  "sort": [{ "title.keyword": "asc" }]
}

Numeric types

Pick the smallest one that fits — saves disk and memory.

Type	Range
`byte`	-128 to 127
`short`	-32k to 32k
`integer`	~±2.1 billion
`long`	huge (default for whole numbers)
`float`	32-bit float
`double`	64-bit float
`scaled_float`	float stored as long, e.g. price × 100
`half_float`	16-bit float (when precision doesn’t matter)

For money, prefer scaled_float with scaling_factor: 100. Avoids float weirdness.
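
What “float stored as long” means in practice, as a minimal sketch. `to_scaled` and `from_scaled` are illustrative helpers, not ES functions. The idea is multiply by the scaling factor, round to the nearest integer, and store that.

```python
def to_scaled(value, scaling_factor=100):
    """Encode a number as an integer: multiply by the factor, round to nearest."""
    return round(value * scaling_factor)


def from_scaled(stored, scaling_factor=100):
    """Decode the stored integer back to a float."""
    return stored / scaling_factor


print(0.1 + 0.2 == 0.3)                                     # False: binary floats are inexact
print(to_scaled(0.1) + to_scaled(0.2) == to_scaled(0.3))    # integer cents compare cleanly
print(to_scaled(399.99), from_scaled(to_scaled(399.99)))
```

The trade-off is precision: anything finer than 1/scaling_factor is rounded away, so 399.999 is stored the same as 400.00.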

Date

ES dates are flexible. Accepts:

ISO 8601 strings: "2026-05-26T10:30:00Z"
epoch millis: 1748254200000
Custom formats via the format parameter

"created_at": {
  "type": "date",
  "format": "strict_date_optional_time||epoch_millis"
}

Internally stored as a long (epoch ms). Use date_nanos if you need nanosecond precision (for traces, etc.).

Boolean

Easy. true / false. Also accepts "true" / "false" as strings.

IP

A dedicated type for IPv4 and IPv6 addresses. Supports CIDR queries:

"client_ip": { "type": "ip" }

GET /logs/_search
{ "query": { "term": { "client_ip": "192.168.1.0/24" } } }
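
If CIDR notation is new, Python's standard `ipaddress` module shows what that query is asking: is this address inside that block? The helper name is made up for this sketch.

```python
import ipaddress


def ip_in_cidr(address, cidr):
    """True when the address falls inside the CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


print(ip_in_cidr("192.168.1.42", "192.168.1.0/24"))
print(ip_in_cidr("192.168.2.1", "192.168.1.0/24"))
print(ip_in_cidr("2001:db8::1", "2001:db8::/32"))
```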

Object — implicit nesting

Any nested JSON is an object type by default:

{
  "user": {
    "name": "Manish",
    "country": "IN"
  }
}

ES flattens this internally to user.name, user.country. We can query as user.name.

Object vs nested — the array trap

Here’s a sneaky one. Object types don’t preserve relationships between array items. Watch this:

{
  "comments": [
    { "author": "alice", "text": "great" },
    { "author": "bob",   "text": "terrible" }
  ]
}

Internally ES flattens this to:

comments.author: ["alice", "bob"]
comments.text:   ["great", "terrible"]

The fact that “alice” said “great” is lost. A query like “author=bob AND text=great” would match this document! Wrong, but ES doesn’t know.

The fix is the nested type. Each object in the array gets indexed as a hidden separate doc, and you query with a nested query:

"comments": {
  "type": "nested",
  "properties": {
    "author": { "type": "keyword" },
    "text":   { "type": "text" }
  }
}

GET /posts/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "term":  { "comments.author": "bob" } },
            { "match": { "comments.text": "great" } }
          ]
        }
      }
    }
  }
}

Trade-off: nested fields are heavier on disk and slightly slower to query. Only use them when array-item relationships actually matter.
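
The array trap and its fix, simulated in plain Python. This models the matching semantics only, not how Lucene stores the hidden nested docs, and the function names are assumptions.

```python
POST = {
    "comments": [
        {"author": "alice", "text": "great"},
        {"author": "bob", "text": "terrible"},
    ]
}


def flatten(doc, path):
    """Object-style indexing: collect each sub-field's values into one flat list."""
    flat = {}
    for item in doc[path]:
        for key, value in item.items():
            flat.setdefault(f"{path}.{key}", []).append(value)
    return flat


def object_match(doc, path, author, text):
    """Each condition is checked against the flat lists independently."""
    flat = flatten(doc, path)
    return author in flat[f"{path}.author"] and text in flat[f"{path}.text"]


def nested_match(doc, path, author, text):
    """Nested-style: both conditions must hold inside the same array item."""
    return any(c["author"] == author and c["text"] == text for c in doc[path])


print(flatten(POST, "comments"))
print(object_match(POST, "comments", "bob", "great"))   # the false positive
print(nested_match(POST, "comments", "bob", "great"))
```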

Quick reference

text — searchable prose
keyword — exact-match strings, sort, agg
long / integer / scaled_float — numbers
date — timestamps
boolean — true/false
ip — IP addresses
object — implicit, flat nested fields
nested — when arrays of objects need to stay linked
geo_point — lat/lon (covered separately)

Pick types deliberately, and you’ll dodge 90% of “why doesn’t my query work” issues.


Analyzers, Tokenizers & Token Filters

intermediate elasticsearch analyzers text

When we index a text field, ES doesn’t just store the raw string — it runs it through an analyzer that chops it into tokens (the words that go into the inverted index). The same analyzer runs at search time on the query string, so they match up.

In simple language: an analyzer is a pipeline that turns “Sony WH-1000XM5 Headphones!” into [sony, wh, 1000xm5, headphones].

The pipeline: three stages

Analyzer Pipeline

Raw text ("Sony WH-1000XM5!")
→ Char filters (strip HTML, replace chars)
→ Tokenizer (split into tokens)
→ Token filters (lowercase, stem, dedupe)
→ Tokens ([sony, wh, 1000xm5])

1. Character filters

Run on the raw string before tokenization. Strip HTML tags, replace patterns, map characters.

html_strip — removes <p> and friends
mapping — custom char-to-char replacements (”&” → ” and ”)
pattern_replace — regex find/replace

2. Tokenizer (exactly one)

Splits the string into tokens. Examples:

standard — splits on word boundaries (Unicode-aware). Most common.
whitespace — splits on whitespace only. Keeps punctuation in tokens.
keyword — doesn’t split. Whole string as one token.
ngram / edge_ngram — generates substrings (for autocomplete).
path_hierarchy — splits /usr/local/bin into /usr, /usr/local, /usr/local/bin.

3. Token filters (any number, in order)

Operate on the token stream. Each one modifies, adds, or removes tokens.

lowercase — almost always wanted
stop — removes “a”, “the”, “is” etc.
stemmer — reduces “running” → “run”, “buying” → “buy”
synonym — “tv” ↔ “television”
asciifolding — “café” → “cafe”
edge_ngram — generates prefixes for autocomplete
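
The three stages chained together, as a toy pipeline. Each function is a crude stand-in for the real component: a regex instead of the Unicode-aware standard tokenizer, and an assumed six-word stopword list. It won't match ES token-for-token on tricky input.

```python
import re
import unicodedata

STOPWORDS = {"a", "an", "and", "are", "is", "the"}   # assumed, tiny list


def char_filters(text):
    """html_strip-like and mapping-like steps on the raw string."""
    text = re.sub(r"<[^>]+>", " ", text)      # strip tags
    return text.replace("&", " and ")         # map "&" to " and "


def tokenize(text):
    """Rough stand-in for the standard tokenizer: runs of word characters."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """lowercase, then asciifolding-like, then stop, applied in that order."""
    out = []
    for tok in tokens:
        tok = tok.lower()
        tok = unicodedata.normalize("NFKD", tok).encode("ascii", "ignore").decode("ascii")
        if tok and tok not in STOPWORDS:
            out.append(tok)
    return out


def analyze(text):
    return token_filters(tokenize(char_filters(text)))


print(analyze("<p>Sony & Bose: the Caf\u00e9 WH-1000XM5!</p>"))
print(analyze("Sony WH-1000XM5 Headphones!"))
```

Order matters: here the stop filter runs after lowercasing, so “The” is caught as well as “the”.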

Built-in analyzers (don’t reinvent the wheel)

ES ships with several ready-made analyzers:

standard (default) — standard tokenizer + lowercase. Fine for most use cases.
simple — splits on non-letters + lowercase. No numbers preserved.
whitespace — just splits on whitespace, no lowercasing.
keyword — treats the whole input as one token (similar to a keyword field).
english (and other language analyzers) — adds stemming, stopwords, possessives.

Testing analyzers with `_analyze`

This is the killer debugging tool. Run any analyzer against any text:

POST /_analyze
{
  "analyzer": "english",
  "text": "The running headphones are amazing"
}

Response:

{
  "tokens": [
    { "token": "run",      "position": 1 },
    { "token": "headphon", "position": 2 },
    { "token": "amaz",     "position": 4 }
  ]
}

Notice: “The” and “are” dropped (stopwords), “running” stemmed to “run”. This is why match: "run" finds documents containing “running” with the english analyzer.

Defining a custom analyzer

PUT /products
{
  "settings": {
    "analysis": {
      "char_filter": {
        "ampersand_to_and": {
          "type": "mapping",
          "mappings": ["& => and"]
        }
      },
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "char_filter": ["ampersand_to_and", "html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "stop", "english_stemmer"]
        }
      },
      "filter": {
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text", "analyzer": "product_analyzer" }
    }
  }
}

Index-time vs search-time analyzers

By default, the same analyzer runs at both. You CAN specify different ones:

"title": {
  "type": "text",
  "analyzer": "product_analyzer",         // at index time
  "search_analyzer": "standard"           // at query time
}

When does this matter? Autocomplete. Use edge_ngram at index time (generates “s”, “so”, “son”, “sony”), standard at search time (just “son”). Otherwise the search side would explode the query into prefixes too.
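
A sketch of why the two sides differ for autocomplete. The helpers are illustrative, and the `min_gram` and `max_gram` values are assumed for the example.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Prefixes of a token, from min_gram up to max_gram characters."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def index_terms(title):
    """Index-time analysis: lowercase, split, then expand each token to prefixes."""
    terms = set()
    for token in title.lower().split():
        terms.update(edge_ngrams(token))
    return terms


def search_terms(query):
    """Search-time analysis: lowercase and split only, no n-grams."""
    return query.lower().split()


indexed = index_terms("Sony Headphones")
print(edge_ngrams("sony"))
print(all(term in indexed for term in search_terms("son")))
```

If the query were n-grammed too, typing “son” would also search for “s” and “so”, matching nearly everything that starts with an s.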

The golden rule

If your search returns nothing, run _analyze on both the indexed text AND the query string. The tokens must match. 90% of “ES search isn’t working” issues are analyzer mismatches.


Index Templates & Aliases

intermediate elasticsearch templates aliases operations

These two features look small but they’re how real ES deployments avoid downtime. Templates apply config to indices that don’t exist yet. Aliases let us swap indices behind a stable name.

Index Templates — config that auto-applies

Imagine we ship logs to ES, one index per day: logs-2026-05-26, logs-2026-05-27, etc. We don’t want to manually run PUT /logs-... with all the settings every day. Templates solve this.

A template says: “Any new index matching this pattern should get these settings, mappings, and aliases.”

PUT /_index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "priority": 100,
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "refresh_interval": "5s"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "level":      { "type": "keyword" },
        "message":    { "type": "text" },
        "service":    { "type": "keyword" },
        "user_id":    { "type": "keyword" }
      }
    },
    "aliases": {
      "logs": {}
    }
  }
}

Now when our app POSTs a doc to logs-2026-05-27 (which doesn’t exist yet), ES:

Sees the index doesn’t exist.
Matches logs-* pattern → applies the template.
Creates the index with those settings + mappings + alias.
Indexes the document.

In simple language: templates are “auto-config” for index creation.
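The matching step can be sketched in a few lines of Python. This is an in-memory illustration only: the `templates` dict and the `catch_all` entry are hypothetical, and the rule that the highest `priority` wins when several composable templates match is the only ES behavior it tries to mirror.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory stand-in for the templates registered in a cluster.
templates = {
    "logs_template": {"index_patterns": ["logs-*"], "priority": 100},
    "catch_all": {"index_patterns": ["*"], "priority": 0},
}


def pick_template(index_name, registered):
    """Return the name of the highest-priority template whose pattern matches."""
    candidates = [
        (body["priority"], name)
        for name, body in registered.items()
        if any(fnmatchcase(index_name, pattern) for pattern in body["index_patterns"])
    ]
    return max(candidates)[1] if candidates else None


print(pick_template("logs-2026-05-27", templates))
print(pick_template("products_v1", templates))
```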

Composable templates

Modern ES uses composable templates — you can split settings, mappings, and aliases into reusable component templates and stitch them together. Useful when many index types share base settings.

PUT /_component_template/base_settings
{ "template": { "settings": { "number_of_replicas": 1 } } }

PUT /_index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "composed_of": ["base_settings"],
  "template": { "mappings": { ... } }
}

Aliases — the redirect layer

An alias is a pointer to one or more indices. Apps query the alias, ES routes to the actual index.

App → alias "products" → real index

App code: GET /products/_search → alias: products → index: products_v3

Tomorrow we swap the alias to products_v4. App code unchanged.

The blue-green reindex pattern

This is THE killer use case. Say we need to change a field’s type (which isn’t allowed in-place). The dance:

# 1. Create the new index with the new mapping
PUT /products_v2
{ "mappings": { "properties": { "price": { "type": "scaled_float", "scaling_factor": 100 } } } }

# 2. Reindex from old to new
POST /_reindex
{
  "source": { "index": "products_v1" },
  "dest":   { "index": "products_v2" }
}

# 3. Atomically swap the alias
POST /_aliases
{
  "actions": [
    { "remove": { "index": "products_v1", "alias": "products" } },
    { "add":    { "index": "products_v2", "alias": "products" } }
  ]
}

# 4. Drop the old index
DELETE /products_v1

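The alias swap is atomic. Apps never see a moment where products doesn’t exist. Zero downtime mapping change. One caveat: _reindex copies a snapshot of the source, so writes that land in products_v1 after the reindex starts are not carried over. Pause writes or replay them before the swap.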
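If we script this migration, the swap body is easy to generate. A small sketch, assuming a helper name (`alias_swap_actions`) of our own; it only builds the JSON for POST /_aliases and does not talk to a cluster.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = alias_swap_actions("products", "products_v1", "products_v2")
print(json.dumps(body, indent=2))
```

Keeping both actions in one request is what makes the swap atomic; sending them as two separate calls would leave a gap where the alias points nowhere.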

Aliases for log rotation

Combine aliases with daily indices:

POST /_aliases
{
  "actions": [
    { "add": { "index": "logs-2026-05-26", "alias": "logs", "is_write_index": false } },
    { "add": { "index": "logs-2026-05-27", "alias": "logs", "is_write_index": true } }
  ]
}

logs alias points to BOTH indices.
Searches across the alias hit both.
Writes go only to today’s index (is_write_index: true).

App searches logs, ES handles the routing. Kibana dashboards keep working as new indices come online.

Filtered aliases

We can attach a filter to an alias — useful for multi-tenancy:

POST /_aliases
{
  "actions": [{
    "add": {
      "index": "products",
      "alias": "products_us",
      "filter": { "term": { "country": "us" } }
    }
  }]
}

Now anyone querying products_us only sees US products. No app-side filter needed.

Data streams — the modern way for logs

For pure append-only time-series (logs, metrics), modern ES has data streams. Think of a data stream as a managed alias + auto-rollover + naming convention all rolled into one. They’re paired with ILM (Index Lifecycle Management) to auto-rollover when an index hits a size/age threshold.

For interview purposes: know that templates + aliases are the foundation; data streams are the syntactic sugar on top.

TL;DR

Index templates — auto-apply settings/mappings to new indices matching a pattern.
Aliases — stable name pointing to one or more real indices.
Blue-green reindex — alias swap = zero-downtime mapping change.
Log rotation — daily indices behind one alias with is_write_index.

These two patterns together are how you run ES at scale without 3 AM panic.

Query DSL

Match vs Term Query

intermediate elasticsearch query-dsl match term

This is the most asked Elasticsearch question. Get it wrong and the interviewer immediately knows we’ve never actually used ES in production.

In simple language: match runs our search text through the same analyzer that indexed the field, then looks for the resulting tokens. term skips the analyzer entirely and searches for the exact value as-is.

That single difference explains 90% of the “why isn’t my query returning results?” bugs we see.

The flow — what actually happens

When we index a document with "title": "MacBook Pro 16-inch", the standard analyzer lowercases and tokenizes it into ["macbook", "pro", "16", "inch"]. Those tokens go into the inverted index.

Now if we search:

match: "MacBook" → analyzer turns it into ["macbook"] → matches the token → hit.
term: "MacBook" → looks for the literal string MacBook in the index → no match (because the index only has macbook).
term: "macbook" → matches the token → hit.

MATCH query

Input: "MacBook Pro"

↓ analyzer (lowercase + tokenize)

Tokens: ["macbook", "pro"]

↓ search inverted index

→ matches docs with either token

TERM query

Input: "MacBook Pro"

↓ no analyzer, raw bytes

Token: "MacBook Pro"

↓ search inverted index

→ no match (index has lowercase tokens)
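The two flows above can be reproduced with a toy inverted index. This is a sketch under simplifying assumptions: `analyze` only lowercases and splits on non-alphanumerics, which is a rough approximation of the standard analyzer, and the two documents are made up.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


# Build a tiny inverted index: token -> set of doc ids.
docs = {1: "MacBook Pro 16-inch", 2: "Dell XPS 13"}
inverted = defaultdict(set)
for doc_id, title in docs.items():
    for token in analyze(title):
        inverted[token].add(doc_id)


def match(query):
    """Analyze the query, then OR together the postings of each token."""
    hits = set()
    for token in analyze(query):
        hits |= inverted.get(token, set())
    return hits


def term(value):
    """Look the raw value up as-is, with no analysis."""
    return set(inverted.get(value, set()))


print(match("MacBook"), term("MacBook"), term("macbook"))
```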

When to use which

Think of it like this — use match for human-typed search (search bars, autocomplete) and term for structured/exact data (status flags, IDs, tags, enums).

GET /products/_search
{
  "query": {
    "match": { "title": "wireless bluetooth headphones" }
  }
}

The above is perfect for a search bar. ES will tokenize, lowercase, and find docs containing any of those words (with relevance scoring).

GET /orders/_search
{
  "query": {
    "term": { "status": "shipped" }
  }
}

This is perfect for filtering by a known enum value. We know status is always one of pending | shipped | delivered, so we want an exact comparison, not analyzed text matching.

The keyword field trick

Here’s where most people get burned. With dynamic mapping, a string field gets mapped as both text (analyzed) AND a keyword sub-field (not analyzed). To do an exact match on a name field:

GET /users/_search
{
  "query": {
    "term": { "name.keyword": "Manish Prajapati" }
  }
}

Notice the .keyword suffix. Without it, term on a text field will almost never work because the original string isn’t in the index — only its tokens are.

Scoring difference

match queries run in query context: they compute a relevance score (BM25). term queries can run in filter context (inside bool.filter), where no score is computed and ES may cache the results of frequently used filters. That makes term filters noticeably cheaper for repeated queries.

GET /products/_search
{
  "query": {
    "bool": {
      "must":   [{ "match": { "title": "laptop" } }],
      "filter": [{ "term":  { "in_stock": true } }]
    }
  }
}

Quick rules of thumb

Searching free text from a user? match.
Filtering by status, ID, boolean, tag, enum? term.
Got results that don’t make sense? Check if the field is text or keyword — that’s almost always the culprit.
Want exact-match on a string? Use term on field.keyword.

Bool Query

intermediate elasticsearch query-dsl bool filter-context

In simple language, bool is how we combine multiple conditions in a single query. Think of it like SQL’s WHERE a AND b AND NOT c OR d — but with a twist: each clause type changes whether the result affects the relevance score or not.

Almost every real-world ES query is a bool query under the hood. Master this and we’ve mastered Query DSL.

The four clauses

Clause	SQL equivalent	Scoring?	Cached?
`must`	`AND`	Yes (contributes to score)	No
`should`	`OR` (or boost)	Yes (contributes to score)	No
`must_not`	`NOT`	No	Yes
`filter`	`AND`	No	Yes

The key insight — must and filter do the same logical thing (both require a match). The only difference is must computes a relevance score, filter doesn’t. Filter is cached and skips scoring math, so it’s significantly faster.

Query context vs Filter context

QUERY CONTEXT (must, should)

• Computes _score (BM25)

• Not cached

• "How well does this match?"

Use for: search relevance

FILTER CONTEXT (filter, must_not)

• No scoring (score = 0)

• Cached in bitset

• "Does this match — yes/no?"

Use for: exact filters, ranges

The canonical example

Let’s say we’re building a product search. The user typed “wireless headphones” and selected filters: in stock, price up to 200. We also want Sony and Bose to rank higher without hiding other brands, and discontinued products should never appear.

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "wireless headphones" } }
      ],
      "filter": [
        { "term":  { "in_stock": true } },
        { "range": { "price": { "lte": 200 } } }
      ],
      "should": [
        { "term": { "brand": "sony" } },
        { "term": { "brand": "bose" } }
      ],
      "must_not": [
        { "term": { "discontinued": true } }
      ]
    }
  }
}

Breaking it down:

must — the title MUST match “wireless headphones” (and this drives relevance).
filter — must be in stock AND price ≤ 200 (no score impact, cached).
should — bonus points if brand is sony or bose (boosts score, doesn’t exclude others).
must_not — exclude discontinued products.

The “should” gotcha — minimum_should_match

By default, if a bool query has no must or filter clauses, then at least one should clause must match. If must or filter exists, should becomes a pure score booster — it doesn’t have to match anything.

To force at least N should-clauses to match:

{
  "bool": {
    "should": [
      { "term": { "tags": "premium" } },
      { "term": { "tags": "featured" } },
      { "term": { "tags": "bestseller" } }
    ],
    "minimum_should_match": 2
  }
}

Now at least 2 of those tags must match. Super useful for “match any 2 of these criteria” logic.
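The matching rules for the four clauses, including the should default, fit in one small function. This is a sketch of the yes/no logic only (no scoring), with clauses modeled as Python predicates; `bool_matches` and `has_tag` are names invented for the example.

```python
def bool_matches(doc, must=(), filter_=(), should=(), must_not=(),
                 minimum_should_match=None):
    """Decide whether `doc` passes a bool query whose clauses are predicates."""
    if not all(clause(doc) for clause in (*must, *filter_)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if minimum_should_match is None:
        # Default: 1 if should is the only positive clause type, otherwise 0.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    matched = sum(1 for clause in should if clause(doc))
    return matched >= minimum_should_match


def has_tag(tag):
    """Build a predicate that checks for a tag on the document."""
    return lambda doc: tag in doc["tags"]


should = [has_tag("premium"), has_tag("featured"), has_tag("bestseller")]
doc = {"tags": ["premium", "bestseller"]}
print(bool_matches(doc, should=should, minimum_should_match=2))
```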

Why filter is faster

Two reasons:

No scoring math: BM25 calculations aren’t cheap. Skipping them saves CPU.
Bitset caching: ES can cache the set of matching doc IDs for frequently used filters. A later filter on the same condition can then reuse the cached set instead of evaluating it again.

Rule of thumb — if we don’t care about relevance for a clause, put it in filter. The classic mistake is using must for exact filters like booleans, dates, and IDs.

must_not is also a filter

must_not runs in filter context too — it’s cached and doesn’t affect scoring. Use it freely for exclusions.

{
  "bool": {
    "filter": [
      { "term": { "status": "active" } }
    ],
    "must_not": [
      { "term":  { "is_test_account": true } },
      { "exists": { "field": "deleted_at" } }
    ]
  }
}

Quick rules

User-typed text → must (we want scoring).
Exact filters (booleans, IDs, ranges, dates) → filter.
Optional boosts → should.
Exclusions → must_not.
Need “at least N of these” → should + minimum_should_match.

Range, Exists, Wildcard, Prefix & Regex Queries

intermediate elasticsearch query-dsl range wildcard regex

These are the term-level queries we reach for when match and term aren’t enough. They all skip the analyzer and work on raw terms — so they’re typically used on keyword, numeric, or date fields.

Range query

In simple language — “find docs where this field is between X and Y.” Works on numbers, dates, and even strings (lexicographic).

GET /orders/_search
{
  "query": {
    "range": {
      "total_amount": {
        "gte": 100,
        "lt": 500
      }
    }
  }
}

Operators: gt (greater than), gte (greater or equal), lt (less than), lte (less or equal).

Date math

ES supports a mini date-math language. Super handy for relative time queries like “orders from the last 7 days”:

GET /orders/_search
{
  "query": {
    "range": {
      "created_at": {
        "gte": "now-7d/d",
        "lt":  "now/d"
      }
    }
  }
}

now-7d/d means “7 days ago, rounded down to the start of the day”. The /d rounding makes the query cacheable (because rounded values change less often).
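To see what the rounding does, here is a small Python sketch that computes the same two bounds by hand. It assumes UTC and a fixed, made-up value for now; the function names are ours.

```python
from datetime import datetime, timedelta, timezone


def start_of_day(moment):
    """Round a datetime down to midnight, like the `/d` suffix."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


def last_seven_full_days(now):
    """Return (gte, lt) bounds equivalent to now-7d/d and now/d."""
    gte = start_of_day(now - timedelta(days=7))
    lt = start_of_day(now)
    return gte, lt


now = datetime(2026, 5, 27, 14, 35, 12, tzinfo=timezone.utc)
gte, lt = last_seven_full_days(now)
print(gte.isoformat(), lt.isoformat())
```

Calling it again three hours later the same day returns identical bounds, which is why the rounded form of the query can be reused from cache.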

Exists query

In simple language — “find docs where this field has a value.” It’s the ES equivalent of SQL’s IS NOT NULL.

GET /users/_search
{
  "query": {
    "exists": { "field": "phone_number" }
  }
}

To find docs where the field is missing, wrap it in must_not:

GET /users/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "exists": { "field": "phone_number" } }
      ]
    }
  }
}

A field is considered “missing” if it’s null, [], or simply not present in the source.

Wildcard query

Pattern matching with * (any characters) and ? (single character). Works on keyword fields.

GET /users/_search
{
  "query": {
    "wildcard": {
      "email.keyword": { "value": "*@gmail.com" }
    }
  }
}

⚠ Performance warning

Leading wildcards like the *@gmail.com pattern above are extremely slow.

ES has to scan every term in the inverted index — no shortcuts possible.

For suffix search at scale, index a reversed copy of the field and do a prefix query on it.

Prefix query

Faster cousin of wildcard — looks for terms starting with a given string. Used for “starts with” filters.

GET /products/_search
{
  "query": {
    "prefix": {
      "sku.keyword": { "value": "APPL-" }
    }
  }
}

This states the intent more directly than wildcard: "APPL-*", and ES can use the inverted index’s sorted term dictionary to jump straight to the prefix range.
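The "jump to the prefix range" idea, and the reversed-copy trick for suffix search, can both be shown on a plain sorted list. This is a sketch of the principle, not of Lucene's term dictionary; it assumes no term contains the code point U+10FFFF, and the sample SKUs and emails are made up.

```python
from bisect import bisect_left


def prefix_range(sorted_terms, prefix):
    """Return the slice of a sorted term list that starts with `prefix`."""
    lo = bisect_left(sorted_terms, prefix)
    # Upper bound: the prefix followed by the largest code point.
    hi = bisect_left(sorted_terms, prefix + "\U0010ffff")
    return sorted_terms[lo:hi]


skus = sorted(["APPL-001", "APPL-002", "DELL-001", "SONY-009"])
print(prefix_range(skus, "APPL-"))

# Suffix search via a reversed copy: "ends with @gmail.com" becomes a prefix lookup.
emails = ["ann@gmail.com", "bob@yahoo.com", "cy@gmail.com"]
reversed_terms = sorted(e[::-1] for e in emails)
gmail = [t[::-1] for t in prefix_range(reversed_terms, "@gmail.com"[::-1])]
print(sorted(gmail))
```

Two binary searches replace a scan over every term, which is the whole reason to prefer a prefix lookup over a leading wildcard.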

For real autocomplete, prefer the completion suggester or an edge-ngram analyzer over prefix queries — they’re built for that use case.

Regex query

Full regular expressions on keyword fields. Powerful but slow — use sparingly.

GET /logs/_search
{
  "query": {
    "regexp": {
      "user_agent.keyword": "Mozilla.*Chrome/[0-9]+.*"
    }
  }
}

ES uses a flavor of regex (Lucene’s regex syntax), not Perl/PCRE. No lookahead, no backreferences. Anchors ^ and $ are implicit — the entire term must match.

When to use which

Use case	Query	Speed
Number/date between X and Y	range	Fast
Field has a value	exists	Fast
"starts with X"	prefix	Fast
"contains pattern X*Y"	wildcard	Medium
Complex pattern	regexp	Slow

All of these are term-level, so put them in filter context whenever scoring doesn’t matter. That skips score computation and lets ES cache the frequently used ones.

Quick rules

All these queries skip the analyzer — use them on keyword fields or numeric/date fields.
Range with date math + rounding (now-1d/d) is cacheable. No rounding = no cache.
Avoid leading wildcards. They force a scan of every term in the field.
Regex looks cool in interviews — but in production, prefer prefix + a smarter mapping.

Fuzzy & Multi-match Queries

intermediate elasticsearch query-dsl fuzzy multi-match levenshtein

Real users type “elasitcsearch” instead of “elasticsearch”. And they expect us to know they meant the latter. These two queries solve the “typos and multiple fields” problems.

Fuzzy query — handling typos

In simple language — “find docs where the term is almost equal to my search term, allowing for N character edits.” That’s Levenshtein distance.

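Levenshtein distance = number of single-character edits (insertions, deletions, substitutions) needed to turn one string into another. cat → bat is distance 1. cat → bats is distance 2. By default ES also counts swapping two adjacent characters (ab → ba) as a single edit, so “elasitcsearch” is one edit away from “elasticsearch”.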

GET /products/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "elasitcsearch",
        "fuzziness": "AUTO"
      }
    }
  }
}

fuzziness options:

0 — exact match (no fuzziness, same as term).
1 or 2 — explicit edit distance.
AUTO — smart default based on term length:
- 0-2 chars → 0 edits (must be exact)
- 3-5 chars → 1 edit
- 6+ chars → 2 edits

AUTO is what we want 99% of the time. Short terms shouldn’t allow typos (too many false positives).
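For intuition, here is a minimal sketch of the classic edit distance together with the AUTO thresholds listed above. It implements plain Levenshtein (insert, delete, substitute), so a swap of two adjacent characters costs 2 here; the function names are ours.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions cost 1 each."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Allowed edits under fuzziness AUTO, using the 0-2 / 3-5 / 6+ length bands."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


for target in ["cat", "bat", "cats", "ct", "dogs"]:
    print(f"cat -> {target}: {levenshtein('cat', target)}")
print(auto_fuzziness("sony"), auto_fuzziness("elasticsearch"))
```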

Fuzzy inside match

The fuzzy query operates on a single term. For multi-word fuzzy search, use match with fuzziness:

GET /products/_search
{
  "query": {
    "match": {
      "title": {
        "query": "wirless headfones",
        "fuzziness": "AUTO"
      }
    }
  }
}

This finds “wireless headphones” even with two typos. Each word gets its own fuzziness allowance.

Levenshtein distance examples

"cat" → "cat" = 0 (identical)

"cat" → "bat" = 1 (sub c→b)

"cat" → "cats" = 1 (insert s)

"cat" → "ct" = 1 (delete a)

"cat" → "dogs" = 4 (too far)

Multi-match query — search across fields

In simple language — “search this text across multiple fields at once.” It’s the realistic version of any search bar that searches title + description + tags + author all together.

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "wireless headphones",
      "fields": ["title", "description", "tags"]
    }
  }
}

Field boosting

We rarely want all fields treated equally. A match in the title should count more than a match in the description. Use ^N suffix:

{
  "multi_match": {
    "query": "wireless headphones",
    "fields": ["title^3", "description^1", "tags^2"]
  }
}

Now a match in title scores 3x more than the same match in description. This dramatically improves relevance for real search bars.

Multi-match types

The type parameter controls how scores from different fields are combined:

Type	What it does	When to use
`best_fields` (default)	Use the score of the single best matching field	Most search bars
`most_fields`	Sum scores across all matching fields	When the same text in multiple fields is a stronger signal
`cross_fields`	Treat fields as one big field (good for names split into first/last)	People search, addresses
`phrase`	Each field tried as a phrase match	Exact phrase across fields
`phrase_prefix`	Phrase match with prefix on last term	Autocomplete
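The difference between best_fields and most_fields comes down to how per-field scores are combined. A rough sketch, assuming made-up per-field scores and treating a ^N boost as a plain multiplier; it ignores everything else ES does during scoring.

```python
def best_fields(field_scores, tie_breaker=0.0):
    """Best field's score, plus tie_breaker times the other matching fields."""
    if not field_scores:
        return 0.0
    top = max(field_scores.values())
    rest = sum(field_scores.values()) - top
    return top + tie_breaker * rest


def most_fields(field_scores):
    """Add up the scores of every matching field."""
    return sum(field_scores.values())


def apply_boosts(raw_scores, boosts):
    """Multiply each field's score by its ^N boost (default 1)."""
    return {f: s * boosts.get(f, 1.0) for f, s in raw_scores.items()}


raw = {"title": 2.0, "description": 3.0, "tags": 1.0}   # made-up per-field scores
boosted = apply_boosts(raw, {"title": 3.0, "tags": 2.0})
print(best_fields(boosted), most_fields(boosted))
```

Without boosts the description would be the best field here; with title^3 the title takes over, which is the effect field boosting is meant to have.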

{
  "multi_match": {
    "query": "manish prajapati",
    "type": "cross_fields",
    "fields": ["first_name", "last_name"],
    "operator": "and"
  }
}

cross_fields is perfect here — “manish” matches first_name, “prajapati” matches last_name, but together they should look like a single match.

Combining with fuzziness

We can do both — multi-field AND typo-tolerant:

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "wirless headfones",
      "fields": ["title^3", "description"],
      "fuzziness": "AUTO"
    }
  }
}

This is the “kitchen sink” search bar query. Multi-field, weighted, typo-tolerant.

Performance note

Fuzzy queries are expensive — ES has to compute edit distances against many terms in the index. Two safeguards:

prefix_length — number of leading characters that must match exactly (default 0). Setting prefix_length: 1 or 2 massively narrows the candidate set.
max_expansions — caps the number of terms the query expands to (default 50).

{
  "fuzzy": {
    "title": {
      "value": "elasticsearch",
      "fuzziness": "AUTO",
      "prefix_length": 2,
      "max_expansions": 100
    }
  }
}

Quick rules

User-facing search bars → multi_match + field boosts + fuzziness: AUTO. Best UX.
People/name search → cross_fields type.
Autocomplete → phrase_prefix type (or a dedicated suggester).
Short queries (< 3 chars) — disable fuzziness, too noisy.
Always set prefix_length: 1 in production fuzzy queries for performance.

Compound Queries & Function Score

advanced elasticsearch query-dsl function-score scoring boost

ES gives us a relevance score via BM25 out of the box. But often “most relevant” needs to factor in business signals — recency, popularity, distance, paid promotion. That’s where compound queries (especially function_score) come in.

What are compound queries?

In simple language — compound queries wrap other queries and modify their behavior. The ones worth knowing:

bool — combine clauses (covered separately).
constant_score — wrap a filter, give every match the same score.
dis_max — “disjunction max” — take the single best score across multiple subqueries.
boosting — match docs but demote ones matching a negative query.
function_score — modify scores using custom functions.

constant_score — when you don’t care about score

GET /products/_search
{
  "query": {
    "constant_score": {
      "filter": { "term": { "category": "laptops" } },
      "boost": 1.5
    }
  }
}

Every matching doc gets score 1.5. Useful when we want filter-context behavior (cached, no BM25 math) but still need a fixed score for sorting/combination.

boosting — demote, don’t exclude

must_not removes matches entirely. boosting lets us demote them instead.

GET /products/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": { "title": "headphones" }
      },
      "negative": {
        "term": { "refurbished": true }
      },
      "negative_boost": 0.3
    }
  }
}

Refurbished headphones still show up, but their score is multiplied by 0.3 — so they sink to the bottom.

function_score — the powerhouse

This is the one interviewers ask about. function_score wraps a query and applies one or more scoring functions on top of the BM25 score.

GET /products/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": { "title": "headphones" }
      },
      "functions": [
        {
          "filter": { "term": { "is_featured": true } },
          "weight": 2
        },
        {
          "field_value_factor": {
            "field": "rating",
            "factor": 1.2,
            "modifier": "log1p"
          }
        },
        {
          "gauss": {
            "created_at": {
              "origin": "now",
              "scale": "30d",
              "decay": 0.5
            }
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}

Let’s break it down — for each result, ES computes:

The base relevance score from match.
A weight of 2 if is_featured: true (the function is skipped otherwise).
A rating term, log1p of 1.2 × rating (log scale, so 5★ isn’t 5x better than 1★).
A gauss decay on created_at: 1.0 for brand-new products, falling to 0.5 at 30 days old.

Then combines them — score_mode: sum sums the function results, boost_mode: multiply multiplies that with the query score.

function_score pipeline

final _score = boost_mode(query score (BM25), score_mode(fn1, fn2, fn3))

score_mode: multiply | sum | avg | first | max | min
boost_mode: multiply | sum | avg | replace | max | min
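The two-stage combination can be sketched as a small function. It takes function values that are already computed, so it shows only how score_mode and boost_mode fit together; the avg mode here is a plain average and ignores the weighting ES applies, and the numbers in the usage line are made up.

```python
import math

SCORE_MODES = {
    "multiply": math.prod,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "first": lambda values: values[0],
    "max": max,
    "min": min,
}


def function_score(query_score, function_values, score_mode="multiply",
                   boost_mode="multiply"):
    """Combine a query score with already-computed function values."""
    if not function_values:
        return query_score
    combined = SCORE_MODES[score_mode](function_values)
    if boost_mode == "multiply":
        return query_score * combined
    if boost_mode == "sum":
        return query_score + combined
    if boost_mode == "avg":
        return (query_score + combined) / 2
    if boost_mode == "replace":
        return combined
    if boost_mode == "max":
        return max(query_score, combined)
    if boost_mode == "min":
        return min(query_score, combined)
    raise ValueError(f"unknown boost_mode: {boost_mode}")


# Query score 3.0; functions returned 2.0 (weight), 0.5 (rating), 0.5 (decay).
print(function_score(3.0, [2.0, 0.5, 0.5], score_mode="sum", boost_mode="multiply"))
```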

Decay functions — gauss, linear, exp

Decay functions are how we “boost recent things” or “boost nearby things.” They’re shaped like:

gauss: bell curve, smooth fall-off. Best for most cases.
linear: straight-line drop that reaches the decay value at offset + scale and hits zero at offset + scale / (1 − decay).
exp: sharp initial drop, long tail.

{
  "gauss": {
    "published_at": {
      "origin": "now",
      "offset": "1d",
      "scale": "7d",
      "decay": 0.5
    }
  }
}

In words: at now, score = 1.0. Within 1 day of now there is no decay (that’s the offset). After that, decay starts; at 7 days past the offset (the scale), so 8 days from now, score = 0.5 (the decay target).
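The three curves can be written out directly. This sketch follows the decay formulas as given in the ES function_score documentation, as far as we understand them, with distances expressed as plain numbers (days here); the helper names are ours.

```python
import math


def _distance(value, origin, offset):
    """Distance from origin that counts toward decay (zero inside the offset)."""
    return max(0.0, abs(value - origin) - offset)


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Bell curve that equals `decay` at offset + scale."""
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(_distance(value, origin, offset) ** 2) / (2.0 * sigma_sq))


def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Exponential curve that equals `decay` at offset + scale."""
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(value, origin, offset))


def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Straight line that equals `decay` at offset + scale, then reaches zero."""
    s = scale / (1.0 - decay)
    return max((s - _distance(value, origin, offset)) / s, 0.0)


# Ages in days, with origin = 0 (now), offset = 1 day, scale = 7 days.
for age in (0, 1, 8, 15):
    print(age, round(gauss_decay(age, 0, 7, 1), 3),
          round(linear_decay(age, 0, 7, 1), 3),
          round(exp_decay(age, 0, 7, 1), 3))
```

All three curves pass through 0.5 at day 8 (offset + scale); they differ in how quickly they get there and what happens afterwards.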

This is the pattern for news/feed ranking. Recent posts win, older posts gradually fade.

Geo decay

Same idea for distance:

{
  "gauss": {
    "location": {
      "origin": { "lat": 12.97, "lon": 77.59 },
      "scale": "10km",
      "decay": 0.5
    }
  }
}

Restaurants 10km away score half as much as restaurants right next to us. Beyond that, they fade fast.

field_value_factor — boost by a number field

{
  "field_value_factor": {
    "field": "view_count",
    "factor": 1.0,
    "modifier": "log1p",
    "missing": 1
  }
}

modifier options: none, log, log1p, log2p, ln, ln1p, ln2p, square, sqrt, reciprocal. Use log1p for view counts/popularity; without it, viral content dominates everything.

script_score — when nothing else fits

{
  "script_score": {
    "script": {
      "source": "doc['rating'].value * Math.log(2 + doc['review_count'].value)"
    }
  }
}

Powerful but slow — scripts run per-document. Use only when the built-in functions can’t express what we need.

Quick rules

Recency boost? gauss decay on a date field.
Popularity boost? field_value_factor with log1p modifier.
Featured/sponsored items? weight function gated by a filter.
Don’t reach for script_score until you’ve tried the named functions.
Set boost_mode: multiply for proportional boosts, sum for additive.

Full-text vs Term-level Queries — When to use which

intermediate elasticsearch query-dsl full-text term-level

Once we understand the difference between match and term, we can generalize it to two whole families of queries — full-text and term-level. Knowing which family to reach for is half the battle.

The mental model

In simple language — full-text queries are for human language, term-level queries are for structured data.

FULL-TEXT QUERIES

Analyzed → tokens → search

For: text, prose, search bars

• match

• match_phrase

• multi_match

• query_string

• simple_query_string

• match_phrase_prefix

• intervals

TERM-LEVEL QUERIES

No analysis → exact terms

For: keywords, IDs, numbers, dates

• term / terms

• range

• exists

• prefix

• wildcard

• regexp

• fuzzy

• ids

The field type connection

This isn’t an arbitrary choice — it’s tied to the field mapping.

text fields are analyzed. Full-text queries work here.
keyword, numeric, date, boolean, IP fields are not analyzed. Term-level queries work here.

When we index a string with default mapping, ES creates both:

"product_name": {
  "type": "text",
  "fields": {
    "keyword": { "type": "keyword", "ignore_above": 256 }
  }
}

So we can do match on product_name (analyzed) AND term on product_name.keyword (exact). This dual mapping is why .keyword shows up everywhere.

A real-world example combining both

Search bar query: “user typed ‘macbook’ and selected category=laptops, price under 2000, in stock.”

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "macbook" } }
      ],
      "filter": [
        { "term":  { "category.keyword": "laptops" } },
        { "term":  { "in_stock": true } },
        { "range": { "price": { "lt": 2000 } } }
      ]
    }
  }
}

Notice the split — the natural-language part uses match (full-text, scored), the structured filters use term/range (term-level, cached, no score). That’s the standard production pattern.

Common mistakes

1. Using match on a status enum

{ "match": { "status": "shipped" } }

Works, but if status is mapped as text it’s analyzed: "shipped" gets lowercased and tokenized. If we ever index a status like "Partially Shipped", match: "shipped" will match it (wrong). Use term on a keyword field.

2. Using term on a text field

{ "term": { "title": "MacBook Pro" } }

Almost guaranteed to return nothing. The index has tokens like ["macbook", "pro"], not the literal string "MacBook Pro". Use term on title.keyword, or switch to match.

3. Using match_phrase when match would do

{ "match_phrase": { "title": "blue shoes" } }

match_phrase requires the exact word order — "shoes that are blue" won’t match. Sometimes that’s what we want, but most of the time match is more forgiving and gives better recall.

When to use the rarer ones

match_phrase — when word order matters. “Star Wars” should NOT match docs that contain just “star” and “wars” separately.
match_phrase_prefix — autocomplete. “star wa” matches “star wars”.
query_string — power-user search syntax (+required -excluded "phrase"). Powerful but exposes Lucene syntax to users — risky for public-facing search.
simple_query_string — safer subset of query_string. Invalid syntax doesn’t throw an error.
terms (plural) — exact match against a list. Like SQL IN: { "terms": { "category": ["laptops", "tablets"] } }.

Scoring vs filtering — second axis

There’s a second decision orthogonal to full-text vs term-level — query context vs filter context.

	Query context	Filter context
Full-text	`match` inside `must`/`should`	Possible but rare (use `match` inside `filter` if no scoring needed)
Term-level	`term` inside `must` (rare)	`term` inside `filter` ← the common case

The takeaway — full-text usually goes in query context (must/should), term-level usually goes in filter context (filter/must_not).

Quick rules

Searching prose → full-text family (match, multi_match).
Filtering structured data → term-level family (term, range, exists).
Exact string match → term on .keyword subfield.
Phrase search → match_phrase.
User-typed search bar with multiple filters → bool with must: [match] + filter: [term, range]. The standard pattern.

Aggregations

Bucket Aggregations

intermediate elasticsearch aggregations buckets

Aggregations are how ES does analytics. They come in three flavors — bucket (group docs), metric (compute numbers), and pipeline (operate on other aggs). This note covers bucket aggs.

In simple language — bucket aggs are like SQL’s GROUP BY. They split docs into groups based on some criterion, and we can then run metrics on each group.

Terms aggregation — the workhorse

Group docs by the unique values of a field. Like GROUP BY category.

GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category.keyword",
        "size": 10
      }
    }
  }
}

size: 0 at the top means “don’t return docs, just aggregations” — saves bandwidth. The agg result looks like:

{
  "aggregations": {
    "by_category": {
      "buckets": [
        { "key": "laptops", "doc_count": 142 },
        { "key": "phones",  "doc_count": 98  },
        { "key": "tablets", "doc_count": 47  }
      ]
    }
  }
}

The size + accuracy gotcha

size: 10 returns the top 10 buckets. But ES is distributed: each shard returns only its own top buckets (shard_size, which defaults to size × 1.5 + 10), then the results are merged. This means the global top 10 and their counts can be slightly off for skewed data.

To improve accuracy at a cost, bump shard_size:

{
  "terms": {
    "field": "category.keyword",
    "size": 10,
    "shard_size": 100
  }
}

Each shard returns top 100, we keep top 10. Trade more network/CPU for accuracy.
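A tiny simulation shows how the merge can miss a bucket and how shard_size fixes it. The two shards and their counts are hypothetical, chosen so that "b" is never first on any shard but is first overall; `terms_agg` is a toy stand-in for the real aggregation.

```python
from collections import Counter


def terms_agg(shards, size, shard_size):
    """Merge each shard's local top `shard_size` buckets, then keep the top `size`."""
    merged = Counter()
    for shard_counts in shards:
        for key, count in Counter(shard_counts).most_common(shard_size):
            merged[key] += count
    return merged.most_common(size)


# Two hypothetical shards: "b" is never first locally but wins globally (9 + 9 = 18).
shards = [
    {"a": 10, "b": 9},
    {"c": 10, "b": 9},
]

print(terms_agg(shards, size=1, shard_size=1))   # "b" is missed entirely
print(terms_agg(shards, size=1, shard_size=2))   # "b" is found with its full count
```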

Date histogram — time-series bucketing

Group docs by time buckets. The bread and butter of dashboards.

GET /orders/_search
{
  "size": 0,
  "aggs": {
    "orders_per_day": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "day"
      }
    }
  }
}

Calendar intervals — minute, hour, day, week, month, quarter, year. These respect calendar boundaries (e.g., months have variable lengths).

For fixed intervals (always the same number of milliseconds), use fixed_interval:

{ "date_histogram": { "field": "created_at", "fixed_interval": "30m" } }

Use fixed_interval for sub-day buckets (15m, 30m, 1h), calendar_interval for day/week/month.
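Fixed-interval bucketing is just "round the timestamp down to a multiple of the interval". A minimal sketch, assuming UTC timestamps and three made-up events; `fixed_bucket` is our own helper, not an ES API.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def fixed_bucket(ts, interval):
    """Round a timestamp down to the start of its fixed-width bucket."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + ((ts - epoch) // interval) * interval


events = [
    datetime(2026, 5, 27, 10, 5, tzinfo=timezone.utc),
    datetime(2026, 5, 27, 10, 29, tzinfo=timezone.utc),
    datetime(2026, 5, 27, 10, 31, tzinfo=timezone.utc),
]
histogram = Counter(fixed_bucket(e, timedelta(minutes=30)) for e in events)
for bucket_start, doc_count in sorted(histogram.items()):
    print(bucket_start.isoformat(), doc_count)
```

Calendar intervals cannot be computed this way because a month or a day with a DST shift is not a fixed number of milliseconds, which is why ES keeps the two settings separate.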

Range aggregation — custom numeric buckets

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_brackets": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 500 },
          { "from": 500, "to": 1000 },
          { "from": 1000 }
        ]
      }
    }
  }
}

Result groups: < $100, $100-500, $500-1000, ≥ $1000 (from is inclusive, to is exclusive). Perfect for e-commerce price filters.

Filters aggregation — named arbitrary buckets

When buckets don’t follow a single rule, define them as named filters:

GET /logs/_search
{
  "size": 0,
  "aggs": {
    "by_status": {
      "filters": {
        "filters": {
          "errors":   { "range": { "status_code": { "gte": 500 } } },
          "warnings": { "range": { "status_code": { "gte": 400, "lt": 500 } } },
          "success":  { "range": { "status_code": { "gte": 200, "lt": 300 } } }
        }
      }
    }
  }
}

This gives us 3 named buckets — errors, warnings, success — each defined by its own filter. More flexible than range/terms when buckets cross fields.

Histogram — numeric bucketing

Like date_histogram but for numbers. Useful for distribution charts.

{
  "aggs": {
    "rating_distribution": {
      "histogram": {
        "field": "rating",
        "interval": 1
      }
    }
  }
}

Buckets at intervals of 1 — 1.0, 2.0, 3.0, 4.0, 5.0. Plot it as a bar chart and we have a star-rating histogram.

Visualizing the structure

Documents → Buckets

[doc1, doc2, doc3, doc4, doc5, doc6, doc7, doc8]

↓ terms agg on "category"

laptops: doc1, doc4, doc7 (doc_count: 3)
phones: doc2, doc5, doc8 (doc_count: 3)
tablets: doc3, doc6 (doc_count: 2)

Combining with queries

Aggs run on the query result set. So:

GET /orders/_search
{
  "size": 0,
  "query": {
    "range": { "created_at": { "gte": "now-30d/d" } }
  },
  "aggs": {
    "orders_per_day": {
      "date_histogram": { "field": "created_at", "calendar_interval": "day" }
    }
  }
}

This gives us “orders per day, last 30 days”. The query filters first, the agg buckets the survivors.

Quick rules

size: 0 saves bandwidth when we only want aggs.
terms on a text field requires .keyword subfield (or fielddata: true, which is memory-heavy).
Top-N from terms is approximate across shards. Increase shard_size if accuracy matters.
date_histogram for time, histogram for numbers, range for custom numeric brackets, filters for arbitrary named buckets.
Aggs operate on the queried subset — combine query + agg for “stats about my filtered data”.

Metric Aggregations

intermediate elasticsearch aggregations metrics

If bucket aggs are GROUP BY, metric aggs are everything inside SELECT — SUM, AVG, COUNT(DISTINCT), etc. In simple language, they compute a single number (or a few numbers) from a set of docs.

Single-value metrics

The basic family — give a field, get a number.

GET /orders/_search
{
  "size": 0,
  "aggs": {
    "total_revenue": { "sum":   { "field": "amount" } },
    "avg_order":     { "avg":   { "field": "amount" } },
    "smallest":      { "min":   { "field": "amount" } },
    "biggest":       { "max":   { "field": "amount" } },
    "order_count":   { "value_count": { "field": "amount" } }
  }
}

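value_count is like SQL’s COUNT(field): it counts the values found for the field, so docs missing the field add nothing and a multi-valued field adds one per value. Note this is not the same as the bucket’s doc_count (which counts documents regardless of field presence).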

Response shape:

{
  "aggregations": {
    "total_revenue": { "value": 145820.50 },
    "avg_order":     { "value": 234.50 },
    "smallest":      { "value": 9.99 },
    "biggest":       { "value": 4999.00 },
    "order_count":   { "value": 622 }
  }
}

stats — all five at once

If we want sum, avg, min, max, count together, there’s a single agg for that:

{
  "aggs": {
    "order_stats": {
      "stats": { "field": "amount" }
    }
  }
}

Returns all five in one shot. Cheaper than running each individually because ES makes a single pass over the data.

For variance + std deviation too, use extended_stats:

{
  "aggs": {
    "detailed": {
      "extended_stats": { "field": "amount" }
    }
  }
}

Percentiles — what the average hides

Averages lie. A site with avg response time of 200ms might have 1% of users seeing 5-second timeouts. Percentiles tell the real story.

GET /requests/_search
{
  "size": 0,
  "aggs": {
    "latency_pct": {
      "percentiles": {
        "field": "response_time_ms",
        "percents": [50, 75, 95, 99, 99.9]
      }
    }
  }
}

Response:

{
  "latency_pct": {
    "values": {
      "50.0":  120,
      "75.0":  180,
      "95.0":  450,
      "99.0":  1200,
      "99.9":  4800
    }
  }
}

Read it like — “50% of requests under 120ms, 99% under 1.2s, 99.9% under 4.8s.” That’s the standard latency reporting in any production system.
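To see how an average can hide a slow tail, here is a small sketch with made-up latencies. It uses an exact nearest-rank percentile for clarity; Elasticsearch computes percentiles approximately, so treat this as a model of the idea, not of the implementation. The helper name percentile is ours.

```python
import math
import statistics

def percentile(sorted_values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]

# 980 fast requests and 20 slow ones (made-up latencies in ms)
latencies = sorted([100] * 980 + [5000] * 20)

mean = statistics.mean(latencies)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
print(mean, p50, p99)
```

The mean lands just under 200ms and the median at 100ms, yet the 99th percentile is a 5-second request. Only the percentile view shows the tail.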

percentile_ranks — the inverse

If we know the SLO threshold and want to know what % of requests beat it:

{
  "aggs": {
    "slo": {
      "percentile_ranks": {
        "field": "response_time_ms",
        "values": [500, 1000]
      }
    }
  }
}

Returns “94% of requests under 500ms, 98% under 1000ms”. Use this for SLO dashboards.

Cardinality — approximate distinct count

The ES equivalent of COUNT(DISTINCT field). But here’s the catch — it’s approximate (uses HyperLogLog++).

{
  "aggs": {
    "unique_users": {
      "cardinality": {
        "field": "user_id",
        "precision_threshold": 3000
      }
    }
  }
}

In simple language — precision_threshold is the upper bound below which the count is essentially exact. Above it, error grows but stays small (~1-2% at most). Default is 3000, max is 40000. Higher precision = more memory.

Why approximate? Exact distinct counts require holding every unique value in memory across shards. That doesn’t scale. HyperLogLog uses a clever probabilistic structure that’s tiny in memory and “close enough” — typical error is well under 1% for the default settings.
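For intuition, here is a minimal plain HyperLogLog sketch. It is not the HyperLogLog++ variant Elasticsearch uses, and the hash choice (SHA-1 truncated to 64 bits) and register count are assumptions picked for the example. The point is that memory stays fixed at 2**p small registers no matter how many values stream through.

```python
import hashlib
import math

def hll_estimate(values, p=12):
    """Estimate the number of distinct values using 2**p small registers."""
    m = 1 << p
    registers = [0] * m
    for value in values:
        digest = hashlib.sha1(str(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash
        index = h >> (64 - p)                      # first p bits choose a register
        rest = h & ((1 << (64 - p)) - 1)           # remaining bits
        rank = (64 - p) - rest.bit_length() + 1    # position of the first 1-bit
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:                   # small counts: linear counting
        return m * math.log(m / zeros)
    return raw

# 200,000 events from 50,000 distinct users
estimate = hll_estimate(f"user_{i % 50_000}" for i in range(200_000))
print(round(estimate))  # close to 50,000, not exact
```

Duplicates never change a register, which is why the structure can be merged across shards and still count each user once.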

Metric agg families at a glance

Family	Aggs	Use
Basic numeric	sum, avg, min, max	Sales, counts, ranges
Combined	stats, extended_stats	One-shot overview
Distribution	percentiles, percentile_ranks	Latency, SLOs
Unique counts	cardinality	DAU, unique IPs
Counting	value_count	Non-null counts

top_hits — sample docs from the bucket

Technically a metric agg, but very useful. Returns the top N docs from each context — often used as a sub-agg of a bucket to get a sample document per bucket.

{
  "aggs": {
    "by_category": {
      "terms": { "field": "category.keyword" },
      "aggs": {
        "highest_priced": {
          "top_hits": {
            "size": 1,
            "sort": [{ "price": "desc" }],
            "_source": ["title", "price"]
          }
        }
      }
    }
  }
}

Gets us “the most expensive product in each category”. Hugely useful for analytics dashboards.

Filtering and metrics

Just like bucket aggs, metric aggs run on the query result. Want “average order value for premium customers”?

GET /orders/_search
{
  "size": 0,
  "query": {
    "term": { "customer_tier": "premium" }
  },
  "aggs": {
    "avg_order": { "avg": { "field": "amount" } }
  }
}

Quick rules

One metric needed? Use the specific agg. Several? Use stats (one pass).
Latency/distribution reporting? Always percentiles, never just avg.
Distinct counts? cardinality — accept the approximation, it’s the price of scale.
Need a sample doc per bucket? top_hits sub-agg.
Metrics respect the outer query — combine query + metric for “X about my filtered data”.

Pipeline & Nested Aggregations

advanced elasticsearch aggregations pipeline nested

These are the two “advanced” aggregation topics that show up in senior interviews — pipeline aggs (computing on top of other aggs) and nested aggs (handling the nested field type).

Pipeline aggregations — “aggs of aggs”

In simple language — pipeline aggs don’t look at documents. They look at the output of other aggs and compute new values from it. Think SQL window functions.

Common pipeline aggs — avg_bucket, sum_bucket, max_bucket, min_bucket, derivative, moving_avg (deprecated → moving_fn), cumulative_sum, bucket_selector.

Example — daily revenue + moving average

GET /orders/_search
{
  "size": 0,
  "aggs": {
    "daily": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "day"
      },
      "aggs": {
        "revenue": {
          "sum": { "field": "amount" }
        },
        "7d_moving_avg": {
          "moving_fn": {
            "buckets_path": "revenue",
            "window": 7,
            "script": "MovingFunctions.unweightedAvg(values)"
          }
        }
      }
    }
  }
}

Two nested levels here:

daily — date_histogram, one bucket per day.
Inside each day — revenue (a metric) + 7d_moving_avg (a pipeline agg that looks at the last 7 revenue values).

The magic is buckets_path: "revenue" — that’s how pipeline aggs reference other aggs.
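A plain-Python sketch of what the moving average computes over the per-day revenue values. It assumes the window covers the preceding buckets and leaves out the current one (our reading of the default moving_fn behaviour; check the shift parameter in the docs for your version). The function name and the numbers are made up.

```python
def moving_unweighted_avg(values, window):
    """For each position, average the previous `window` values (current one excluded)."""
    out = []
    for i in range(len(values)):
        prev = values[max(0, i - window):i]
        out.append(sum(prev) / len(prev) if prev else None)
    return out

daily_revenue = [100, 200, 300, 400]  # made-up daily sums
averages = moving_unweighted_avg(daily_revenue, window=2)
print(averages)  # [None, 100.0, 150.0, 250.0]
```

The first bucket has nothing behind it, so it gets no value; the early buckets average over a shorter window until enough history exists.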

Cumulative sum — running totals

{
  "aggs": {
    "daily": {
      "date_histogram": { "field": "created_at", "calendar_interval": "day" },
      "aggs": {
        "revenue": { "sum": { "field": "amount" } },
        "cumulative": {
          "cumulative_sum": { "buckets_path": "revenue" }
        }
      }
    }
  }
}

Each day’s cumulative value = that day’s revenue plus the revenue of all previous days. Classic for “total sales to date” charts.
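In plain Python, a running total is a prefix sum over the per-day revenue values, and each entry includes the current day. The numbers are made up for illustration.

```python
from itertools import accumulate

daily_revenue = [100, 200, 300, 400]  # made-up daily sums
cumulative = list(accumulate(daily_revenue))
print(cumulative)  # [100, 300, 600, 1000]
```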

Stats across buckets — avg_bucket, max_bucket

“What’s the average daily revenue?” — that’s avg_bucket:

{
  "aggs": {
    "daily": {
      "date_histogram": { "field": "created_at", "calendar_interval": "day" },
      "aggs": {
        "revenue": { "sum": { "field": "amount" } }
      }
    },
    "avg_daily_revenue": {
      "avg_bucket": { "buckets_path": "daily>revenue" }
    }
  }
}

Notice daily>revenue — the > is “into the sub-agg”. This is sibling position — avg_daily_revenue is a sibling of daily, not a child.

bucket_selector — filtering buckets

Drop buckets that don’t meet a condition.

{
  "aggs": {
    "by_category": {
      "terms": { "field": "category.keyword", "size": 50 },
      "aggs": {
        "revenue": { "sum": { "field": "amount" } },
        "min_revenue_filter": {
          "bucket_selector": {
            "buckets_path": { "rev": "revenue" },
            "script": "params.rev > 10000"
          }
        }
      }
    }
  }
}

This keeps only categories with > $10k revenue. The bucket_selector doesn’t compute a value — it just decides whether each bucket is kept or dropped.

Pipeline agg positions

PARENT (inside the bucket)

moving_fn, cumulative_sum, derivative

Works across consecutive buckets — needs ordered data (e.g., date_histogram)

SIBLING (next to the bucket)

avg_bucket, sum_bucket, max_bucket, min_bucket, stats_bucket

Computes one number from all buckets — no ordering needed

Nested aggregations — the nested field type

Different concept entirely. ES flattens arrays of objects by default, which causes a famous bug — fields in the same object lose their relationship.

Say we index:

{
  "name": "MacBook Pro",
  "reviews": [
    { "author": "alice", "rating": 5 },
    { "author": "bob",   "rating": 1 }
  ]
}

Without nested mapping, ES stores this as reviews.author: ["alice", "bob"] and reviews.rating: [5, 1]. A query like “author is alice AND rating is 1” would match this doc (because alice exists AND rating=1 exists), even though they’re from different objects.
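The false match is easy to reproduce in plain Python. The flatten helper below is a hypothetical model of the default object mapping, not Elasticsearch code; it only shows why the per-object pairing is lost.

```python
doc = {
    "name": "MacBook Pro",
    "reviews": [
        {"author": "alice", "rating": 5},
        {"author": "bob", "rating": 1},
    ],
}

def flatten(doc, path):
    """Model of the default object mapping: one multi-value list per leaf field."""
    flat = {}
    for obj in doc[path]:
        for key, value in obj.items():
            flat.setdefault(f"{path}.{key}", []).append(value)
    return flat

flat = flatten(doc, "reviews")
# {'reviews.author': ['alice', 'bob'], 'reviews.rating': [5, 1]}

# Flattened view: each condition is checked against the whole list.
flat_match = "alice" in flat["reviews.author"] and 1 in flat["reviews.rating"]

# Per-object view: both conditions must hold inside the same review.
per_object_match = any(
    r["author"] == "alice" and r["rating"] == 1 for r in doc["reviews"]
)

print(flat_match, per_object_match)  # True False
```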

The fix — declare reviews as nested in the mapping. Then each review object is indexed as a hidden child document, preserving relationships.

PUT /products
{
  "mappings": {
    "properties": {
      "reviews": {
        "type": "nested",
        "properties": {
          "author": { "type": "keyword" },
          "rating": { "type": "integer" }
        }
      }
    }
  }
}

Aggregating nested fields

To aggregate over nested objects, we need the nested aggregation as an intermediate step:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "reviews_agg": {
      "nested": { "path": "reviews" },
      "aggs": {
        "avg_rating": {
          "avg": { "field": "reviews.rating" }
        },
        "by_author": {
          "terms": { "field": "reviews.author" }
        }
      }
    }
  }
}

The nested agg “enters” the nested context. Now reviews.rating and reviews.author work correctly — relationships preserved.

reverse_nested — climbing back up

What if inside a nested agg we want to count parent docs, not child reviews?

{
  "aggs": {
    "reviews_agg": {
      "nested": { "path": "reviews" },
      "aggs": {
        "by_rating": {
          "terms": { "field": "reviews.rating" },
          "aggs": {
            "parent_products": {
              "reverse_nested": {}
            }
          }
        }
      }
    }
  }
}

For each rating bucket, reverse_nested tells us how many products (parent docs) contain a review at that rating — not how many review objects.

Quick rules

Pipeline aggs reference others via buckets_path — name for direct, name>sub for nested.
moving_fn, cumulative_sum, derivative are parent pipelines (live inside the bucket).
avg_bucket, max_bucket, etc. are sibling pipelines (next to the bucket).
bucket_selector filters buckets post-aggregation — like SQL HAVING.
For arrays of objects with field relationships that matter — use nested mapping + nested agg.
reverse_nested lets us count parent docs from inside a nested agg.

Sub-aggregations

intermediate elasticsearch aggregations sub-aggregations

This is where aggregations get really powerful. We can put aggs inside other aggs — bucket > metric, bucket > bucket > metric, and so on. The standard analytics pattern in ES is “split docs into buckets, then compute metrics per bucket.”

In simple language — every bucket agg can have an aggs block of its own. That inner block runs once per bucket.

The basic pattern — bucket + metric

“Average order value per category”:

GET /orders/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": { "field": "category.keyword" },
      "aggs": {
        "avg_amount": {
          "avg": { "field": "amount" }
        }
      }
    }
  }
}

Response:

{
  "aggregations": {
    "by_category": {
      "buckets": [
        { "key": "laptops", "doc_count": 142, "avg_amount": { "value": 1240.50 } },
        { "key": "phones",  "doc_count": 98,  "avg_amount": { "value": 820.00  } },
        { "key": "tablets", "doc_count": 47,  "avg_amount": { "value": 540.75  } }
      ]
    }
  }
}

In SQL terms — SELECT category, AVG(amount) FROM orders GROUP BY category. The aggs inside is what makes the analogy work.
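The same split-then-compute idea in plain Python, with made-up sample orders. It is only a mental model of what the response contains, not of how Elasticsearch executes it.

```python
from collections import defaultdict

orders = [  # made-up sample rows
    {"category": "laptops", "amount": 1200.0},
    {"category": "phones", "amount": 800.0},
    {"category": "laptops", "amount": 1400.0},
]

# Step 1: the bucket agg splits docs by key.
buckets = defaultdict(list)
for order in orders:
    buckets[order["category"]].append(order["amount"])

# Step 2: the inner aggs block runs once per bucket.
result = [
    {"key": key, "doc_count": len(amounts), "avg_amount": sum(amounts) / len(amounts)}
    for key, amounts in buckets.items()
]
print(result)
```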

Multiple metrics per bucket

We can stack as many metrics inside a bucket as we want:

{
  "aggs": {
    "by_category": {
      "terms": { "field": "category.keyword" },
      "aggs": {
        "total_revenue": { "sum": { "field": "amount" } },
        "avg_order":     { "avg": { "field": "amount" } },
        "biggest_order": { "max": { "field": "amount" } },
        "order_count":   { "value_count": { "field": "amount" } }
      }
    }
  }
}

Each bucket now returns four numbers. Equivalent to SELECT category, SUM(amount), AVG(amount), MAX(amount), COUNT(amount) FROM orders GROUP BY category.

Nesting buckets — bucket > bucket > metric

This is where it gets fun. “Daily revenue per category”:

GET /orders/_search
{
  "size": 0,
  "aggs": {
    "daily": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "day"
      },
      "aggs": {
        "by_category": {
          "terms": { "field": "category.keyword" },
          "aggs": {
            "revenue": { "sum": { "field": "amount" } }
          }
        }
      }
    }
  }
}

Now we get a tree — for each day, for each category, the revenue. Perfect for a stacked-bar chart.

Aggregation tree

daily (date_histogram)
↓ one bucket per day
2026-05-24 | 2026-05-25 | 2026-05-26
↓ each day → terms agg
laptops | phones | tablets
↓ each category → metric
sum(amount) → $42,150

Ordering buckets by sub-agg values

By default, terms orders buckets by doc_count desc. We can order by a sub-metric instead — “top 5 categories by total revenue”:

{
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category.keyword",
        "size": 5,
        "order": { "revenue": "desc" }
      },
      "aggs": {
        "revenue": { "sum": { "field": "amount" } }
      }
    }
  }
}

For ordering by stats sub-aggs, use dot notation — { "stats_agg.avg": "desc" }.

A realistic dashboard query

Putting it all together — “monthly active users by country, last 90 days”:

GET /events/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-90d/d" } }
  },
  "aggs": {
    "monthly": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "by_country": {
          "terms": { "field": "country.keyword", "size": 10 },
          "aggs": {
            "unique_users": {
              "cardinality": { "field": "user_id" }
            },
            "top_actions": {
              "terms": { "field": "action.keyword", "size": 3 }
            }
          }
        }
      }
    }
  }
}

This single request gives us, for each calendar month touched by the last 90 days (usually three or four buckets) and for each of the top 10 countries in that month, the unique user count and the top 3 actions. A whole dashboard panel in one query.

Performance considerations

Sub-aggregations multiply work. A terms agg with 50 buckets, each with another terms with 50 sub-buckets, means 2,500 buckets in memory. ES applies safety limits (search.max_buckets — default 65,536). Hit it, get an error.
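The multiplication is worth doing on paper before running a deeply nested agg. A back-of-the-envelope sketch, where the helper name and the second example are made up and the limit is the default quoted above:

```python
from math import prod

MAX_BUCKETS = 65_536  # default search.max_buckets quoted above

def leaf_buckets(sizes):
    """Worst-case number of innermost buckets for nested bucket aggs."""
    return prod(sizes)

print(leaf_buckets([50, 50]))        # 2500
print(leaf_buckets([90, 10, 100]))   # 90000, above the default limit
```

This is an upper bound: real responses only contain buckets for keys that exist in the data, so the actual count is often lower.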

Tips:

Keep size reasonable. Don’t ask for 10,000 sub-buckets unless you really need them.
For high-cardinality fields, consider sampling with the sampler agg.
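Cardinality sub-aggs keep one fixed-size HyperLogLog sketch per bucket (roughly precision_threshold × 8 bytes each), which is usually cheaper than a terms sub-agg but still adds up across thousands of buckets. Terms sub-aggs are expensive at scale.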
Prefer filter-context queries above the agg block — fewer docs flow into aggs.

Quick rules

Bucket > metric — the basic split-apply pattern.
Bucket > bucket > metric — the typical analytics dashboard pattern.
order: { sub_agg: "desc" } — sort buckets by a sub-metric.
Mind search.max_buckets for deeply nested aggs.
Filter the result set with a query first, then aggregate — way more efficient than aggregating then post-filtering.

Search Features

Relevance & Scoring (TF-IDF, BM25)

advanced elasticsearch scoring bm25 relevance

When we search for “fast laptop”, Elasticsearch returns matching docs sorted by a _score. That score is a number telling us how relevant the doc is. The higher the score, the better the match.

In simple language: scoring is “how strongly does this document match my query, given the words it contains and how common those words are across the whole index”.

TF-IDF (the old way)

Before ES 5, the default scoring used TF-IDF:

TF (Term Frequency) — the more times a term appears in a doc, the higher the score.
IDF (Inverse Document Frequency) — rare terms across the index count more. “laptop” matters more than “the”.
Field length norm — shorter fields score higher (a match in a title beats a match in a long description).

The problem: TF grows unbounded. A doc that mentions “laptop” 100 times scores way higher than one that mentions it 5 times — even though both are obviously about laptops.

BM25 (the current default, since ES 5.0)

BM25 stands for “Best Matching 25”. Think of it like TF-IDF with two important fixes:

TF saturation — repeating a term gives diminishing returns. After 5–10 occurrences, more mentions barely move the needle.
Length normalization is tunable — controlled by a parameter b.

BM25 formula (simplified)

score = IDF(term) × (tf × (k1 + 1)) / (tf + k1 × (1 - b + b × (dl / avgdl)))

k1 (default 1.2) — controls TF saturation. Higher = TF matters more.
b (default 0.75) — controls length norm. 0 = ignore length, 1 = full normalization.
dl = doc length, avgdl = average doc length in the index.
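To get a feel for saturation and length normalization, the simplified formula above can be typed straight into Python. The index statistics are made up, and the IDF shown is the form we believe Lucene's BM25 uses; treat the numbers as illustrative, not as what explain would print.

```python
import math

def idf(doc_count, docs_with_term):
    """Inverse document frequency: rarer terms get a larger weight."""
    return math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))

def bm25_term_score(tf, dl, avgdl, idf_value, k1=1.2, b=0.75):
    """Score of one term in one doc, following the simplified formula above."""
    length_norm = k1 * (1 - b + b * dl / avgdl)
    return idf_value * (tf * (k1 + 1)) / (tf + length_norm)

term_idf = idf(doc_count=1000, docs_with_term=50)  # made-up index stats
for tf in (1, 5, 10, 100):
    score = bm25_term_score(tf, dl=100, avgdl=100, idf_value=term_idf)
    print(tf, round(score, 3))
```

Going from 1 to 5 mentions moves the score a lot, 5 to 10 much less, and 100 mentions still stays below idf × (k1 + 1). Passing a smaller dl for the same tf raises the score, which is the length normalization at work.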

Tuning BM25 per field

We define a custom similarity in the index settings, then assign it to individual fields in the mapping:

PUT /products
{
  "settings": {
    "index": {
      "similarity": {
        "custom_bm25": {
          "type": "BM25",
          "k1": 1.5,
          "b": 0.5
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "similarity": "custom_bm25"
      }
    }
  }
}

Debugging scores with `explain`

When relevance feels off, use explain to see why a doc scored what it did:

GET /products/_search
{
  "explain": true,
  "query": { "match": { "description": "fast laptop" } }
}

The response includes a breakdown: IDF value, TF value, field length, and the final BM25 product for each term.

When BM25 isn’t enough

BM25 only looks at lexical matches — it has no idea “laptop” and “notebook” mean the same thing. For semantic similarity, we layer on:

Synonyms at analyzer time (cheap, fast)
Function score / script_score to boost recent or popular docs
Dense vector search (kNN) for true semantic matching

For most CRUD-y search interview questions though, “BM25 with TF saturation and length normalization” is the right answer.

Pagination: from/size vs scroll vs search_after

intermediate elasticsearch pagination search_after scroll

Pagination in Elasticsearch is the classic interview gotcha. “How would you paginate to page 1000?” — the wrong answer is from: 9990, size: 10. Here’s why, and what to use instead.

from/size (the obvious one)

Looks like SQL LIMIT/OFFSET:

GET /products/_search
{
  "from": 20,
  "size": 10,
  "query": { "match_all": {} }
}

The problem: every shard has to collect its top from + size hits, sort them locally, and send their IDs and sort values to the coordinating node, which then merges all of them and throws away the first from entries. With 5 shards and from: 9990, every shard ships 10,000 entries (50,000 in total) just to return 10 docs. That’s a memory and network disaster.

Elasticsearch enforces a hard ceiling: from + size <= 10000 by default (index.max_result_window).
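The arithmetic behind that ceiling, as a tiny sketch. The function name is ours and the model ignores replicas and early termination; it just counts how many entries get collected and merged for one page.

```python
def deep_page_cost(shards, from_, size):
    """Entries collected per shard, merged on the coordinator, and thrown away."""
    per_shard = from_ + size       # each shard must collect its own top from+size
    merged = shards * per_shard    # the coordinating node merges all of them
    discarded = merged - size      # only `size` hits are returned
    return per_shard, merged, discarded

print(deep_page_cost(shards=5, from_=9990, size=10))  # (10000, 50000, 49990)
```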

The 10,000 window wall

from/size
good up to ~10k results, jumpable pages, stateless

scroll
snapshot of data, batch export, no live updates

search_after
live deep pagination, recommended for users

Scroll (no longer recommended for deep pagination)

Scroll grabs a snapshot of the index and lets us page through it without re-running the query.

POST /products/_search?scroll=1m
{
  "size": 100,
  "query": { "match_all": {} }
}

The response includes a _scroll_id. Hand it back to keep paging:

POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5..."
}

Use scroll for batch jobs: reindexing, exports, ML training data. Don’t use it for users — the snapshot doesn’t reflect new docs added after the scroll started, and it holds resources on the cluster for the scroll’s lifetime.

search_after (the right answer for deep pagination)

search_after says: “give me the next page after this sort value”. No offset arithmetic, no window limit.

GET /products/_search
{
  "size": 10,
  "query": { "match_all": {} },
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ]
}

The last doc in the response has a sort array — say [1700000000, "p_42"]. Pass that to the next request:

GET /products/_search
{
  "size": 10,
  "query": { "match_all": {} },
  "search_after": [1700000000, "p_42"],
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ]
}

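Each shard now does a cheap “give me docs sorted after this point” lookup, with no offset to skip. Always include a tiebreaker field in the sort so pagination is deterministic when two docs share the primary sort value. The example uses _id for brevity; the Elasticsearch docs discourage sorting on _id and it may be disabled on recent versions, so a keyword copy of the ID with doc values is the safer choice. With a PIT, Elasticsearch adds an implicit _shard_doc tiebreaker for us.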
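The cursor logic can be simulated over an in-memory list. This is a sketch of the keyset-pagination idea only: the docs, the sort_key helper, and the search function are all made up, and a real client would send the last hit's sort array back to Elasticsearch instead.

```python
from bisect import bisect_right

docs = [  # made-up (created_at, _id) pairs
    (1700000300, "p_7"), (1700000200, "p_3"), (1700000200, "p_9"),
    (1700000100, "p_1"), (1700000000, "p_42"),
]

def sort_key(doc):
    created_at, doc_id = doc
    return (-created_at, doc_id)  # created_at desc, _id asc as tiebreaker

ordered = sorted(docs, key=sort_key)
keys = [sort_key(d) for d in ordered]

def search(size, search_after=None):
    """Return the next `size` docs strictly after the cursor."""
    start = 0 if search_after is None else bisect_right(keys, sort_key(search_after))
    return ordered[start:start + size]

pages, cursor = [], None
while True:
    page = search(size=2, search_after=cursor)
    if not page:
        break
    pages.append(page)
    cursor = page[-1]  # sort values of the last hit become the next cursor

print(pages)
```

Two docs share created_at 1700000200 and end up on different pages. Without the tiebreaker in the sort key, the cursor could not tell them apart and one of them might be repeated or skipped.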

Point in Time (PIT) — search_after’s best friend

Plain search_after sees index changes between requests, so docs added or deleted mid-pagination can show up twice or be skipped. To freeze the view, open a PIT:

POST /products/_pit?keep_alive=1m
# { "id": "46ToAwMDaWR5..." }

Then pass the returned ID as pit.id in the request body and drop the index name from the URL (GET /_search). Combined with search_after, this is the modern, scalable way to deep-paginate.

TL;DR

Users browsing pages 1–10? → from/size.
Batch export of millions of docs? → scroll (or PIT + search_after).
Live deep pagination, infinite scroll? → search_after + PIT.
Never raise max_result_window to “fix” deep pagination. That’s treating the symptom.

Highlighting & Suggesters

intermediate elasticsearch highlighting suggesters autocomplete

Two features that turn raw search into a real product UX: highlighting (showing users why a result matched) and suggesters (autocomplete and typo correction).

Highlighting — “show me what matched”

When we Google something, the matched words are bolded in the snippet. That’s highlighting. Elasticsearch wraps the matched terms in <em> tags (configurable) inside a copy of the field.

GET /articles/_search
{
  "query": {
    "match": { "body": "elasticsearch sharding" }
  },
  "highlight": {
    "fields": {
      "body": {}
    }
  }
}

The response now includes a highlight block per hit:

"highlight": {
  "body": [
    "An intro to <em>elasticsearch</em> <em>sharding</em> and routing..."
  ]
}

Three highlighter types

Think of it like JPEG vs PNG vs SVG — same goal, different tradeoffs.

unified (default) — uses Lucene’s UnifiedHighlighter. Works on any text field, decent speed, supports complex queries. Good default.
plain — re-runs the query on each field in memory. Most accurate, slowest, OK for small fields like titles.
fvh (fast vector highlighter) — needs term_vector: with_positions_offsets in the mapping. Fastest for long fields (think entire article body).

"highlight": {
  "pre_tags": ["<mark>"],
  "post_tags": ["</mark>"],
  "fields": {
    "title": { "type": "plain" },
    "body":  { "type": "fvh", "fragment_size": 150, "number_of_fragments": 3 }
  }
}

fragment_size limits each snippet length; number_of_fragments caps how many we get.

Suggesters — autocomplete and “did you mean?”

Suggesters are a separate feature of the search API, requested through a top-level suggest block and designed for low-latency single-keystroke responses.

Pick your suggester

term
single-word typo fix. "elasitcsearch" → "elasticsearch".

phrase
whole-phrase correction. "the qick brown fox" → "the quick brown fox".

completion
prefix-as-you-type autocomplete. FST-backed, sub-ms.

Completion suggester (the autocomplete one)

You need a dedicated field of type completion:

PUT /products
{
  "mappings": {
    "properties": {
      "name_suggest": { "type": "completion" }
    }
  }
}

Index docs with input options (and optional weight to bias popular items):

POST /products/_doc
{
  "name_suggest": {
    "input": ["Macbook Pro", "MBP"],
    "weight": 100
  }
}

Query — note this is _search with a top-level suggest block, no query:

POST /products/_search
{
  "suggest": {
    "p-suggest": {
      "prefix": "macb",
      "completion": {
        "field": "name_suggest",
        "size": 5,
        "fuzzy": { "fuzziness": 1 }
      }
    }
  }
}

Behind the scenes the completion suggester stores entries in an FST (finite-state transducer) held entirely in memory. That’s why it’s blisteringly fast but uses RAM — don’t dump millions of entries in without budgeting.

Term suggester (the “did you mean?” one)

POST /articles/_search
{
  "suggest": {
    "did_you_mean": {
      "text": "elasitcsearch sharing",
      "term": { "field": "body" }
    }
  }
}

Term suggester walks the index’s term dictionary using edit distance (Levenshtein) to find close matches. Use it when a regular query returns zero results.
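Edit distance itself is a short dynamic-programming routine. The sketch below is plain Levenshtein over a made-up four-word dictionary; the real suggester works against Lucene's term dictionary and its distance measure and ranking may differ, so this only illustrates the idea.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

dictionary = ["elasticsearch", "sharding", "sharing", "search"]  # stand-in term dictionary

def suggest(word, max_edits=2):
    """Terms within max_edits of the input, closest first."""
    scored = [(levenshtein(word, term), term) for term in dictionary]
    return [term for dist, term in sorted(scored) if 0 < dist <= max_edits]

print(levenshtein("elasitcsearch", "elasticsearch"))  # 2
print(suggest("elasitcsearch"))                       # ['elasticsearch']
```

Two swapped letters cost 2 edits under plain Levenshtein; a transposition-aware variant would count the swap as a single edit.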

When to use what

Showing matched snippets in search results? → highlighting (unified by default).
Typeahead in a search box? → completion suggester.
“Did you mean…?” prompt? → phrase or term suggester.
Need both autocomplete and full-text search on the same field? → index it twice: once as text, once as completion.

Performance, Scaling & Ops

Sharding Strategy & Routing

advanced elasticsearch sharding routing scaling

A shard is a Lucene index — a self-contained chunk of our data. We split an Elasticsearch index into N primary shards so it can scale horizontally across nodes. Routing is how Elasticsearch decides which shard a given doc lives on.

The routing formula

In simple language: when we index a doc, Elasticsearch hashes its ID and modulos by the number of primary shards.

shard_num = hash(_routing) % num_primary_shards

By default _routing is the document _id. Override it to put related docs on the same shard.

Doc → shard mapping (5 primary shards)

doc_1
↓ hash=8472
shard 2

doc_2
↓ hash=1234
shard 4

doc_3
↓ hash=9999
shard 4

doc_4
↓ hash=5555
shard 0

Why `num_primary_shards` is (almost) forever

Notice the formula uses num_primary_shards as the modulus. If we change it, almost every doc’s target shard changes, meaning nearly all docs would have to be moved. That’s why you can’t change the primary shard count on a live index. You have to reindex into a new index with the new shard count.
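A quick simulation shows how disruptive a shard-count change would be under the simplified formula. zlib.crc32 is only a stand-in hash chosen for the example, not the hash Elasticsearch uses, and the doc IDs are made up.

```python
import zlib

def shard_for(routing, num_primary_shards):
    """Simplified routing: hash the routing value, take it modulo the shard count."""
    # zlib.crc32 is a stand-in hash for illustration only.
    return zlib.crc32(routing.encode()) % num_primary_shards

doc_ids = [f"doc_{i}" for i in range(10_000)]
moved = sum(shard_for(d, 5) != shard_for(d, 6) for d in doc_ids)
print(f"{moved / len(doc_ids):.0%} of docs would land on a different shard")
```

Going from 5 to 6 shards relocates the large majority of docs, because a doc only stays put when both remainders happen to agree.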

The only escape hatches:

Split API — increases primary shards by a multiple (e.g. 2 → 4, 3 → 9). Requires the index to be read-only first.
Shrink API — decreases primary shards down to a factor. Index must be read-only and all primaries on one node.

Both are expensive operations.

Sizing shards — the rules of thumb

20–50 GB per shard is the sweet spot for search-heavy workloads.
Aim for < 200M docs per shard (Lucene’s hard limit is around 2 billion, but performance drops well before).
Budget roughly 1 GB of heap for every ~20 shards a node hosts (an older rule of thumb; check the sizing guidance for your version).
Don’t over-shard. Each shard has fixed overhead (file handles, memory, refresh cost). 1000 tiny shards is worse than 10 healthy ones.

Settings example

PUT /orders
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

number_of_replicas is changeable any time. Each replica is a full copy of a primary shard for HA and read throughput.

By default, a search on a 5-shard index hits all 5 shards and the coordinating node merges results. If we know all the docs we care about live on one shard, we can hit just that shard.

Multi-tenant SaaS app, where every query filters by tenant_id:

# Index with custom routing
POST /tickets/_doc?routing=tenant_42
{
  "tenant_id": "tenant_42",
  "subject": "Login bug"
}

# Search the single relevant shard
GET /tickets/_search?routing=tenant_42
{
  "query": { "term": { "tenant_id": "tenant_42" } }
}

Massive throughput win: instead of all 5 shards running the query and the coordinating node merging 5 partial results, a single shard handles the whole request and the other 4 stay free for other tenants’ queries.

The downside: hot shards

If tenant_42 is a huge customer, that one shard gets all their data and all their queries. We get a hot shard — uneven storage, uneven CPU, uneven latency. Mitigations:

Use a composite routing key like tenant_id + "_" + region.
Use the routing_partition_size setting to spread one tenant across multiple shards.

When to over-shard vs under-shard

Time-series logs (write-heavy, retention): use index-per-day or data streams, with maybe 1 shard per index.
Reference data (small, hot, read-heavy): 1 primary, many replicas.
Big general search index: aim for 30–50 GB shards. Plan growth — if you’ll hit 1 TB, start with 20+ primaries.

The interview-quality answer: “primary shard count is fixed at creation, picked based on expected growth, target shard size, and routing patterns; replicas are tunable for read throughput and HA”.

Refresh, Flush & Near-Real-Time Search

advanced elasticsearch refresh flush translog lucene

When we index a doc, it is not immediately searchable. Elasticsearch is “near real-time” — there’s a 1-second lag by default. To understand why, we need to walk through how a write actually lands on disk.

The journey of a single doc

In simple language: every indexed doc takes three steps before it’s safe and searchable.

Write path timeline

1. Index request arrives
Doc written to in-memory buffer + appended to translog on disk (fsync per request by default).

↓ every 1s (refresh)

2. Refresh
Buffer is written to a new Lucene segment in OS filesystem cache. Now searchable.

↓ every 30 min or 512MB translog

3. Flush
fsync segments to durable disk, clear translog. Now crash-safe even without translog replay.

Refresh — what makes a doc searchable

A refresh turns the in-memory buffer into a Lucene segment. Default interval: 1 second per shard.

PUT /products/_settings
{
  "index.refresh_interval": "30s"
}

The catch: refreshes are expensive. Each one writes a new segment, and lots of tiny segments slow down search (more segments = more lookups per query). For write-heavy workloads (logs, metrics), bumping refresh_interval to 30s or -1 (disabled) is a huge throughput win.

You can also force a refresh on demand — useful in tests:

POST /products/_refresh

Or per-request:

POST /products/_doc?refresh=wait_for
{ "name": "Macbook" }

refresh=wait_for waits for the next scheduled refresh. refresh=true forces one immediately (slow, don’t use in hot paths).

Translog — the safety net

Refresh writes to OS cache, not durable disk. If the node crashes before the next flush, segments that were never fsynced can be lost. That’s where the translog (transaction log) saves us.

Every write is appended to the translog before acking the client. On crash recovery, Elasticsearch replays the translog to rebuild the in-memory state.

Default durability:

"index.translog.durability": "request",   // fsync on every request (safe, default)
"index.translog.sync_interval": "5s",     // alt: fsync every 5s if "async"
"index.translog.flush_threshold_size": "512mb"

Setting durability: async means we batch fsyncs every 5s — faster writes but we can lose up to 5s of acked data on crash. Use only for logs/metrics where loss is OK.

Flush — making it truly durable

A flush does three things:

fsync all current segments to disk
Write a new Lucene commit point
Truncate the translog (we no longer need it for those docs)

Flushes happen automatically when the translog hits 512 MB, or every 30 minutes. You rarely call flush manually — but you might before an upgrade or snapshot:

POST /products/_flush
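The interplay of buffer, translog, refresh and flush can be captured in a toy model. Everything here (class name, fields, the crash behaviour) is a deliberate simplification for intuition, not a description of Lucene internals.

```python
class ToyShard:
    """Toy model of the write path: buffer -> segment (refresh) -> commit (flush)."""

    def __init__(self):
        self.buffer = []           # in memory, not searchable yet
        self.translog = []         # on disk, replayed after a crash
        self.cached_segments = []  # searchable, but only in OS cache
        self.committed = []        # searchable and fsynced

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        if self.buffer:
            self.cached_segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        self.refresh()
        self.committed.extend(self.cached_segments)
        self.cached_segments.clear()
        self.translog.clear()

    def search(self):
        return [d for seg in self.committed + self.cached_segments for d in seg]

    def crash_and_recover(self):
        self.buffer.clear()
        self.cached_segments.clear()   # un-fsynced segments are lost
        replay, self.translog = self.translog, []
        for doc in replay:             # translog replay restores them
            self.index(doc)
        self.refresh()

shard = ToyShard()
shard.index("a")
before_refresh = shard.search()  # []
shard.refresh()
after_refresh = shard.search()   # ['a']
shard.flush()
shard.index("b")
shard.refresh()
shard.crash_and_recover()
after_crash = shard.search()     # ['a', 'b']
```

Doc "a" survives the crash because it was flushed; doc "b" survives only because the translog still held it.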

Segment merges — the cleanup crew

Every refresh creates a new tiny segment. Tiny segments are bad for search. Elasticsearch (via Lucene) merges small segments into larger ones in the background. Merges also physically delete docs marked as deleted (an update is a delete + insert — the old version sticks around until merge).

Tuning merges is rarely worth it; the default scheduler is good. The thing to remember: merges cost CPU and IO. If your nodes are under-provisioned, merges will steal resources from indexing and search.

Putting it together: write-heavy tuning

For a logging pipeline doing 100k docs/sec:

PUT /logs-2026-05/_settings
{
  "index.refresh_interval": "30s",
  "index.translog.durability": "async",
  "index.translog.sync_interval": "10s",
  "index.number_of_replicas": 0
}

After bulk load is done, flip back:

PUT /logs-2026-05/_settings
{
  "index.refresh_interval": "1s",
  "index.number_of_replicas": 1
}

This is a classic interview answer: “I’d raise (or disable) the refresh interval and drop replicas during bulk load, then restore them afterwards. Refresh interval and translog durability are the two big knobs for write throughput vs search freshness vs durability.”


Bulk API & Reindexing

intermediate elasticsearch bulk reindex aliases

Indexing one doc at a time is painfully slow — every request pays network and refresh overhead. The Bulk API lets us batch up writes for serious throughput. Reindexing (via aliases) lets us change mappings on a live index without downtime.

Bulk API — batching writes

The bulk endpoint takes a stream of newline-separated JSON lines. Each op (index/create/update/delete) starts with an action line; index, create, and update are followed by a document line, while delete has no body.

POST /_bulk
{ "index": { "_index": "products", "_id": "1" } }
{ "name": "Macbook Pro", "price": 1999 }
{ "index": { "_index": "products", "_id": "2" } }
{ "name": "iPad", "price": 599 }
{ "delete": { "_index": "products", "_id": "old-99" } }

Note: each line ends with a newline, including the last one. This is NDJSON, not regular JSON. Lots of bugs come from sending a single big array instead.
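
A minimal Python sketch of building that body, assuming the operations are already in memory as (action, document) pairs. The helper name build_bulk_body is made up for illustration, and it only produces the string; sending it over HTTP is left out.

```python
import json

def build_bulk_body(actions):
    """Serialize (action, document) pairs into an NDJSON bulk body.

    `actions` is a list of (action_dict, doc_or_None) tuples. Delete
    actions carry no document line, so pass None for them.
    """
    lines = []
    for action, doc in actions:
        lines.append(json.dumps(action))
        if doc is not None:
            lines.append(json.dumps(doc))
    # Terminate every line, including the last one, with a newline.
    return "".join(line + "\n" for line in lines)

body = build_bulk_body([
    ({"index": {"_index": "products", "_id": "1"}}, {"name": "Macbook Pro", "price": 1999}),
    ({"delete": {"_index": "products", "_id": "old-99"}}, None),
])
```

Send the result with Content-Type: application/x-ndjson. Note that json.dumps never emits a raw newline inside a document, which is exactly what the format needs.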

How big should a batch be?

Rule of thumb: 5–15 MB per request, or roughly 1000–5000 docs depending on doc size. Bigger isn’t always better — over ~100 MB and we risk OOM on the coordinating node.

Practical tuning:

Start at 1000 docs/batch
Run, watch indexing rate
Double the batch size until throughput stops improving or you see rejections
Pick the largest batch size before plateau
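
One way to enforce both limits at once is to close a batch when either the doc count or the byte budget would be exceeded. This is a sketch under assumptions: chunk_docs is a hypothetical helper, and the 10 MB default is just a point inside the 5–15 MB rule of thumb, not a recommended value.

```python
import json

def chunk_docs(docs, max_docs=1000, max_bytes=10 * 1024 * 1024):
    """Yield lists of docs, closing a batch when either limit would be exceeded."""
    batch, batch_bytes = [], 0
    for doc in docs:
        # Approximate the on-the-wire size: serialized doc plus its newline.
        size = len(json.dumps(doc).encode("utf-8")) + 1
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch

batches = list(chunk_docs(({"n": i} for i in range(2500)), max_docs=1000))
```

Doubling max_docs between runs while keeping max_bytes fixed gives the tuning loop a hard ceiling, so a few unusually large docs can’t blow up a request.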

Parallel bulk indexers

A single thread can’t saturate the cluster. Use 4–16 parallel bulk threads (test on your hardware). The Elasticsearch Python client and Java client both have helpers (parallel_bulk, BulkProcessor) for this.

Handling partial failures

A bulk request can succeed overall but have individual failures. Always check response.errors:

{
  "took": 30,
  "errors": true,
  "items": [
    { "index": { "_id": "1", "status": 201 } },
    { "index": { "_id": "2", "status": 400, "error": { ... } } }
  ]
}

Retry failed items with exponential backoff, especially on 429 Too Many Requests (the rejected-execution exception).
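
A rough sketch of that check in Python, operating on the parsed response shown above. The function names are illustrative, and treating only 429 and 5xx as retryable is an assumption that fits most setups rather than a rule from Elasticsearch.

```python
import random

def failed_items(bulk_response):
    """Return (position, action, result) for every item with status >= 400."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for pos, item in enumerate(bulk_response["items"]):
        # Each item has a single key naming the action: index/create/update/delete.
        action, result = next(iter(item.items()))
        if result.get("status", 200) >= 400:
            failures.append((pos, action, result))
    return failures

def is_retryable(result):
    """429 means the write queue was full; other 4xx errors are usually permanent."""
    return result["status"] == 429 or result["status"] >= 500

def backoff_delays(attempts, base=0.5, cap=30.0):
    """Exponential backoff with full jitter: delay i lies in [0, min(cap, base * 2**i)]."""
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]

response = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 400, "error": {"type": "mapper_parsing_exception"}}},
    ],
}
failures = failed_items(response)
```

The position in items matches the position of the op in the request, so it can be used to pick out exactly the lines to resend.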

Reindexing — when mappings need to change

Most field mappings in Elasticsearch are immutable. Want to change text to keyword? Want a different analyzer? Want fewer primary shards? You must reindex into a new index.

The Reindex API

POST /_reindex
{
  "source": { "index": "products_v1" },
  "dest":   { "index": "products_v2" }
}

Optionally transform docs inline:

POST /_reindex
{
  "source": { "index": "products_v1" },
  "dest":   { "index": "products_v2" },
  "script": {
    "source": "ctx._source.price_cents = (int) Math.round(ctx._source.price * 100); ctx._source.remove('price')"
  }
}
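
A price-to-cents conversion like this one is sensitive to how the float is turned into an integer. The same arithmetic in Python, as a quick illustration (the two function names are made up for the comparison):

```python
def to_cents_truncating(price):
    """Drop the fractional part, like a plain integer cast."""
    return int(price * 100)

def to_cents_rounding(price):
    """Round to the nearest cent before converting."""
    return round(price * 100)

# 19.99 * 100 is slightly below 1999 in binary floating point,
# so truncation loses a cent while rounding does not.
truncated = to_cents_truncating(19.99)
rounded = to_cents_rounding(19.99)
```

Whole-number prices such as 1999 come out the same either way; the difference only shows up with fractional values.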

Run it async with ?wait_for_completion=false for big reindexes — it returns a task ID you can monitor with GET /_tasks/{task_id}.

Zero-downtime reindex via aliases

This is the killer pattern. We never expose raw index names to the app — we expose an alias.

Zero-downtime reindex flow

1. App reads/writes via alias products → pointing to products_v1.

2. Create products_v2 with new mapping/settings.

3. Reindex v1 → v2 (live). New writes still hit v1.

4. Dual-write or replay the delta since reindex started.

5. Atomically swap alias: remove from v1, add to v2.

6. Verify, then drop products_v1.

The atomic alias swap:

POST /_aliases
{
  "actions": [
    { "remove": { "index": "products_v1", "alias": "products" } },
    { "add":    { "index": "products_v2", "alias": "products" } }
  ]
}

Both actions happen in one cluster state update — there’s no moment where the alias points to nothing.

Tips for fast reindexing

Set number_of_replicas: 0 on the destination during the copy. Add replicas after.
Set refresh_interval: -1 on destination during copy.
Use slices: "auto" in the reindex request to parallelize across source shards.
Reindex from remote (the _reindex source can be a remote cluster) if you’re upgrading major versions.

The interview-quality answer: “always front your indices with an alias from day one, so you can reindex without touching app code”.


Cluster Health & Snapshots

intermediate elasticsearch cluster-health snapshots backup

Two ops-y topics that come up in basically every Elasticsearch interview: “what does cluster status mean?” and “how do you back up an Elasticsearch cluster?”. Both are simpler than they sound.

Cluster health: green, yellow, red

GET /_cluster/health returns a status. In simple language:

GREEN
All primaries and all replicas are assigned. Cluster is happy.

YELLOW
All primaries assigned, but some replicas are not. Reads/writes still work; HA is degraded.

RED
At least one primary shard is unassigned. That data is unavailable. Action required.
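
The three colors reduce to a small decision rule. A toy Python version, assuming each shard is described by an (is_primary, is_assigned) pair; real clusters derive this from the routing table, which is not modeled here.

```python
def cluster_status(shards):
    """Derive green/yellow/red from a list of (is_primary, is_assigned) tuples."""
    # Any unassigned primary means some data is unavailable.
    if any(primary and not assigned for primary, assigned in shards):
        return "red"
    # All primaries are fine, but at least one replica is missing.
    if any(not assigned for _, assigned in shards):
        return "yellow"
    return "green"

# Single-node dev cluster: one primary assigned, its replica has nowhere to go.
dev_cluster = [(True, True), (False, False)]
status = cluster_status(dev_cluster)
```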

Common causes by color

Yellow on a single-node dev cluster is normal — replicas can’t be assigned to the same node as the primary, so they stay unassigned.
Yellow on a multi-node cluster after a node leaves — replicas need to be re-created on remaining nodes. Wait, or force-allocate.
Red after a disk fills up — primaries get unassigned. Free disk or expand storage.
Red after a corrupted shard — restore from snapshot.

Drilling deeper

GET /_cluster/health?level=indices
GET /_cluster/allocation/explain

allocation/explain is the single best command for “why is this shard not where I want it?”. It tells you exactly which allocation decider blocked the assignment (disk watermark, awareness rules, filtering, max_shards_per_node, etc.).

Disk watermarks (a frequent red-cluster culprit)

Low watermark (default 85%) — Elasticsearch stops allocating new shards to that node.
High watermark (default 90%) — Elasticsearch tries to move shards off the node.
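Flood stage (default 95%) — Elasticsearch puts a read-only block (index.blocks.read_only_allow_delete) on every index with a shard on that node. Writes are rejected until the block is lifted.

When this happens, free disk. On 7.4 and later the block is released automatically once usage drops back below the high watermark; on older versions, lift it yourself: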

PUT /_all/_settings
{ "index.blocks.read_only_allow_delete": null }
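
For alerting, it helps to map a node’s disk usage onto the default 85/90/95% thresholds before Elasticsearch reacts. A small sketch; watermark_stage is a hypothetical helper, and treating a value exactly on a threshold as crossing it is an assumption.

```python
def watermark_stage(used_pct, low=85.0, high=90.0, flood=95.0):
    """Classify disk usage (percent used) against the default watermarks."""
    if used_pct >= flood:
        return "flood_stage"   # indices on the node get the read-only block
    if used_pct >= high:
        return "high"          # shards are moved off the node
    if used_pct >= low:
        return "low"           # no new shards are allocated to the node
    return "ok"

stages = [watermark_stage(p) for p in (70, 86, 92, 97)]
```

Alerting at the low watermark leaves time to act before writes start failing.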

Snapshots — the only real backup mechanism

You can’t just tar an Elasticsearch data directory while it’s running. The proper way to back up is snapshots to a registered repository.

Step 1: register a repository

A repository is a storage backend — S3, GCS, Azure Blob, or a shared filesystem.

PUT /_snapshot/my_s3_repo
{
  "type": "s3",
  "settings": {
    "bucket": "es-backups-prod",
    "region": "us-east-1",
    "base_path": "cluster-a"
  }
}

Before 8.0, the repository-s3 plugin had to be installed on every node; from 8.0 on, S3 support ships built in.

Step 2: take a snapshot

PUT /_snapshot/my_s3_repo/snap_2026_05_26?wait_for_completion=false

By default this snapshots all indices. Limit it:

PUT /_snapshot/my_s3_repo/snap_products_only
{
  "indices": "products,orders-*",
  "ignore_unavailable": true,
  "include_global_state": false
}

Snapshots are incremental at the segment level. If a Lucene segment hasn’t changed since the last snapshot, it’s not re-uploaded. The first snapshot is expensive; subsequent ones are cheap.
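
The incremental idea is essentially a set difference. A toy model in Python, assuming segments can be identified by name alone; the real repository logic also tracks file metadata, which this sketch ignores, and the segment names are invented.

```python
def segments_to_upload(current_segments, repo_segments):
    """Return the segments that are not yet in the repository."""
    return sorted(set(current_segments) - set(repo_segments))

# First snapshot: nothing in the repository yet, everything is uploaded.
first = segments_to_upload(["_0", "_1", "_2"], [])
# Next snapshot: one new segment appeared, only that one is uploaded.
second = segments_to_upload(["_0", "_1", "_2", "_3"], ["_0", "_1", "_2"])
```

This also shows why a big merge makes the next snapshot more expensive: merged segments are new files, so they have to be uploaded again.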

Step 3: restore

POST /_snapshot/my_s3_repo/snap_2026_05_26/_restore
{
  "indices": "products",
  "rename_pattern": "products",
  "rename_replacement": "products_restored"
}

The rename lets us restore alongside a live index without conflict — handy for “compare prod vs yesterday”.

Snapshot Lifecycle Management (SLM)

Don’t write cron jobs to call the snapshot API. Use SLM, which is built in:

PUT /_slm/policy/daily-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<daily-snap-{now/d}>",
  "repository": "my_s3_repo",
  "config": { "indices": ["*"] },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}

SLM handles scheduling, retention, and cleanup. Set it up once and forget.
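
How the three retention settings interact is easier to see in code. A simplified Python model, assuming snapshots are described only by their age in days and that a snapshot expires once its age is strictly greater than expire_after; snapshots_to_delete is an illustrative name, not an SLM API.

```python
def snapshots_to_delete(ages_days, expire_after=30, min_count=5, max_count=50):
    """Return the ages of snapshots a retention pass would remove."""
    kept, deleted = 0, []
    for age in sorted(ages_days):          # newest first
        if kept < min_count:
            kept += 1                      # always keep at least min_count
        elif kept >= max_count or age > expire_after:
            deleted.append(age)            # over max_count, or expired
        else:
            kept += 1
    return deleted

# 60 daily snapshots aged 0..59 days: ages 31..59 are past the 30d expiry.
daily = snapshots_to_delete(range(60))
```

min_count wins over expiry, so a cluster that stopped snapshotting for a while still keeps its last few backups.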

TL;DR

Green = healthy, Yellow = HA degraded, Red = data unavailable.
_cluster/allocation/explain is your best debugging tool.
Disk watermarks at 85/90/95% cause most prod incidents.
Snapshots → S3 (or similar) repository → SLM for scheduling. Incremental, cheap, restorable.


Common Pitfalls

advanced elasticsearch pitfalls production

A loaded interview question: “What’s gone wrong on an Elasticsearch cluster you’ve worked with?” Here are the most common production landmines, what causes them, and how to dodge each.

Pitfall → Consequence → Fix

Pitfall	Consequence	Fix
Mapping explosion	OOM, slow cluster state	Disable dynamic mapping; use `flattened`
Deep pagination	10k window, OOM coordinator	Use `search_after` + PIT
Hot shards	Uneven CPU, slow tail latency	Better routing keys, more shards
Oversized docs	Slow indexing, GC churn	Strip blobs, split into child docs
Refresh misuse	Throughput collapse	Raise `refresh_interval`, never `refresh=true`

1. Mapping explosion

Dynamic mapping auto-creates fields when it sees new JSON keys. Index 100k docs where each has a unique key (think event_id_abc123: { ... }) and you’ve got 100k fields. Each field uses memory in the cluster state, which is replicated to every node on every change.

// Bad: a free-form JSON blob with dynamic mapping
{ "properties": { "user_attributes": { "type": "object" } } }

Fixes:

Disable dynamic mapping with "dynamic": "strict" so unexpected fields throw an error instead of being added.
For genuinely open-ended data, use the flattened field type — it treats the whole object as a single field.

{ "properties": { "user_attributes": { "type": "flattened" } } }

Safety limit: index.mapping.total_fields.limit defaults to 1000 fields per index. It can be raised, but hitting it usually means you’ve already lost.

2. Deep pagination

Covered in detail in the pagination note. Short version: from + size > 10000 is forbidden by default, and even before that, deep pagination ships massive amounts of data across the network. Use search_after with a PIT. Don’t raise max_result_window as a “fix”.
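
The search_after loop itself is short. The sketch below simulates it in plain Python against an in-memory list, so search_page is a stand-in for a real search request and no PIT is involved; the point is the cursor logic, where each page starts strictly after the sort values of the previous page’s last hit.

```python
def search_page(docs, size, search_after=None):
    """Toy stand-in for one search request sorted by (ts, id)."""
    ordered = sorted(docs, key=lambda d: (d["ts"], d["id"]))
    if search_after is not None:
        ordered = [d for d in ordered if (d["ts"], d["id"]) > tuple(search_after)]
    return ordered[:size]

def scan_all(docs, size=3):
    """Walk the whole result set page by page using a search_after cursor."""
    cursor, out = None, []
    while True:
        page = search_page(docs, size, cursor)
        if not page:
            return out
        out.extend(page)
        last = page[-1]
        cursor = (last["ts"], last["id"])  # sort values of the last hit

docs = [{"id": f"d{i}", "ts": i // 2} for i in range(7)]
ids = [d["id"] for d in scan_all(docs, size=3)]
```

The id in the sort acts as a tiebreaker. Without it, docs sharing a timestamp across a page boundary could be skipped.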

3. Hot shards

Custom routing or natural data skew (one tenant = 80% of traffic) leads to one shard being a CPU bottleneck while siblings idle. Symptoms: high p99 latency, uneven hot_threads output across nodes.

Diagnose:

GET /_cat/shards?v&s=store:desc
GET /_nodes/hot_threads

Fixes:

Compound the routing key (tenant_id + "_" + shard_bucket)
Use routing_partition_size to spread one tenant across multiple shards
For time-series, switch to data streams with rollover so writes always go to the newest index
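
The compound routing key from the first fix can be sketched like this. The bucket count of 4 and the use of CRC32 over the document ID are assumptions for illustration; any stable hash works.

```python
import zlib

def routing_key(tenant_id, doc_id, buckets=4):
    """Build tenant_id + "_" + shard_bucket so one tenant spreads over several shards."""
    bucket = zlib.crc32(doc_id.encode("utf-8")) % buckets
    return f"{tenant_id}_{bucket}"

def tenant_routing_values(tenant_id, buckets=4):
    """All routing values a search for this tenant has to cover."""
    return [f"{tenant_id}_{b}" for b in range(buckets)]

key = routing_key("acme", "order-1001")
all_keys = tenant_routing_values("acme")
```

The trade-off: writes spread out, but a search for one tenant now has to pass every bucket’s routing value instead of a single one.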

4. Oversized documents

Stuffing a 5 MB PDF as base64 into a doc field is asking for pain. The whole doc has to be parsed at index time, its _source is loaded and shipped whenever the doc is fetched, segment merges rewrite it, and network transfer is slow.

Rules of thumb:

Aim for < 100 KB per doc for search workloads
Strip binaries before indexing — put them in S3 and store the URL
For genuinely nested arrays (think comments: [...] with thousands of entries), consider parent-child or splitting into separate docs

The http.max_content_length setting caps request size at 100 MB, but you should never approach that.

5. Refresh interval misuse

POST /events/_doc?refresh=true forces a refresh on every single write. It’s common in tests, and from there it leaks into prod code. Each refresh creates a new tiny Lucene segment, kicks off merges, and tanks throughput.

Symptoms: writes work fine in dev (low volume), crawl in prod (high volume).

# Wrong (production)
POST /events/_doc?refresh=true
{ ... }

# Right
POST /events/_doc
{ ... }
# Trust the 1-second default, or use refresh=wait_for if you must

For bulk-load jobs, go further:

PUT /events/_settings
{ "index.refresh_interval": "-1", "index.number_of_replicas": 0 }

Then restore after the load.

6. Bonus: replica = 0 in production

A zero-replica setup means losing one node = losing data. We’ve seen teams disable replicas “for performance” and forget to re-enable them. Always run with number_of_replicas >= 1 in production. Use replicas for HA, not just read throughput.

7. Bonus: searching across hundreds of indices

GET /logs-*/_search looks innocent but can fan out to thousands of shards on a cluster with daily indices. Each shard adds coordinator overhead. Mitigations:

Query a narrower index pattern (e.g. logs-2026-05-*) and add a tight date range filter, so fewer shards are hit in the first place
Consider rollover + ILM (Index Lifecycle Management) to consolidate old data
Use pre_filter_shard_size so the coordinator skips shards that can’t match

The meta-lesson: most Elasticsearch pitfalls aren’t bugs in Elasticsearch — they’re defaults that work great at small scale and bite at large scale. Know which knobs change with traffic, and turn them before the page.
