Elasticsearch

Search, indexing, query DSL, aggregations, and scaling concepts for Elasticsearch interviews.

Fundamentals (5)

What is Elasticsearch; Inverted Index; Cluster, Node, Index, Document; Shards & Replicas; Document Structure

Indexing & Mapping (5)

Index Creation & Settings; Mapping (dynamic vs explicit); Field Data Types; Analyzers & Tokenizers; Index Templates & Aliases

Query DSL (6)

Match vs Term Query; Bool Query; Range, Exists, Wildcard, Prefix, Regex; Fuzzy & Multi-match; Compound Queries & Function Score; Full-text vs Term-level

Aggregations (4)

Bucket Aggregations; Metric Aggregations; Pipeline & Nested Aggs; Sub-aggregations

Search Features (3)

Relevance & Scoring (BM25); Pagination (from/size, scroll, search_after); Highlighting & Suggesters

Performance, Scaling & Ops (5)

Sharding Strategy & Routing; Refresh, Flush & NRT Search; Bulk API & Reindexing; Cluster Health & Snapshots; Common Pitfalls

Fundamentals

What is Elasticsearch & When to use it

Elasticsearch is a distributed search and analytics engine, built on Apache Lucene, that is optimized for fast full-text search and aggregations.

beginner elasticsearch basics search

Inverted Index

The core data structure that makes Elasticsearch fast — a map from each term to the documents containing it.

intermediate elasticsearch internals lucene
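
To make the term-to-documents map concrete, here is a minimal sketch in plain Python. It is a toy model, not Lucene's actual implementation: the function name `build_inverted_index`, the sample documents, and the lowercase-and-split tokenization are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the docs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy tokenization: lowercase, then split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical sample documents, keyed by document id.
docs = {1: "quick brown fox", 2: "quick red fox", 3: "lazy brown dog"}
index = build_inverted_index(docs)

print(index["quick"])  # [1, 2]
print(index["brown"])  # [1, 3]
```

Looking up a term is a single dictionary access that returns the matching document ids, instead of a scan over every document.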

Cluster, Node, Index, Document

The four nouns you need to talk about Elasticsearch — from a single JSON doc up to a whole cluster.

beginner elasticsearch basics architecture

Shards & Replicas

How Elasticsearch splits an index across machines and keeps copies for fault tolerance.

intermediate elasticsearch shards distributed

Document Structure

What's inside a returned Elasticsearch document — _id, _source, _index, and the metadata fields you'll see in every response.

beginner elasticsearch documents metadata

Indexing & Mapping

Index Creation & Settings

How to create an index with the right shards, replicas, analyzers, and mappings from day one.

intermediate elasticsearch indexing settings
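
As a rough sketch, the request body sent with `PUT /<index-name>` can be assembled as a plain dictionary. The helper name `index_body` and the field names `title` and `created_at` are hypothetical; the shard and replica counts are placeholders, not recommendations.

```python
import json


def index_body(shards=1, replicas=1):
    """Build a create-index body with explicit settings and mappings."""
    return {
        "settings": {
            "number_of_shards": shards,      # fixed once the index exists
            "number_of_replicas": replicas,  # can be changed later
        },
        "mappings": {
            "dynamic": "strict",  # reject documents with unmapped fields
            "properties": {
                # Hypothetical fields: analyzed text plus an exact-match sub-field.
                "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
                "created_at": {"type": "date"},
            },
        },
    }


body = index_body(shards=3, replicas=1)
print(json.dumps(body, indent=2))
```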

Mapping: Dynamic vs Explicit

Letting ES guess your schema vs declaring it upfront — the difference between a prototype and a production index.

intermediate elasticsearch mapping schema

Field Data Types

text vs keyword (the most common ES interview question), plus numeric, date, object, nested, ip, and friends.

intermediate elasticsearch mapping types

Analyzers, Tokenizers & Token Filters

How raw text becomes searchable tokens — character filters, tokenizer, token filters.

intermediate elasticsearch analyzers text
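
The three stages can be imitated in a few lines of plain Python. This is only a toy model of the pipeline, not how Elasticsearch implements analysis; the function name `analyze`, the tag-stripping regex, and the tiny stopword list are assumptions.

```python
import re


def analyze(text, stopwords=frozenset({"the", "a", "an"})):
    """Toy analysis chain: character filter, then tokenizer, then token filters."""
    # 1. Character filter: replace HTML-like tags with a space.
    text = re.sub(r"<[^>]+>", " ", text)
    # 2. Tokenizer: emit runs of word characters as tokens.
    tokens = re.findall(r"\w+", text)
    # 3. Token filters: lowercase, then drop stopwords.
    tokens = [token.lower() for token in tokens]
    return [token for token in tokens if token not in stopwords]


print(analyze("<b>The</b> Quick-Brown Fox"))  # ['quick', 'brown', 'fox']
```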

Index Templates & Aliases

Two production patterns: templates for auto-applying settings to new indices, aliases for zero-downtime reindexing and log rotation.

intermediate elasticsearch templates aliases
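
For the zero-downtime swap, a sketch of the body sent to `POST /_aliases` might look like the following. The helper name `swap_alias_actions` and the index and alias names are hypothetical; the point is that both actions travel in one request, so the alias moves atomically.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Build one _aliases request that repoints an alias from old to new."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: clients query "products", never the versioned indices.
swap = swap_alias_actions("products", "products_v1", "products_v2")
print(json.dumps(swap, indent=2))
```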

Query DSL

Match vs Term Query

The classic ES interview question — when does Elasticsearch analyze your search input, and when does it look for an exact byte-for-byte match?

intermediate elasticsearch query-dsl match
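
The difference can be illustrated with a toy model in plain Python, assuming a text field whose analyzer lowercases and splits on whitespace. The names `analyze`, `match_query`, and `term_query` are hypothetical stand-ins, not Elasticsearch APIs.

```python
def analyze(text):
    """Stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()


# Terms stored in the index for a text field holding "Quick Brown Fox".
indexed_terms = set(analyze("Quick Brown Fox"))


def match_query(query_text):
    """match: analyze the input first, then look up the resulting terms (OR semantics)."""
    return any(term in indexed_terms for term in analyze(query_text))


def term_query(value):
    """term: look up the input exactly as given, with no analysis."""
    return value in indexed_terms


print(match_query("Quick"))  # True
print(term_query("Quick"))   # False
print(term_query("quick"))   # True
```

In this model the term lookup for "Quick" fails because the index only holds the lowercased token, which is the usual reason a term query against a text field returns nothing.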

Bool Query

The Swiss army knife of Elasticsearch: combining must, should, must_not, and filter clauses, and why filter is usually faster than must (filter clauses skip scoring and can be cached).

intermediate elasticsearch query-dsl bool
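
A sketch of a bool query body that uses all four clause types is shown below as a Python dictionary. The field names (`title`, `status`, `created_at`, `tags`, `archived`) are hypothetical.

```python
import json

bool_query = {
    "query": {
        "bool": {
            # must: has to match, and contributes to _score.
            "must": [{"match": {"title": "elasticsearch"}}],
            # filter: has to match, but is not scored.
            "filter": [
                {"term": {"status": "published"}},
                {"range": {"created_at": {"gte": "now-30d/d"}}},
            ],
            # should: optional, raises _score when it matches.
            "should": [{"match": {"tags": "tutorial"}}],
            # must_not: excludes matching documents, not scored.
            "must_not": [{"term": {"archived": True}}],
        }
    }
}

print(json.dumps(bool_query, indent=2))
```

Yes/no conditions such as status or date ranges usually belong in filter, leaving must for the clauses whose relevance should affect ranking.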

Range, Exists, Wildcard, Prefix & Regex Queries

The utility belt of Query DSL — querying numeric/date ranges, checking field existence, and doing pattern matching on keyword fields.

intermediate elasticsearch query-dsl range

Fuzzy & Multi-match Queries

Handling typos with Levenshtein distance, and searching across multiple fields in a single query.

intermediate elasticsearch query-dsl fuzzy
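
For intuition about edit distance, here is a minimal sketch of plain Levenshtein distance using the standard dynamic-programming recurrence. It is not Elasticsearch's implementation; in particular, the fuzzy query by default also treats swapping two adjacent characters as a single edit, which this sketch does not.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                        # delete char_a
                curr[j - 1] + 1,                    # insert char_b
                prev[j - 1] + (char_a != char_b),   # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


print(levenshtein("elastic", "elastik"))  # 1
print(levenshtein("kitten", "sitting"))   # 3
```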

Compound Queries & Function Score

When default BM25 isn't enough — boosting, decay functions, and writing custom scoring logic on top of search results.

advanced elasticsearch query-dsl function-score

Full-text vs Term-level Queries — When to use which

The mental model for choosing between analyzed full-text queries and exact term-level queries. Mixing the two up is one of the most common mistakes in ES.

intermediate elasticsearch query-dsl full-text

Aggregations

Bucket Aggregations

Grouping documents into buckets — like SQL's GROUP BY but more flexible. terms, date_histogram, range, and filters aggregations.

intermediate elasticsearch aggregations buckets

Metric Aggregations

Computing numbers across docs — avg, sum, min, max, stats, percentiles, cardinality. The SUM and COUNT of Elasticsearch.

intermediate elasticsearch aggregations metrics

Pipeline & Nested Aggregations

Aggregations that operate on other aggregations — moving averages, derivatives, bucket selectors. Plus aggs on nested fields.

advanced elasticsearch aggregations pipeline

Sub-aggregations

The killer feature of ES aggregations — nesting metrics inside buckets, and buckets inside buckets. The standard analytics pattern.

intermediate elasticsearch aggregations sub-aggregations
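
As a sketch of a metric nested inside a bucket, the request body below groups documents with a terms aggregation and computes an average per bucket. The aggregation names (`by_category`, `avg_price`) and field names (`category`, `price`) are hypothetical.

```python
import json

agg_body = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "by_category": {
            "terms": {"field": "category"},  # one bucket per distinct value
            "aggs": {
                # Sub-aggregation: computed separately inside every bucket.
                "avg_price": {"avg": {"field": "price"}},
            },
        }
    },
}

print(json.dumps(agg_body, indent=2))
```

This is roughly the shape of `SELECT category, AVG(price) ... GROUP BY category` in SQL.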

Search Features

Relevance & Scoring (TF-IDF, BM25)

How Elasticsearch computes _score and why BM25 replaced TF-IDF as the default.

advanced elasticsearch scoring bm25
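
The following is a sketch of the textbook BM25 score for a single term, using the commonly cited defaults k1 = 1.2 and b = 0.75. The function and parameter names are assumptions, and Lucene's exact constants and rounding may differ, so the values will not necessarily match `_score` from a real cluster.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one term to one document's score."""
    # Rarer terms (small doc_freq) get a larger inverse document frequency.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Documents longer than average are penalized; b controls how strongly.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # Term frequency saturates: each extra occurrence adds less than the last.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Hypothetical corpus statistics: 1000 docs, the term appears in 50 of them.
once = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=50)
twice = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=50)
print(once < twice)       # True: more occurrences score higher
print(twice < 2 * once)   # True: but not proportionally higher
```

The saturation in the last line is the main practical difference from classic TF-IDF, where repeating a term keeps raising the score.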

Pagination: from/size vs scroll vs search_after

Why deep from/size pagination gets expensive (every shard must collect and sort from + size hits) and when to use search_after or scroll instead.

intermediate elasticsearch pagination search_after
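
A sketch of how a search_after request body could be built page by page is shown below. The helper name `next_page_body` and the sort fields `created_at` and `order_id` are hypothetical; the second sort field is assumed to be unique so that ties are broken deterministically.

```python
def next_page_body(last_hit_sort=None, size=100):
    """Build a search body; pass the previous page's last hit 'sort' values to continue."""
    body = {
        "size": size,
        # Deterministic order: primary sort plus a unique tiebreaker field.
        "sort": [{"created_at": "desc"}, {"order_id": "asc"}],
    }
    if last_hit_sort is not None:
        # Resume right after the last hit of the previous page.
        body["search_after"] = list(last_hit_sort)
    return body


first_page = next_page_body()
# Hypothetical 'sort' values copied from the last hit of the first page.
second_page = next_page_body(last_hit_sort=[1700000000000, "A-1042"])
print(second_page)
```

Because each request only asks for the next `size` hits after a known position, the cost per page stays roughly constant no matter how deep the client goes.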

Highlighting & Suggesters

Bolding matched terms in results and powering autocomplete/typeahead.

intermediate elasticsearch highlighting suggesters

Performance, Scaling & Ops

Sharding Strategy & Routing

How docs map to shards, why the number of primary shards (number_of_shards) is fixed at index creation short of a split, shrink, or reindex, and how to use custom routing.

advanced elasticsearch sharding routing
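
The routing idea reduces to "hash the routing value, take it modulo the number of primary shards". The sketch below uses `zlib.crc32` purely as a stand-in hash, which is an assumption: Elasticsearch uses its own hash function and a slightly more involved formula, so these shard numbers will not match a real cluster.

```python
import zlib


def pick_shard(routing, number_of_shards):
    """Simplified routing: hash(routing) % number_of_shards (crc32 is a stand-in)."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


# Hypothetical routing values; by default the routing value is the document _id.
keys = [f"user-{i}" for i in range(100)]
with_5_shards = {key: pick_shard(key, 5) for key in keys}
with_6_shards = {key: pick_shard(key, 6) for key in keys}

moved = sum(1 for key in keys if with_5_shards[key] != with_6_shards[key])
print(f"{moved} of {len(keys)} keys map to a different shard after changing 5 -> 6")
```

Since the shard count is part of the formula, changing it would send existing documents to different shards, which is why the primary shard count cannot simply be edited on a live index.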

Refresh, Flush & Near-Real-Time Search

Why ES is near-real-time, not real-time — the journey from in-memory buffer to durable disk.

advanced elasticsearch refresh flush

Bulk API & Reindexing

High-throughput indexing and zero-downtime reindexing using aliases.

intermediate elasticsearch bulk reindex
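
The Bulk API takes newline-delimited JSON: one action line followed by one document line, with a trailing newline at the end of the body. A minimal sketch of building that body is shown below; the helper name `bulk_body`, the index name, and the sample documents are hypothetical.

```python
import json


def bulk_body(index, docs):
    """Build an NDJSON _bulk body of index actions from (doc_id, document) pairs."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))  # action
        lines.append(json.dumps(doc))                                          # source
    # The body must end with a newline.
    return "\n".join(lines) + "\n"


payload = bulk_body("products_v2", [("1", {"name": "laptop"}), ("2", {"name": "phone"})])
print(payload)
```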

Cluster Health & Snapshots

Green/yellow/red cluster states and backup/restore with snapshot repositories.

intermediate elasticsearch cluster-health snapshots

Common Pitfalls

Mapping explosion, deep pagination, hot shards, oversized docs, refresh misuse — and how to avoid them.

advanced elasticsearch pitfalls production