Sharding Strategy & Routing


A shard is a Lucene index — a self-contained chunk of our data. We split an Elasticsearch index into N primary shards so it can scale horizontally across nodes. Routing is how Elasticsearch decides which shard a given doc lives on.

The routing formula

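In simplified form: when we index a doc, Elasticsearch hashes its routing value and takes the result modulo the number of primary shards. (Recent versions first hash into a larger number of "routing shards" so that the Split API can work, but the idea is the same.)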

shard_num = hash(_routing) % num_primary_shards

By default _routing is the document _id. Override it to put related docs on the same shard.

Doc → shard mapping (5 primary shards)
doc_1 → hash=8472 → shard 2
doc_2 → hash=1234 → shard 4
doc_3 → hash=9999 → shard 4
doc_4 → hash=5555 → shard 0
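
To make the mapping above concrete, here is a minimal Python sketch of the simplified formula. The helper name shard_for is made up for illustration, and zlib.crc32 is only a stand-in for Elasticsearch's internal Murmur3-based hash, so the shard numbers it prints will not match a real cluster.

```python
import zlib

def shard_for(routing: str, num_primary_shards: int) -> int:
    """Simplified routing: hash the routing value, then take it modulo the shard count."""
    # Assumption: crc32 stands in for Elasticsearch's own Murmur3-based hash.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

# By default the routing value is the document _id.
for doc_id in ["doc_1", "doc_2", "doc_3", "doc_4"]:
    print(doc_id, "-> shard", shard_for(doc_id, 5))
```

The same input always lands on the same shard, which is what lets a get-by-ID request go straight to the right shard without asking the others.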

Why num_primary_shards is (almost) forever

Notice that num_primary_shards is the divisor in the formula. If we change it, almost every doc's target shard changes, meaning almost every doc would have to be moved. That's why you can't change the primary shard count of an existing index in place. You have to reindex into a new index with the new shard count.
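
A quick simulation shows how disruptive a shard-count change would be. This is a sketch under the simplified formula, with zlib.crc32 standing in for the real hash and fraction_moved being a hypothetical helper. For a uniform hash, a doc stays put going from 5 to 6 shards only when hash % 30 is below 5, so roughly five docs in six would move.

```python
import zlib

def shard_for(routing: str, num_primary_shards: int) -> int:
    # Assumption: crc32 stands in for Elasticsearch's Murmur3-based hash.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

def fraction_moved(doc_ids, old_shards: int, new_shards: int) -> float:
    """Share of docs whose target shard differs between the two shard counts."""
    moved = sum(
        1 for d in doc_ids
        if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(doc_ids)

doc_ids = [f"doc_{i}" for i in range(10_000)]
share = fraction_moved(doc_ids, 5, 6)
print(f"{share:.0%} of docs would change shard going from 5 to 6 primaries")
```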

The only escape hatches:

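  • Split API: increases the primary shard count by a multiple. With default settings the multiple must be a power of 2 (e.g. 2 → 4, 3 → 6); other factors such as 3 → 9 work only if index.number_of_routing_shards was set to allow them when the index was created. The index must be made read-only first.
  • Shrink API: decreases the primary shard count to a factor of the current count (e.g. 8 → 4, 2 or 1). The index must be read-only and a copy of every shard must sit on one node.

Both are expensive operations, and both write a new target index rather than resizing the source in place.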
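
The arithmetic behind the valid targets can be sketched in a few lines of Python. The function names are hypothetical, and the rules encoded here are my reading of the Elasticsearch documentation: a shrink target must be a factor of the current count, and a split target must be a multiple of the current count that also divides index.number_of_routing_shards.

```python
def shrink_targets(num_primaries: int) -> list[int]:
    """Shard counts a shrink can produce: proper factors of the current count."""
    return [n for n in range(1, num_primaries) if num_primaries % n == 0]

def split_targets(num_primaries: int, num_routing_shards: int) -> list[int]:
    """Shard counts a split can produce: multiples of the current count
    that evenly divide the number of routing shards."""
    candidates = range(num_primaries * 2, num_routing_shards + 1, num_primaries)
    return [n for n in candidates if num_routing_shards % n == 0]

print(shrink_targets(8))      # [1, 2, 4]
print(split_targets(5, 30))   # [10, 15, 30]
```

In the second call, 30 routing shards is an illustrative value chosen at index creation; with it, a 5-shard index could go to 10, 15 or 30 primaries, but never to 20.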

Sizing shards — the rules of thumb

  • 20–50 GB per shard is the sweet spot for search-heavy workloads.
  • Aim for < 200M docs per shard (Lucene’s hard limit is around 2 billion, but performance drops well before).
  • Budget at least 1 GB of heap for every ~20 shards a node hosts (in other words, no more than ~20 shards per GB of heap).
  • Don’t over-shard. Each shard has fixed overhead (file handles, memory, refresh cost). 1000 tiny shards is worse than 10 healthy ones.
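
These rules of thumb turn into simple arithmetic. The sketch below uses made-up helper names and takes the 50 GB target and the 20-shards-per-GB ratio from the list above as defaults; treat the results as starting points, not guarantees.

```python
import math

def primaries_needed(expected_data_gb: float, target_shard_gb: float = 50.0) -> int:
    """Primary shards needed so that no shard exceeds the target size."""
    return max(1, math.ceil(expected_data_gb / target_shard_gb))

def min_heap_gb(shards_on_node: int, shards_per_heap_gb: int = 20) -> float:
    """Heap a node should have for the number of shards it hosts."""
    return shards_on_node / shards_per_heap_gb

print(primaries_needed(1000))      # 20  (1 TB at 50 GB per shard)
print(primaries_needed(1000, 30))  # 34  (1 TB at 30 GB per shard)
print(min_heap_gb(100))            # 5.0 (100 shards on one node)
```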

Settings example

PUT /orders
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

number_of_replicas can be changed at any time. Each replica is a full copy of a primary shard, kept for HA and read throughput.
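
As a small worked example of what these two settings mean for the cluster, the sketch below (hypothetical helper names) counts the shard copies the orders index creates. It relies on the fact that Elasticsearch never places two copies of the same shard on one node, so all replicas are only assigned when there are at least number_of_replicas + 1 data nodes.

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Every primary plus its replicas counts as one shard copy."""
    return primaries * (1 + replicas)

def min_nodes_for_full_allocation(replicas: int) -> int:
    """Copies of the same shard must live on different nodes."""
    return replicas + 1

print(total_shard_copies(3, 1))          # 6
print(min_nodes_for_full_allocation(1))  # 2
```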

By default, a search on a 5-shard index hits all 5 shards and the coordinating node merges results. If we know all the docs we care about live on one shard, we can hit just that shard.

Example: a multi-tenant SaaS app where every query filters by tenant_id:

# Index with custom routing
POST /tickets/_doc?routing=tenant_42
{
  "tenant_id": "tenant_42",
  "subject": "Login bug"
}

# Search the single relevant shard
GET /tickets/_search?routing=tenant_42
{
  "query": { "term": { "tenant_id": "tenant_42" } }
}

Massive throughput win: instead of the query fanning out to all 5 shards and the coordinating node merging 5 partial results, it runs on 1 shard, and the other 4 stay free for other tenants' queries.
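
The effect can be simulated without a cluster. This is a toy model, not Elasticsearch's implementation: the in-memory shards dict, the helper names, and the crc32 hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 5

def shard_for(routing: str, num_shards: int = NUM_SHARDS) -> int:
    # Assumption: crc32 stands in for Elasticsearch's Murmur3-based hash.
    return zlib.crc32(routing.encode("utf-8")) % num_shards

# shard number -> list of (tenant_id, doc_id), indexed with routing=tenant_id
shards = defaultdict(list)
for tenant in ["tenant_1", "tenant_7", "tenant_42"]:
    for i in range(100):
        shards[shard_for(tenant)].append((tenant, f"{tenant}-doc-{i}"))

def shards_to_search(routing=None):
    """Unrouted searches fan out to every shard; routed ones go to one."""
    return list(range(NUM_SHARDS)) if routing is None else [shard_for(routing)]

def shards_holding(tenant):
    """Shards that actually contain at least one doc of the tenant."""
    return sorted(s for s, docs in shards.items() if any(t == tenant for t, _ in docs))

print(len(shards_to_search()))             # 5
print(len(shards_to_search("tenant_42")))  # 1
print(shards_holding("tenant_42") == shards_to_search("tenant_42"))  # True
```

The last line is the important one: because the tenant's docs were indexed with the same routing value, the single shard the routed search visits is exactly the shard that holds them. The term filter on tenant_id is still needed, since other tenants can hash to the same shard.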

The downside: hot shards

If tenant_42 is a huge customer, that one shard gets all their data and all their queries. We get a hot shard — uneven storage, uneven CPU, uneven latency. Mitigations:

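  • Use a composite routing key like tenant_id + "_" + region. Searches then have to pass every routing value that applies to the tenant.
  • Set index.routing_partition_size at index creation so that each routing value maps to a small group of shards instead of a single one.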
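
To see what a routing partition does, here is a sketch of the partitioned variant of the formula as I understand it from the Elasticsearch documentation: the _id hash picks an offset within the partition and is added to the routing hash. The helper names are hypothetical, crc32 again stands in for the real hash, and the routing-shards step is left out for brevity.

```python
import zlib

def h(value: str) -> int:
    # Assumption: crc32 stands in for Elasticsearch's Murmur3-based hash.
    return zlib.crc32(value.encode("utf-8"))

def partitioned_shard(routing: str, doc_id: str,
                      num_shards: int, partition_size: int) -> int:
    """The routing value picks a starting shard; the _id picks an offset within the partition."""
    return (h(routing) + h(doc_id) % partition_size) % num_shards

# One big tenant, 5 primary shards, partition size 3.
used = {partitioned_shard("tenant_42", f"doc_{i}", 5, 3) for i in range(1000)}
print(sorted(used))  # at most 3 distinct shards
```

The trade-off is that a routed search for this tenant now visits up to 3 shards instead of 1, which is still fewer than a full fan-out to all 5.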

When to over-shard vs under-shard

  • Time-series logs (write-heavy, retention): use index-per-day or data streams, with maybe 1 shard per index.
  • Reference data (small, hot, read-heavy): 1 primary, many replicas.
  • Big general search index: aim for 30–50 GB shards. Plan growth — if you’ll hit 1 TB, start with 20+ primaries.

The interview-quality answer: “primary shard count is fixed at creation, picked based on expected growth, target shard size, and routing patterns; replicas are tunable for read throughput and HA”.