Shards & Replicas - Elasticsearch

This is where Elasticsearch’s distributed nature shows up. An index isn’t actually a single thing on disk — it’s split into pieces called shards, and each shard can have copies called replicas.

Why shard?

Imagine we have 500 million product documents. A single machine would struggle to hold all of that and still search it quickly. So we split the index into N pieces (shards) and spread them across nodes. A search then runs on every shard in parallel, and the per-shard results are merged.

In simple language, sharding = horizontal scaling for an index.

Primary vs replica

Each shard copy is one of two kinds:

Primary shard: the original. Every write is applied here first.
Replica shard: a copy of a primary, kept on a different node. Used for redundancy and read throughput.

Critical rule: a primary and its replica never live on the same node. Otherwise, if that node dies, we lose both. ES enforces this automatically; if no other node is available, the replica simply stays unassigned.

Index "products": 3 primary shards, 1 replica each = 6 total shards across 3 nodes

NODE 1: P0 (primary), R2 (replica of P2)
NODE 2: P1 (primary), R0 (replica of P0)
NODE 3: P2 (primary), R1 (replica of P1)

If Node 2 dies → R1 on Node 3 is promoted to primary. R0 went down with Node 2, so ES allocates fresh replicas for shards 0 and 1 on the surviving nodes.
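
The placement rule and the failover step can be sketched in a few lines of Python. This is a toy model, not how ES actually allocates shards: the `allocate` and `fail_node` helpers and the round-robin placement are assumptions made for illustration, and the real allocator weighs many more factors.

```python
from collections import defaultdict

def allocate(num_primaries, num_replicas, nodes):
    """Place copies round-robin so no two copies of a shard share a node."""
    if num_replicas + 1 > len(nodes):
        # Toy behaviour: real ES would leave the extra replicas unassigned.
        raise ValueError("not enough nodes to separate every copy")
    layout = defaultdict(list)  # node name -> list of (role, shard_id)
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            layout[node].append(("P" if copy == 0 else "R", shard))
    return dict(layout)

def fail_node(layout, dead):
    """Drop a node and promote a surviving replica for each primary it held."""
    survivors = {n: list(c) for n, c in layout.items() if n != dead}
    for role, shard in layout[dead]:
        if role != "P":
            continue
        for copies in survivors.values():
            if ("R", shard) in copies:
                copies[copies.index(("R", shard))] = ("P", shard)
                break
    return survivors

layout = allocate(3, 1, ["node1", "node2", "node3"])
after = fail_node(layout, "node2")
print(after)
# {'node1': [('P', 0), ('R', 2)], 'node3': [('P', 1), ('P', 2)]}
```

After the failure, shards 0 and 1 have no replica left, which is the gap ES fills by allocating new replicas on the remaining nodes.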

How a document ends up on a specific shard

ES uses a simple formula:

shard = hash(routing) % number_of_primary_shards

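Where routing defaults to the document’s _id and the hash is Murmur3. This is why number_of_shards can’t be changed in place after index creation: the modulo would change and existing docs would map to the wrong shards. Getting a different shard count means building a new index, either with the split/shrink APIs or with a reindex.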

Replicas are not part of that formula. They just mirror their primary.
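
To see why the shard count is baked into the routing, here is a minimal sketch of the formula. `pick_shard` is a hypothetical helper, and `zlib.crc32` is only a stand-in hash so the example runs with the standard library, so the shard numbers it produces will not match a real cluster.

```python
import zlib

def pick_shard(routing: str, num_primary_shards: int) -> int:
    # Stand-in hash for illustration; not the hash ES uses.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

doc_ids = [str(i) for i in range(1000)]
with_3 = {d: pick_shard(d, 3) for d in doc_ids}
with_4 = {d: pick_shard(d, 4) for d in doc_ids}

# Count documents whose target shard changes when going from 3 to 4 shards.
moved = sum(1 for d in doc_ids if with_3[d] != with_4[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

The same _id always lands on the same shard for a fixed shard count, but changing the divisor reshuffles most documents, so lookups by _id would go to the wrong place.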

Writes vs reads

Write path: request hits any node → that node acts as coordinator and routes to the primary → primary writes locally → forwards the operation to its in-sync replicas in parallel → replicas ack → primary responds. Synchronous to replicas by default.
Read path: request hits any node → coordinator picks one copy (primary OR replica) of each shard → merges the per-shard results and returns them. This is why more replicas = more read throughput.

PUT /products/_doc/1
{
  "title": "Sony WH-1000XM5"
}

// Goes to: hash("1") % 3. Say that comes out as 2 → the write lands on shard 2's primary (P2) → then it is replicated to R2
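
The two paths can be modelled roughly like this. It is a deliberately simplified sketch: `ShardGroup` is a hypothetical class, each copy is just a dict, and round-robin stands in for however the coordinator really chooses a copy.

```python
from itertools import cycle

class ShardGroup:
    """One primary plus its replicas, each modelled as a dict of id -> doc."""

    def __init__(self, num_replicas):
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]
        # Round-robin over all copies; a simplification of copy selection.
        self._copies = cycle([self.primary, *self.replicas])

    def write(self, doc_id, doc):
        self.primary[doc_id] = doc      # 1. primary applies the write
        for replica in self.replicas:   # 2. operation goes to every replica
            replica[doc_id] = doc
        return "acknowledged"           # 3. respond once replicas have it

    def read(self, doc_id):
        # Any copy can answer, so reads spread across primary and replicas.
        return next(self._copies).get(doc_id)

doc = {"title": "Sony WH-1000XM5"}
group = ShardGroup(num_replicas=2)
status = group.write("1", doc)
results = [group.read("1") for _ in range(3)]  # one read per copy
```

Every read returns the same document no matter which copy served it, which is what lets extra replicas absorb extra read traffic.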

Setting shards & replicas

PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

number_of_shards: fixed at creation. Changing it later means a new index (split, shrink, or reindex). Plan ahead.
number_of_replicas: can be changed anytime on a live index. Bump it up for hot indices, down for cold ones.
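
Changing the replica count goes through the index settings endpoint (PUT /<index>/_settings). As a small sketch, the hypothetical helper below only builds the request pieces and does not send anything; how you send them (client library, curl, Kibana console) is up to you.

```python
import json

def replica_update_request(index: str, replicas: int):
    """Build the method, path, and JSON body for a replica-count update."""
    if replicas < 0:
        raise ValueError("replicas must be zero or more")
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index}/_settings", json.dumps(body)

method, path, body = replica_update_request("products", 2)
print(method, path)  # PUT /products/_settings
print(body)          # {"index": {"number_of_replicas": 2}}
```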

Sizing rules of thumb

The internet is full of bad advice here. Elastic’s official recommendations:

Aim for shards between 10 GB and 50 GB each.
Don’t over-shard. Each shard has overhead — file handles, memory, metadata. 1000 shards of 100 MB each is way worse than 10 shards of 10 GB.
For time-series (logs), one daily index with 1 primary shard is often fine for small clusters.
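Keep total shards per GB of JVM heap under ~20. This is Elastic’s older rule of thumb; guidance for newer releases is framed differently, so check the docs for your version.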

For our 500M product index? Assuming it comes to roughly 100 GB of primary data, 5 primary shards of ~20 GB each with 1 replica gives 10 shards in total, spread across 4-6 nodes. Solid starting point.
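
The arithmetic behind that estimate, as a quick sketch. `plan_shards` is a hypothetical helper, and the 100 GB total and 20 GB target are the assumed figures from the example above, not measurements.

```python
import math

def plan_shards(total_gb, target_shard_gb=20, replicas=1):
    """Return (primary shard count, total shard copies) for a target shard size."""
    primaries = max(1, math.ceil(total_gb / target_shard_gb))
    total_copies = primaries * (1 + replicas)
    return primaries, total_copies

primaries, total_copies = plan_shards(100)
print(primaries, total_copies)  # 5 10
```

Rerun it with your own measured index size once you have real data; the target size is the knob to keep inside the 10 GB to 50 GB range.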

TL;DR

Shards split the data, replicas duplicate the data. Primaries take writes first; primaries and replicas both serve reads, and replicas stand by for failover. You pick number_of_shards at creation, and changing it later means building a new index.