When your data outgrows one machine’s RAM, or your write load saturates one master, you need horizontal sharding. Redis Cluster is the built-in answer. It splits the keyspace across multiple masters, each with its own replicas, and handles routing automatically.
In simple language: imagine 16,384 numbered buckets. Every key hashes into exactly one bucket. We split the buckets across our N nodes — each node owns a range. When a client wants user:42, the client computes the bucket, knows which node owns it, and talks directly to that node.
The 16384 hash slots
Redis hashes the key with CRC16 and mod-16384 to assign a slot:
slot = CRC16(key) mod 16384
Slots, not keys, are the unit of distribution. With 3 masters you might have:
Adding a node = rebalance slots from existing nodes to the new one (live, no downtime). Removing a node = migrate its slots away first.
Client routing: MOVED and ASK
A naive client doesn’t know which node owns which slot. It connects to any node, sends a command, and Redis replies with one of:
MOVED 3999 10.0.0.7:6379— “that slot lives on this other node, go there and update your map”ASK 3999 10.0.0.7:6379— “that slot is currently being migrated, try the destination once for THIS command”
Smart clients (ioredis, lettuce, jedis with cluster mode) build and cache the slot map by calling CLUSTER SLOTS on connect, then route directly. They only need MOVED to refresh their map after a topology change.
const Redis = require("ioredis");
const cluster = new Redis.Cluster([
{ host: "10.0.0.5", port: 6379 },
{ host: "10.0.0.6", port: 6379 },
{ host: "10.0.0.7", port: 6379 },
]);
await cluster.set("user:42", "manish"); // client routes to correct node
Gossip: how nodes find each other
Cluster nodes talk to each other over a separate “cluster bus” port (6379 + 10000 by default) using a gossip protocol. Each node periodically pings a few random peers carrying:
- Known node states (alive/PFAIL/FAIL)
- Slot ownership
- Epoch (version) numbers
If enough masters agree a node is down (PFAIL → FAIL), they coordinate to promote one of its replicas. Failover is automatic — no Sentinel needed in Cluster mode.
Multi-key operations: the catch
Commands touching multiple keys only work if all those keys live on the same node, i.e., the same slot. Otherwise you get CROSSSLOT errors.
MSET a 1 b 2 c 3 # likely CROSSSLOT - a, b, c hash to different slots
Hash tags to the rescue
To force keys onto the same slot, wrap a portion of the key in {}. Only that part is hashed:
SET {user:42}:profile "..."
SET {user:42}:cart "..."
SET {user:42}:orders "..."
# all hash to the same slot - same node
MULTI; HSET {user:42}:profile name "M"; SADD {user:42}:cart sku-1; EXEC
The trade-off: too aggressive tagging concentrates load on one node. Use tags only for keys that genuinely need atomic multi-key access.
Replication and failover
Each master typically has 1+ replicas. Replication is async (same as standalone). On master failure:
- Other masters detect FAIL via gossip.
- One of the dead master’s replicas runs an election among masters.
- Winner takes over the slots and starts serving them.
If a master has no replica and dies, those slots are unavailable — and Cluster goes into a degraded state. By default, the whole cluster refuses writes if any slot is unreachable (cluster-require-full-coverage yes).
Limits and gotchas
- Lua and transactions must touch keys on a single slot (use hash tags).
- Pub/Sub — classic Pub/Sub broadcasts across the whole cluster (chatty). Use Sharded Pub/Sub (
SSUBSCRIBE, Redis 7+) for per-shard channels. - SELECT db isn’t supported. Cluster only uses db 0.
- Big multi-key commands (
KEYS *,FLUSHDB) need to be issued to every node, not just one. - Resharding is live but takes time and pushes load during migration.
When to choose Cluster vs Sentinel
| Need | Pick |
|---|---|
| Dataset fits one machine, just want HA | Sentinel |
| Need to scale writes or RAM beyond one node | Cluster |
| Heavy multi-key transactions across many keys | Sentinel (or carefully designed hash tags) |
| Operational simplicity | Sentinel |
Quick recap
- 16384 hash slots, distributed across masters.
- Smart clients cache slot maps and route directly.
- Multi-key ops need same slot — use hash tags
{...}. - Gossip protocol handles membership + failure detection.
- Automatic failover via replicas, no Sentinel required.
- It’s eventually consistent, async-replicated, with the same data-loss-on-crash trade-off.