Replication & Sentinel - Redis

Redis is fast, but a single instance is a single point of failure. Replication gives us read scaling and a hot standby. Sentinel automates failover when the master dies. Together they’re the simplest path to high availability — short of going full Cluster.

In simple language: one Redis is the boss (master), others are clones (replicas) that copy everything the boss does. If the boss dies, Sentinel — a group of watchers — agrees on which clone gets promoted.

How replication works

Replicas connect to the master with the REPLICAOF host port command (or the replicaof directive in redis.conf). The flow:

Replica connects; the master produces a full RDB snapshot of the current data in the background and buffers new writes in the meantime.
Replica loads the snapshot.
Master sends the buffered writes, then keeps streaming every subsequent write command.
Replica applies them in order.

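If the replica disconnects briefly, it can do a partial resync: it sends its replication ID and offset, and the master replays the missing commands from its replication backlog, so no full snapshot is needed. If it has been gone long enough that the backlog no longer covers its offset, it falls back to a full resync.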
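
The partial-vs-full decision can be sketched as a small function. This is a simplified model, not Redis internals: the field names (replId, offset, backlogFirstOffset) are made up for illustration, and real Redis also tracks a secondary replication ID that is ignored here.

```javascript
// Simplified model of the master's choice when a replica reconnects.
// All field names are illustrative assumptions, not Redis internals.
function chooseResync(master, replica) {
  // The replica must have been following this master's replication history.
  if (replica.replId !== master.replId) return "full";

  // The backlog still holds everything from backlogFirstOffset up to master.offset.
  const stillInBacklog = replica.offset + 1 >= master.backlogFirstOffset;
  const notAhead = replica.offset <= master.offset;
  return stillInBacklog && notAhead ? "partial" : "full";
}

const master = { replId: "abc", offset: 5000, backlogFirstOffset: 4000 };
console.log(chooseResync(master, { replId: "abc", offset: 4500 })); // short disconnect
console.log(chooseResync(master, { replId: "abc", offset: 1200 })); // gone too long
```

A bigger backlog (repl-backlog-size) widens the window in which a reconnecting replica can still get away with a partial resync.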

# On replica
replicaof 10.0.0.5 6379
replica-read-only yes

# On master
INFO replication
# role:master
# connected_slaves:2
# slave0:ip=10.0.0.6,port=6379,state=online,offset=...
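
If you want to check replica state from a script, the INFO reply is just key:value lines. Here is a minimal parsing sketch; the function names are my own, and the sample text (including the offset and lag values) is invented for illustration.

```javascript
// Parse an INFO reply (key:value lines, "#" section headers) into an object.
function parseInfo(text) {
  const info = {};
  for (const line of text.split(/\r?\n/)) {
    if (line === "" || line.startsWith("#")) continue; // skip blanks and headers
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    info[line.slice(0, idx)] = line.slice(idx + 1);
  }
  return info;
}

// Split a per-replica value such as "ip=...,port=...,state=..." into fields.
function parseReplicaFields(value) {
  return Object.fromEntries(value.split(",").map((pair) => pair.split("=")));
}

// Sample reply text; the offset and lag values are made up.
const sample = [
  "# Replication",
  "role:master",
  "connected_slaves:2",
  "slave0:ip=10.0.0.6,port=6379,state=online,offset=1234,lag=0",
].join("\r\n");

const info = parseInfo(sample);
console.log(info.role, info.connected_slaves);
console.log(parseReplicaFields(info.slave0).state);
```

All values come back as strings, so convert connected_slaves or offset with Number() before comparing them.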

Async = trade-off

Redis replication is asynchronous. The master acks the write to the client BEFORE replicas have it. If the master crashes before a write has reached any replica, that write is lost when a replica is promoted.

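You can soften this with WAIT numreplicas timeout, which blocks until N replicas have acknowledged the preceding writes or the timeout expires, and returns how many actually did. It narrows the loss window, but it’s best-effort, not a durability guarantee: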

SET foo bar
WAIT 2 100   # wait up to 100ms for 2 replicas to confirm
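
A toy model of what the WAIT reply means, assuming each replica reports the highest replication offset it has acknowledged. The function names and numbers are hypothetical; the point is that the caller has to compare the returned count against what it asked for.

```javascript
// Toy model: a replica has confirmed a write once its acknowledged offset
// has reached the offset of that write.
function ackedReplicaCount(writeOffset, replicaAckOffsets) {
  return replicaAckOffsets.filter((ack) => ack >= writeOffset).length;
}

// WAIT returns the count even when it is lower than requested (timeout hit),
// so the caller decides whether that count is good enough.
function isConfirmed(writeOffset, replicaAckOffsets, numReplicas) {
  return ackedReplicaCount(writeOffset, replicaAckOffsets) >= numReplicas;
}

console.log(ackedReplicaCount(120, [120, 95])); // second replica is still behind
console.log(isConfirmed(120, [120, 95], 2));
```

Even a full count only says the replicas received the write. It says nothing about whether they have persisted it to disk.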

For strict durability you also need AOF enabled with appendfsync always. Most setups accept the small data-loss window in exchange for speed.

Sentinel: automated failover

Replication alone doesn’t help if the master dies — clients are still pointed at a dead instance. Sentinel is a separate process (you run several, usually 3 or 5) that:

Monitors masters and replicas
Detects failures via PING
Coordinates failover by quorum
Tells clients the new master’s address

Topology: Master → replicates → Replica 1, Replica 2
Sentinels (watching everyone): Sentinel A, Sentinel B, Sentinel C
Quorum = 2 of 3 must agree the master is down → vote to promote a replica

Failure detection: SDOWN and ODOWN

SDOWN (Subjectively Down): one Sentinel can’t reach the master.
ODOWN (Objectively Down): a quorum of Sentinels confirm SDOWN.

Only ODOWN triggers failover, so a single Sentinel behind a flaky network link can’t start one on its own.
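
The SDOWN/ODOWN rule boils down to counting votes against the quorum. A minimal sketch, where the function name and the boolean-array input are assumptions for illustration:

```javascript
// sdownVotes[i] is true when Sentinel i currently considers the master down.
function masterVerdict(sdownVotes, quorum) {
  const downCount = sdownVotes.filter(Boolean).length;
  if (downCount >= quorum) return "ODOWN"; // enough agreement: failover may start
  return downCount > 0 ? "SDOWN" : "OK";   // some doubt, but no quorum
}

console.log(masterVerdict([true, false, false], 2)); // one Sentinel alone
console.log(masterVerdict([true, true, false], 2));  // quorum reached
```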

The failover dance

Sentinels reach ODOWN consensus.
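They elect a leader Sentinel (Raft-ish); the leader needs votes from a majority of all Sentinels, not just the quorum.
Leader picks the best replica (lowest replica-priority, then highest replication offset, then lexicographically smallest run ID; priority 0 is never promoted).
Leader issues REPLICAOF NO ONE on the chosen replica, which becomes the new master.
Leader reconfigures the other replicas to follow the new master.
Sentinels propagate the new configuration, so clients that ask any Sentinel get the new master’s address.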
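
The replica-selection step can be sketched as a sort. This is a simplified model: the object shape is hypothetical, it assumes a replica-priority of 0 means "never promote", and it leaves out the checks real Sentinel does on how long a replica was disconnected from the old master.

```javascript
// Pick the promotion candidate: lowest non-zero priority first, then the
// highest replication offset, then the lexicographically smallest run ID.
function pickReplica(replicas) {
  const candidates = replicas.filter((r) => r.priority !== 0); // 0 = never promote
  candidates.sort(
    (a, b) =>
      a.priority - b.priority ||
      b.offset - a.offset ||
      (a.runId < b.runId ? -1 : a.runId > b.runId ? 1 : 0)
  );
  return candidates[0] ?? null;
}

const chosen = pickReplica([
  { runId: "b1", priority: 100, offset: 900 },
  { runId: "a7", priority: 100, offset: 1000 },
  { runId: "c3", priority: 0, offset: 2000 }, // excluded despite the best offset
]);
console.log(chosen.runId);
```

Preferring the highest offset means promoting the replica that has received the most data from the old master, which keeps the lost-write window as small as possible.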

Client integration

Clients connect to Sentinels first, not Redis directly:

const Redis = require("ioredis");
const redis = new Redis({
  sentinels: [
    { host: "sentinel-1", port: 26379 },
    { host: "sentinel-2", port: 26379 },
    { host: "sentinel-3", port: 26379 },
  ],
  name: "mymaster",   // master name configured in sentinel.conf
});

The client library handles asking Sentinel for the current master and reconnecting on failover.
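
Roughly what a Sentinel-aware client does under the hood: try each Sentinel in turn and take the first usable answer. This is a sketch, not ioredis internals. `ask` is a hypothetical helper standing in for a network call that sends SENTINEL get-master-addr-by-name to one Sentinel, and `fakeAsk` exists only to exercise the loop.

```javascript
// `ask(sentinel, masterName)` is assumed to resolve to [host, port] or to
// null when that Sentinel does not know the master.
async function resolveMaster(sentinels, masterName, ask) {
  for (const sentinel of sentinels) {
    try {
      const reply = await ask(sentinel, masterName);
      if (reply) return { host: reply[0], port: Number(reply[1]) };
    } catch (err) {
      // Unreachable Sentinel: fall through and try the next one.
    }
  }
  throw new Error(`no Sentinel could resolve master "${masterName}"`);
}

// Stand-in for a real network call; the first Sentinel is "down".
async function fakeAsk(sentinel, masterName) {
  if (sentinel.host === "sentinel-1") throw new Error("connection refused");
  return ["10.0.0.6", "6379"];
}

const sentinelList = [
  { host: "sentinel-1", port: 26379 },
  { host: "sentinel-2", port: 26379 },
];
resolveMaster(sentinelList, "mymaster", fakeAsk).then((addr) => console.log(addr));
```

After a failover the client repeats the same lookup, which is why it needs the list of Sentinels rather than a fixed Redis address.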

Sentinel config essentials

# sentinel.conf
# sentinel monitor <name> <host> <port> <quorum>  (quorum = min Sentinels that must agree)
sentinel monitor mymaster 10.0.0.5 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1

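down-after-milliseconds = how long an instance must stay unreachable before a Sentinel marks it SDOWN. parallel-syncs = how many replicas are reconfigured to sync from the new master at once. Higher = faster recovery but a bigger bandwidth spike, and more replicas busy resyncing at the same time.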
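
To get a feel for parallel-syncs, here is a rough model that groups replicas into resync "waves". It is a simplification with hypothetical names: real Sentinel reconfigures replicas as earlier ones finish rather than in strict batches.

```javascript
// Group replicas into batches of at most `parallelSyncs` that would be
// pointed at the new master at the same time.
function resyncWaves(replicas, parallelSyncs) {
  if (!Number.isInteger(parallelSyncs) || parallelSyncs < 1) {
    throw new RangeError("parallelSyncs must be an integer >= 1");
  }
  const waves = [];
  for (let i = 0; i < replicas.length; i += parallelSyncs) {
    waves.push(replicas.slice(i, i + parallelSyncs));
  }
  return waves;
}

console.log(resyncWaves(["replica-1", "replica-2", "replica-3"], 1).length); // one at a time
console.log(resyncWaves(["replica-1", "replica-2", "replica-3"], 2).length); // two at a time
```

With parallel-syncs 1, only one replica is busy resyncing at any moment, so the others can keep serving (possibly stale) reads.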

Limits

Replication is async — possible data loss on master crash.
Failover takes seconds (detection + election + promotion). Clients see errors during this window.
Doesn’t shard data. One master holds the whole dataset. When the dataset outgrows one machine, you need Cluster.

Quick recap

Master-replica = read scaling + hot standby. Async, so writes can be lost.
Sentinel = quorum-based watchers that automate failover.
Run an odd number of Sentinels (3 or 5) for quorum, on separate hosts.
Clients talk to Sentinels to discover the current master.
If you also need sharding, use Redis Cluster (next note).