Redis Pub/Sub, Distributed Locks, and Cluster

Redis isn’t just about storing and retrieving data. It also gives us tools for real-time messaging (Pub/Sub), coordination between services (distributed locks), and horizontal scaling (Redis Cluster). These are the features that turn Redis from a simple cache into a system-level building block.

Pub/Sub — Fire and Forget Messaging

Redis Pub/Sub lets us broadcast messages to multiple subscribers in real-time. Publishers send messages to a channel, and any client subscribed to that channel receives them instantly.

Think of it like a radio broadcast. The station (publisher) doesn’t know or care who’s listening. Listeners (subscribers) tune into a channel and hear whatever comes through.

# Terminal 1 — Subscribe to a channel
SUBSCRIBE notifications

# Terminal 2 — Publish a message
PUBLISH notifications "New order received!"
# Returns the number of subscribers who received it

# Pattern-based subscription (wildcards)
PSUBSCRIBE order:*       # matches order:created, order:shipped, etc.

The Big Caveat: No Persistence

Here’s the thing we need to remember — Pub/Sub is fire-and-forget. If a subscriber isn’t connected when a message is published, that message is gone forever. There’s no queue, no replay, no message history.

This means:

  • If our subscriber crashes and reconnects, it misses everything that happened while it was down
  • If nobody is listening on a channel, published messages just vanish
  • There’s no acknowledgment: PUBLISH returns how many clients the message was sent to, but not whether any of them actually processed it
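
To make the fire-and-forget behavior concrete, here is a toy in-memory model in Python. It is not a Redis client: `InMemoryBroker` and its methods are hypothetical names, and `fnmatchcase` only approximates Redis glob patterns. It just shows that a message published before anyone subscribes is never delivered later.

```python
from fnmatch import fnmatchcase


class InMemoryBroker:
    """Toy model of fire-and-forget delivery; not a Redis client."""

    def __init__(self):
        self.channel_subs = {}   # channel name -> list of subscriber inboxes
        self.pattern_subs = []   # (pattern, inbox) pairs

    def subscribe(self, channel):
        inbox = []
        self.channel_subs.setdefault(channel, []).append(inbox)
        return inbox

    def psubscribe(self, pattern):
        inbox = []
        self.pattern_subs.append((pattern, inbox))
        return inbox

    def publish(self, channel, message):
        # Deliver to whoever is subscribed right now; nothing is stored.
        receivers = list(self.channel_subs.get(channel, []))
        receivers += [inbox for pattern, inbox in self.pattern_subs
                      if fnmatchcase(channel, pattern)]
        for inbox in receivers:
            inbox.append(message)
        return len(receivers)


broker = InMemoryBroker()
lost = broker.publish("notifications", "sent before anyone subscribed")
inbox = broker.subscribe("notifications")
delivered = broker.publish("notifications", "New order received!")
print(lost, delivered, inbox)  # 0 1 ['New order received!']
```

The first message has no receivers, so it is simply dropped, and the late subscriber never sees it.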

When to Use Pub/Sub vs Streams

If we need guaranteed delivery, message replay, or consumer groups — use Redis Streams instead. Pub/Sub is great for real-time notifications where losing a message occasionally is acceptable:

  • Cache invalidation signals across multiple app servers
  • Live dashboard updates
  • Chat messages (if we’re okay with no history)
  • Real-time notifications

Distributed Locks

In a distributed system, sometimes we need to make sure only one process does something at a time — processing a payment, sending an email, updating a shared resource. We need a distributed lock.

The Simple Approach

Redis gives us the building blocks with a single command:

# Acquire lock: set a key only if it doesn't exist, with a timeout
# The value identifies the holder, so it should be unique per client (ideally random)
SET lock:order:123 "worker_1" NX EX 30
# NX = only if Not eXists
# EX 30 = expires in 30 seconds (so the lock auto-releases if we crash)

# Do our work...

# Release lock: delete it (but only if WE hold it)
# Use a Lua script to make this atomic
EVAL "if redis.call('GET', KEYS[1]) == ARGV[1] then return redis.call('DEL', KEYS[1]) else return 0 end" 1 lock:order:123 worker_1

Why the Lua script for releasing? Because we need to check-and-delete atomically. Without it, we could accidentally delete someone else’s lock:

  1. Worker A’s lock expires (timeout)
  2. Worker B acquires the lock
  3. Worker A finishes its work and deletes the key — but that’s now Worker B’s lock!

The Lua script prevents this by checking the value before deleting.
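
From application code, the same acquire and release steps might look like the sketch below. It assumes a redis-py style client whose `set` accepts `nx`/`ex` keyword arguments and whose `eval` takes `(script, numkeys, *keys_and_args)`. The helper names `acquire_lock` and `release_lock` are ours, not part of any library.

```python
import secrets

# Same compare-and-delete script as in the EVAL example above.
RELEASE_SCRIPT = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
else
    return 0
end
"""


def acquire_lock(client, key, ttl_seconds=30):
    """Return a unique token if the lock was acquired, else None."""
    token = secrets.token_hex(16)  # random value that identifies this holder
    # Equivalent to: SET key token NX EX ttl_seconds
    if client.set(key, token, nx=True, ex=ttl_seconds):
        return token
    return None


def release_lock(client, key, token):
    """Delete the lock only if it still holds our token."""
    return client.eval(RELEASE_SCRIPT, 1, key, token) == 1
```

Using a random token instead of a fixed name like worker_1 means two runs of the same worker can never mistake each other's lock for their own.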

The Problem with Single-Instance Locks

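What if our single Redis server goes down? If it restarts without the lock key persisted, or a replica that never received the key is promoted in its place, the lock is gone while its holder is still working, and a second process can acquire it. Now two processes think they hold the lock. That’s where Redlock comes in.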
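
One way this can play out, assuming the server has a replica that is fed asynchronously (an assumption for this sketch), is modeled below with plain dictionaries. `Node` is a hypothetical stand-in for a Redis server, not a real client.

```python
class Node:
    """Toy stand-in for a Redis server: a dict with SET ... NX semantics."""

    def __init__(self):
        self.keys = {}

    def set_nx(self, key, value):
        if key in self.keys:
            return False
        self.keys[key] = value
        return True


primary, replica = Node(), Node()

# Worker A takes the lock on the primary.
a_has_lock = primary.set_nx("lock:order:123", "worker_a")

# The primary crashes before the write is replicated, and the
# replica (which never saw the key) is promoted in its place.
primary = replica

# Worker B asks the new primary and also gets the lock.
b_has_lock = primary.set_nx("lock:order:123", "worker_b")

print(a_has_lock, b_has_lock)  # True True
```

Both workers now believe they hold the same lock, which is exactly the situation a lock is supposed to rule out.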

Redlock Algorithm

Redlock uses multiple independent Redis instances (not replicas — completely separate servers). The idea:

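  1. Get the current time
  2. Try to acquire the lock on all N Redis instances, one after another, using the same key and the same random value, with a short per-instance timeout so that one unreachable instance doesn’t stall us
  3. If we got the lock on a majority of instances (at least N/2 + 1, rounding N/2 down) and the total time spent acquiring is less than the lock timeout, we have the lock. It stays valid for the timeout minus the time we spent acquiring it
  4. If we failed, release the lock on all instances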

Figure: Redlock with 5 instances. Redis 1, 2, 3, and 5 are LOCKED and Redis 4 FAILED, so 4/5 are locked (majority achieved) and the lock is acquired.
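
The majority-and-timing check from step 3 can be sketched as a small pure function. The function name, the 120 ms acquisition time, and the optional `drift_ms` allowance are assumptions made for illustration; real Redlock libraries differ in the details.

```python
def redlock_acquired(successes, n_instances, ttl_ms, elapsed_ms, drift_ms=0):
    """Return (acquired, remaining_validity_ms) for one Redlock attempt."""
    quorum = n_instances // 2 + 1                  # strict majority
    validity_ms = ttl_ms - elapsed_ms - drift_ms   # time left on the lock
    acquired = successes >= quorum and validity_ms > 0
    return acquired, max(validity_ms, 0)


# The 5-instance case: 4 of 5 locked, 120 ms spent acquiring, 30 s timeout.
print(redlock_acquired(4, 5, ttl_ms=30_000, elapsed_ms=120))  # (True, 29880)

# 2 of 5 is not a majority, so the caller should release on all instances.
print(redlock_acquired(2, 5, ttl_ms=30_000, elapsed_ms=120))  # (False, 29880)
```

Note that a majority alone is not enough: if acquiring took longer than the timeout, the earliest locks may already have expired, so the attempt counts as failed.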

Martin Kleppmann’s Critique

Martin Kleppmann (author of “Designing Data-Intensive Applications”) wrote a famous critique of Redlock. His argument:

  1. Process pauses — even after acquiring the lock, a GC pause or network delay can cause the lock to expire while we think we still have it
  2. Clock drift — Redlock relies on time. If a server’s clock jumps forward, the lock can expire early
  3. Fencing tokens — a safer approach is to use a monotonically increasing token with each lock acquisition. The protected resource rejects operations with old tokens

In simple language, Redlock gives us a “best effort” lock but can’t guarantee correctness in all edge cases. If we truly need a safe distributed lock, Kleppmann recommends using a consensus system like ZooKeeper or etcd.
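
The fencing-token idea from the critique can be illustrated with a toy protected resource. `FencedStore` and the token numbers 33 and 34 are hypothetical; the point is only that the resource itself remembers the highest token it has seen and refuses anything older.

```python
class FencedStore:
    """Toy protected resource that rejects writes carrying a stale token."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        if token < self.highest_token:
            return False          # an older lock holder woke up too late
        self.highest_token = token
        self.value = value
        return True


store = FencedStore()
# Client 1 got token 33 and then paused; client 2 later got token 34.
print(store.write(34, "from client 2"))  # True
print(store.write(33, "from client 1"))  # False
```

Even if client 1 still believes it holds the lock after its pause, its late write is rejected, so correctness no longer depends on timing.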

For most practical use cases (deduplication, rate limiting, preventing duplicate jobs), Redlock works just fine. For critical financial operations, we should think twice.

Redis Cluster

When a single Redis server isn’t enough — either the dataset is too large to fit in one machine’s memory, or we need more throughput — we can use Redis Cluster for horizontal scaling.

How It Works

Redis Cluster divides the key space into 16,384 hash slots. Each key is mapped to a slot using CRC16(key) % 16384. Each node in the cluster owns a subset of these slots.

# Which slot does a key map to?
CLUSTER KEYSLOT "user:42"     # returns a number 0-16383

# Example: 3-node cluster
# Node A: slots 0-5460
# Node B: slots 5461-10922
# Node C: slots 10923-16383
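
To see what CRC16(key) % 16384 means in practice, here is a minimal Python sketch of the slot calculation. It uses the XMODEM variant of CRC16 (polynomial 0x1021, initial value 0) and, for simplicity, hashes the whole key; the function names are ours.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    # This sketch hashes the entire key and ignores {...} hash tags.
    return crc16_xmodem(key.encode()) % 16384


print(key_slot("foo"))  # 12182
```

With the 3-node layout above, slot 12182 falls in the 10923-16383 range, so the key foo would live on Node C.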

When we send a command to the wrong node, it returns a MOVED redirect telling us which node actually has that slot. Smart Redis clients learn the slot mapping and route commands directly.
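
A rough sketch of how a client might react to a redirect is shown below. It assumes the error text has the form `MOVED <slot> <host>:<port>`; `parse_moved` and `slot_cache` are hypothetical names, and real clients usually refresh their whole slot map rather than one entry.

```python
def parse_moved(error_message: str):
    """Parse 'MOVED <slot> <host>:<port>' into (slot, host, port)."""
    kind, slot, address = error_message.split()
    if kind != "MOVED":
        raise ValueError(f"not a MOVED redirect: {error_message!r}")
    host, _, port = address.rpartition(":")
    return int(slot), host, int(port)


# Hypothetical cached slot map: slot number -> "host:port"
slot_cache = {}

slot, host, port = parse_moved("MOVED 3999 127.0.0.1:6381")
slot_cache[slot] = f"{host}:{port}"   # remember the owner for next time
print(slot_cache)  # {3999: '127.0.0.1:6381'}
```

After the cache is updated, the next command for that slot can go straight to the right node instead of paying for another redirect.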

Replication Within the Cluster

Each primary node can have replicas. If a primary goes down, its replica gets promoted automatically. So a typical production cluster might have 3 primaries + 3 replicas = 6 nodes.

Limitations

  • Multi-key operations only work if all keys are on the same slot. We can force this with hash tags: {user:42}:profile and {user:42}:settings both hash to the same slot because Redis only hashes the part inside {}.
  • No cross-slot transactions: MULTI/EXEC only works within a single slot.
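
The hash tag rule can be checked with a few lines of Python. This sketch assumes the rule is "hash only the text between the first { and the next }, if that text is non-empty" and uses the standard library's `binascii.crc_hqx`, which computes a CRC with polynomial 0x1021. The helper names are ours.

```python
import binascii


def hash_slot(key: bytes) -> int:
    """Slot for a key, hashing only the hash tag if one is present."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:              # non-empty tag between the braces
            key = key[start + 1:end]
    # crc_hqx with initial value 0 is the XMODEM variant of CRC16
    return binascii.crc_hqx(key, 0) % 16384


def same_slot(*keys: bytes) -> bool:
    """True if a multi-key command on these keys would stay in one slot."""
    return len({hash_slot(k) for k in keys}) == 1


print(same_slot(b"{user:42}:profile", b"{user:42}:settings"))  # True
```

A check like `same_slot` can be handy in tests to catch multi-key commands that would fail with a cross-slot error in a real cluster.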

Sentinel vs Cluster

Redis Sentinel

  • High availability for single-server Redis
  • Automatic failover (primary → replica)
  • NO sharding: all data on one server
  • Use when: data fits in one server

Redis Cluster

  • Horizontal scaling across multiple servers
  • Automatic sharding + failover
  • Data split across 16,384 hash slots
  • Use when: data/traffic exceeds one server

In simple language: Sentinel gives us failover (if the primary dies, a replica takes over). Cluster gives us failover AND sharding (data split across multiple servers). If our dataset fits in one machine, Sentinel is simpler to operate. If we need to scale beyond a single server, Cluster is the way to go.