Replication - DBMS

Replication is one of the most fundamental concepts in database scaling. In simple language, it means keeping copies of the same data on multiple servers. If one server dies, another has the data. If one server is overloaded with reads, we spread the load across copies.

Every production database of any significant size uses replication. Let’s understand how it works.

Why Replicate?

Three main reasons:

High availability — if the primary server crashes, a replica can take over. No downtime.
Read scaling — distribute read queries across multiple replicas. One server handling 10,000 reads/sec becomes 3 servers handling 3,300 each.
Disaster recovery — a replica in a different data center protects against entire facility failures.

Master-Slave (Primary-Replica)

This is the most common replication setup. One server is the primary (master) — it handles all writes. One or more replicas (slaves) receive a copy of every change and serve read queries.

Application

writes

↓

Primary

reads + writes

reads

↓

Replica 1

read-only

Replica 2

read-only

Primary ──replication──→ Replicas (async or sync)

How It Works

The primary writes changes to a write-ahead log (WAL in PostgreSQL, binlog in MySQL)
Replicas connect to the primary and stream these changes
Replicas apply the changes to their own copy of the data
Clients can read from any replica

-- PostgreSQL: create a replica (simplified)
-- On the primary: configure pg_hba.conf to allow replication connections
-- On the replica:
pg_basebackup -h primary-host -D /var/lib/postgresql/data -U replicator -P

-- Check replication status on the primary
SELECT * FROM pg_stat_replication;

Replication Lag

Here’s the catch. Replicas don’t get changes instantly — there’s always some delay between a write hitting the primary and showing up on replicas. This is called replication lag.

Most of the time, lag is measured in milliseconds. But under heavy load, it can spike to seconds or even minutes. This means if a user writes something and immediately reads it from a replica, they might not see their own change.

Common solution: read-your-own-writes — route a user’s reads to the primary for a short window after they write something.

Master-Master (Multi-Primary)

In multi-primary replication, both (or all) nodes accept writes. Each node replicates changes to the others.

This sounds great in theory — no single point of failure for writes! But in practice, it introduces a nasty problem: write conflicts.

The Conflict Problem

What happens if User A updates a row on Node 1, and User B updates the same row on Node 2 at the same time? Both writes succeed locally, but when they replicate to each other — which one wins?

Conflict resolution strategies:

Last-write-wins (LWW) — use timestamps. Whichever write has the later timestamp wins. Simple but can lose data.
Application-level resolution — the app decides how to merge conflicting writes (like Git merge conflicts)
CRDTs (Conflict-free Replicated Data Types) — special data structures designed so conflicts are impossible

When to Use Multi-Primary

Multi-region writes — users in Europe write to the European primary, users in Asia write to the Asian one. Avoids cross-continent latency for writes.
High write availability — if one primary goes down, the other still accepts writes.

But honestly, most applications are better off with a single primary. Multi-primary adds a lot of complexity.

Synchronous vs Asynchronous Replication

This is a critical design decision.

Asynchronous (Default for Most Systems)

The primary writes to its local storage and immediately confirms the write to the client. Replication happens in the background.

Fast writes — no waiting for replicas
Data loss risk — if the primary crashes before replicating, those writes are gone
Stale reads — replicas can lag behind

Synchronous

The primary waits for at least one replica to confirm it received the write before confirming to the client.

No data loss on failover — at least one replica is always up to date
Slower writes — every write waits for a network round-trip to the replica
Availability risk — if the synchronous replica is down, writes block

Semi-Synchronous (Practical Compromise)

One replica is synchronous (guaranteed up-to-date), the rest are async. If the sync replica goes down, another async replica gets promoted to sync.

-- PostgreSQL: configure synchronous replication
-- In postgresql.conf on the primary
synchronous_commit = on
synchronous_standby_names = 'replica1'

-- Check sync status
SELECT * FROM pg_stat_replication;

Failover

When the primary goes down, we need to promote a replica to become the new primary. This can be manual or automatic.

Manual Failover

A DBA notices the primary is down, picks the most up-to-date replica, promotes it, and reconfigures the application to point to the new primary. This can take minutes.

Automatic Failover

A monitoring system detects the failure and automatically promotes a replica. Tools for this:

PostgreSQL: pg_auto_failover, Patroni (with etcd/ZooKeeper)
MySQL: MySQL Group Replication, Orchestrator, ProxySQL
Redis: Redis Sentinel

The tricky part is avoiding split-brain — a situation where the old primary comes back online and both it and the new primary think they’re in charge. This is why automatic failover tools use consensus protocols (like Raft) to agree on who the primary is.

Replication Topologies

Beyond simple primary-replica, we can get creative:

Chain replication — Primary → Replica A → Replica B. Reduces load on the primary (it only replicates to one node), but adds more lag for downstream replicas.
Star topology — Primary → multiple replicas directly. More load on the primary, but less lag.
Cascading — a replica can be the source for other replicas. Useful for cross-region setups.

Key Takeaways for Interviews

Replication gives us availability and read scaling, not write scaling
Async replication is faster but risks data loss; sync is safer but slower
Replication lag is inevitable — design our app to handle it
Multi-primary sounds appealing but conflict resolution is genuinely hard
Automatic failover needs consensus to avoid split-brain