RDB Snapshots - Redis

RDB (Redis Database) is Redis’s snapshot-based persistence. At configured intervals, Redis dumps the entire dataset to a single binary file, named dump.rdb by default. When Redis restarts, it loads that file back into memory.

In simple language — think of it like saving a video game. You play for a while, hit “save”, and the game writes everything to disk. If your computer crashes, you reload the save file and continue. But any progress between the last save and the crash is lost.

Why use RDB?

Compact: a single binary file that is easy to back up and easy to ship to another machine.
Fast restarts: loading an RDB file is faster than replaying an append-only log (AOF).
Minimal runtime cost: the main process only has to fork(); the child does all the disk I/O for the snapshot.

The catch — if Redis crashes between snapshots, you lose everything written after the last snapshot.

How it works — BGSAVE and fork()

When Redis decides to snapshot, it calls BGSAVE. Here’s the trick — Redis uses fork() to create a child process. The child writes the snapshot. The parent keeps serving commands.

fork() uses copy-on-write (COW). The child gets a “copy” of memory, but the OS doesn’t actually duplicate the RAM. Parent and child share the same physical pages, and the kernel copies a page only when one of them writes to it. So if your dataset is 10 GB and the write rate is low during the snapshot, the snapshot uses very little extra memory.
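The fork-then-write idea can be sketched in a few lines of Python. This is a toy illustration, not Redis's code: the dict, the JSON file, and the names snapshot_with_fork and demo are all made up for the example, and it needs a Unix-like OS for os.fork().

```python
import json
import os
import tempfile


def snapshot_with_fork(data, path):
    """Fork a child that writes `data` to `path`; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: sees memory exactly as it was at fork() time.
        status = 1
        try:
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(data, f)
            os.replace(tmp, path)  # atomic rename, so readers never see a partial file
            status = 0
        finally:
            os._exit(status)  # leave immediately, skipping the parent's cleanup handlers
    return pid


def demo():
    data = {"counter": 1}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dump.json")
        pid = snapshot_with_fork(data, path)
        data["counter"] = 2  # parent keeps "serving writes" after the fork
        _, wait_status = os.waitpid(pid, 0)
        with open(path) as f:
            saved = json.load(f)
    return saved, data, os.WEXITSTATUS(wait_status)
```

Calling demo() returns the snapshot as the child saw it, the parent's live data, and the child's exit code. The snapshot still holds counter = 1 even though the parent has already moved on to 2, which is the point-in-time behaviour described above.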

BGSAVE with fork() + Copy-on-Write

Parent (Redis): keeps serving GET / SET / DEL; clients are not blocked while the snapshot is written.

fork() →

Child: walks shared memory, writes dump.rdb, exits when done.

COW = pages stay shared until the parent modifies them. Extra memory usage grows only with the write rate during the snapshot.
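To put a rough number on that, here is a back-of-the-envelope estimate of the extra memory COW can cost during one snapshot. It is only a sketch under stated assumptions: a 4 KiB page size, a made-up write rate expressed as pages dirtied per second, and the pessimistic simplification that every dirtied page is a different one. The function name cow_extra_bytes is hypothetical.

```python
PAGE_SIZE = 4096  # bytes; assumed page size (typical on Linux x86-64)


def cow_extra_bytes(dataset_bytes, pages_dirtied_per_sec, snapshot_secs,
                    page_size=PAGE_SIZE):
    """Upper-bound estimate of memory copied by COW during one snapshot."""
    total_pages = -(-dataset_bytes // page_size)  # ceiling division
    # Assume each dirtied page is distinct, capped at the whole dataset.
    dirtied = min(total_pages, pages_dirtied_per_sec * snapshot_secs)
    return dirtied * page_size


# 10 GiB dataset, 1000 pages dirtied per second, 60 second snapshot
extra = cow_extra_bytes(10 * 2**30, 1000, 60)
print(f"{extra / 2**20:.0f} MiB extra")
```

With those example inputs the estimate comes to about 234 MiB on top of a 10 GiB dataset. In the worst case, where the parent rewrites every page during the snapshot, the extra memory approaches the full dataset size.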

Triggering snapshots

There are three ways: automatic (config), manual (commands), or on shutdown (by default, SHUTDOWN performs a final save if at least one save point is configured).

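# redis.conf: snapshot when any of these conditions is met.
# Comments must sit on their own line; redis.conf has no inline comments.
# After 3600s (1h) if at least 1 change was made
save 3600 1
# After 300s if at least 100 changes were made
save 300 100
# After 60s if at least 10000 changes were made
save 60 10000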
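The rule behind those save lines is "snapshot if any one of the (seconds, changes) pairs is satisfied". A simplified model of that check, not Redis's actual code, might look like this. The names SAVE_POINTS and should_snapshot are invented for the sketch.

```python
# The three save points from the config above, as (seconds, changes) pairs.
SAVE_POINTS = [(3600, 1), (300, 100), (60, 10000)]


def should_snapshot(seconds_since_last_save, changes_since_last_save,
                    save_points=SAVE_POINTS):
    """Return True if any save point has both of its conditions met."""
    return any(
        seconds_since_last_save >= seconds
        and changes_since_last_save >= changes
        for seconds, changes in save_points
    )
```

For example, 500 changes after 120 seconds does not trigger anything yet: it is too few changes for the 60 second rule and too early for the 300 second rule. The same 500 changes would trigger a snapshot once 300 seconds have passed.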

Manual commands:

SAVE       # blocking — main process does the snapshot. Avoid in prod.
BGSAVE     # non-blocking — fork() child writes. Use this.
LASTSAVE   # unix timestamp of last successful save
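Since BGSAVE returns right away, a script that needs to know when the snapshot has landed can poll LASTSAVE until it changes. Here is a minimal sketch of that pattern. It assumes a client object with bgsave() and lastsave() methods, which is how the redis-py client exposes these commands; the helper name bgsave_and_wait and its timeout and poll parameters are made up for the example.

```python
import time


def bgsave_and_wait(client, timeout=30.0, poll=0.5):
    """Trigger BGSAVE and block until LASTSAVE changes, or raise TimeoutError."""
    before = client.lastsave()
    client.bgsave()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = client.lastsave()
        # LASTSAVE has one-second resolution, so a save finishing in the
        # same second as the previous one would not be detected here.
        if current != before:
            return current
        time.sleep(poll)
    raise TimeoutError("BGSAVE did not finish within the timeout")
```

With redis-py installed and a server running locally, usage would be something like bgsave_and_wait(redis.Redis()). The helper only relies on the two methods, so it is easy to exercise with a stub client as well.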

Tradeoffs

Pro	Con
Compact single file	Data loss between snapshots
Fast restart from snapshot	fork() can stall on huge datasets
Great for backups / replication	No fine-grained recovery point

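If your dataset is 50 GB on a memory-pressured box, fork() itself can stall the main process, because the kernel has to copy the parent’s page tables before fork() returns, and Redis serves no commands until it does. The latest_fork_usec field in INFO stats reports how long the last fork took. That’s a real concern at scale.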
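A quick calculation shows why page tables matter at this size. The sketch below assumes 4 KiB pages and 8 bytes per page-table entry (typical for x86-64 Linux) and ignores the upper levels of the page-table tree, so treat the result as a rough lower bound. The function name page_table_bytes is hypothetical.

```python
def page_table_bytes(dataset_bytes, page_size=4096, entry_size=8):
    """Approximate size of the leaf page-table entries mapping the dataset."""
    pages = -(-dataset_bytes // page_size)  # ceiling division
    return pages * entry_size


size = page_table_bytes(50 * 2**30)  # 50 GiB dataset
print(f"{size / 2**20:.0f} MiB of page-table entries")
```

For 50 GiB that works out to about 13 million pages and roughly 100 MiB of entries the kernel has to duplicate during fork(), all while the main process is paused.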

When RDB alone is fine

You use Redis as a cache — losing the last few minutes is no big deal, you’ll repopulate from the source of truth.
You take periodic backups and ship dump.rdb to S3.
You want fast restarts and don’t need durability for every write.
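For the periodic-backup case, the local half of the job can be as small as copying dump.rdb to a timestamped file. The sketch below does only that and leaves the upload step (S3 or anywhere else) out. The function name backup_rdb, the file-name pattern, and the paths are assumptions for illustration.

```python
import shutil
import time
from pathlib import Path


def backup_rdb(rdb_path, backup_dir, now=None):
    """Copy the RDB file to backup_dir under a UTC-timestamped name."""
    src = Path(rdb_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # time.gmtime(None) means "now"; passing `now` makes the name predictable.
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    dest = dest_dir / f"dump-{stamp}.rdb"
    shutil.copy2(src, dest)  # copies contents and file metadata
    return dest
```

A cron job could call something like backup_rdb("/var/lib/redis/dump.rdb", "/backups/redis"), where both paths are placeholders for wherever your setup keeps its files, and then hand the returned path to whatever upload tool you use.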

If you do need stronger durability, pair RDB with AOF — that’s hybrid persistence, the recommended setup.
