CAP Theorem - High-Level Design

The CAP theorem is one of the most important ideas in distributed systems. It says that when we build a system spread across multiple machines, we can only guarantee two out of three properties at the same time. Understanding this helps us make smart trade-offs when picking databases and designing architectures.

The Three Properties

Consistency (C) — Every read gets the most recent write. If we write “balance = $500,” every node in the system immediately returns $500. No stale data, ever.

Availability (A) — Every request gets a response (success or failure), even if some nodes are down. The system never says “sorry, I can’t respond right now.”

Partition Tolerance (P) — The system keeps working even when network communication between nodes breaks. Messages get dropped or delayed, and the system handles it.

Why We Can Only Pick Two

Think of it like this. We have two database nodes, Node A and Node B, that are supposed to stay in sync. Suddenly the network between them breaks (a partition).

Now a write comes in to Node A: “update balance to $500.”

We have two choices:

Stay consistent (CP) — Node B rejects reads because it can’t confirm it has the latest data. We sacrifice availability.
Stay available (AP) — Node B serves reads with its old data. We sacrifice consistency.

We can’t have both. If the network is broken, we either serve stale data (lose C) or refuse to respond (lose A). There’s no trick around this.

Partition Tolerance Is Not Optional

Here’s the thing most people miss: we always have to pick P. Network partitions happen in the real world. Cables get cut. Switches fail. Cloud regions have outages. If our system can’t handle a network partition, it’s not a real distributed system.

So the real choice is between CP and AP. It’s never “CA” in a distributed system.

CP vs AP Systems

CAP Triangle

Consistency (C)

╱ ╲

CP CA

╱ ╲

Availability (A) ──── Partition Tolerance (P)

CP: MongoDB, HBase, Redis Cluster, Zookeeper

AP: Cassandra, DynamoDB, CouchDB, Riak

CA: Single-node PostgreSQL, MySQL (not distributed)

CP Systems (Consistency + Partition Tolerance)

These systems prioritize correctness. During a partition, they’d rather refuse some requests than serve stale data.

MongoDB — In a replica set, if the primary goes down, writes are refused until a new primary is elected
HBase — Strong consistency via a single region server per row
Redis (single node) — All reads/writes go to one node, so it’s always consistent
Zookeeper — Used for distributed coordination where correctness is everything

Best for: Banking, inventory systems, anything where serving wrong data is worse than being temporarily unavailable.

AP Systems (Availability + Partition Tolerance)

These systems prioritize uptime. During a partition, they keep serving requests even if the data might be slightly stale.

Cassandra — Writes succeed on any available node, syncs later
DynamoDB — Designed for “always available” at Amazon scale
CouchDB — Multi-master replication, resolves conflicts later

Best for: Social media feeds, shopping carts, analytics — where showing slightly old data is fine but downtime is not.

How to Decide: CP or AP?

Ask ourselves these questions:

Is stale data dangerous? If a user sees an old bank balance and overdrafts, that’s bad → CP
Is downtime unacceptable? If our e-commerce site goes down during Black Friday, we lose millions → AP
Can we fix inconsistency later? If we can reconcile data after the partition heals → AP is fine

Most real-world systems aren’t purely CP or AP. They’re tunable. Cassandra, for example, lets us choose consistency level per query. We can read with QUORUM for important data (more consistent) and ONE for less critical data (more available).

Common Misconceptions

“CAP means we pick exactly two.” Not really. In normal operation (no partition), we get all three. CAP only forces a choice during a partition event.

“CA systems exist in distributed setups.” Nope. If we’re distributed, partitions will happen. CA only applies to single-node databases.

“CP means zero availability.” No. CP systems are available most of the time. They only sacrifice availability during actual partition events, which are (hopefully) rare.

Key Takeaway

In simple language, CAP theorem tells us that in a distributed system, when the network breaks, we have to choose: do we serve possibly-stale data (AP), or do we refuse requests until we’re sure the data is correct (CP)? Neither choice is wrong — it depends on what our system is doing. Banks pick consistency. Social media picks availability. The key is knowing which trade-off we’re making and why.