Design Principles and Trade-offs - High-Level Design

System design is all about trade-offs. There’s no perfect system — every choice we make comes with a cost. The key is understanding what we’re gaining and what we’re giving up with each decision.

Key Design Principles

Scalability

The ability to handle more load by adding resources. Two flavors:

Vertical scaling (scale up): Bigger machine. More RAM, faster CPU. Simple but has a ceiling.
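Horizontal scaling (scale out): More machines. Harder to implement (we need load balancing and often data partitioning), but capacity keeps growing as we add machines instead of hitting a single-machine ceiling.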

Think of it like a restaurant. Vertical scaling = get a bigger kitchen. Horizontal scaling = open more locations.

Availability

The system is up and working when users need it. Measured in “nines”:

Availability	Downtime/Year	Downtime/Month
99% (two 9s)	3.65 days	7.3 hours
99.9% (three 9s)	8.76 hours	43.8 min
99.99% (four 9s)	52.6 min	4.38 min
99.999% (five 9s)	5.26 min	26.3 sec
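
The numbers in the table follow directly from the availability percentage. Here is a minimal sketch of the arithmetic, assuming a 365-day year and a month defined as one twelfth of a year (the function name is just for illustration):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def downtime_seconds(availability_percent: float,
                     period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Allowed downtime in seconds for a given availability percentage."""
    return (1 - availability_percent / 100) * period_seconds


for pct in (99.0, 99.9, 99.99, 99.999):
    per_year = downtime_seconds(pct)
    per_month = per_year / 12  # a month approximated as 1/12 of a year
    print(f"{pct}%: {per_year / 3600:.2f} hours/year, "
          f"{per_month / 60:.2f} minutes/month")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why every additional nine tends to be much more expensive than the last.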

Reliability

The system does what it’s supposed to do correctly. A system can be available (it’s responding) but unreliable (it’s giving wrong answers). We need both.

Performance

How fast the system responds. Two key metrics:

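Latency: Time to handle a single request, usually reported as percentiles (p50, p95, p99) rather than an average, because a few slow requests can hide behind a good mean.
Throughput: How many requests we handle per second.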
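
To make the two metrics concrete, here is a small sketch that computes them from a list of request timings. The sample latencies and the 2-second window are made up for illustration, and the percentile uses the simple nearest-rank method (monitoring tools may interpolate differently):

```python
import math


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latencies (ms) for 10 requests served in a 2-second window.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 18, 900]
window_seconds = 2.0

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)
throughput_rps = len(latencies_ms) / window_seconds

print(f"p50={p50} ms, p95={p95} ms, p99={p99} ms, mean={mean} ms")
print(f"throughput={throughput_rps} requests/second")
```

In this sample the typical request (p50) is fast, while the tail (p95, p99) is dominated by the two slow outliers. That gap is the reason we look at several percentiles instead of one number.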

Maintainability

Can other engineers (or future us) understand and modify this system? Simple designs beat clever ones.

The Big Trade-offs

Common Trade-off Spectrums

Consistency (banking, inventory) ◄━━━━━━━━━━━► Availability (social media, DNS)

Low Latency (gaming, trading) ◄━━━━━━━━━━━► High Throughput (batch processing, analytics)

Simplicity (monolith, single DB) ◄━━━━━━━━━━━► Performance (microservices, sharding)

Cost (single region) ◄━━━━━━━━━━━► Performance (multi-region, redundancy)

Consistency vs Availability

This is the famous CAP theorem in disguise. In simple language: when a network partition cuts some of our nodes off from the others, we have to choose. Do we keep answering with potentially stale data (availability), or do we tell users “try again later” until the nodes agree again (consistency)?

Bank account: Must be consistent. We can’t show the wrong balance.
Social media feed: Availability wins. If a like takes 2 seconds to show up globally, nobody cares.
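
The choice can be sketched as a toy replica that behaves differently during a partition. This is only an illustration: the `Replica` class, its "CP"/"AP" modes, and the `try_read` helper are hypothetical names, and a real database has far more machinery than this.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk stale data."""


class Replica:
    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False  # True while cut off from the primary

    def replicate(self, value):
        """Apply an update from the primary; it never arrives during a partition."""
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("try again later")  # consistency over availability
        return self.value  # in AP mode this may be stale during a partition


def try_read(replica):
    try:
        return replica.read()
    except Unavailable as exc:
        return f"error: {exc}"


bank = Replica("CP")  # bank account: must be consistent
feed = Replica("AP")  # social feed: availability wins
for replica in (bank, feed):
    replica.replicate(100)      # both replicas see the first write
    replica.partitioned = True  # network partition starts
    replica.replicate(150)      # this update never reaches the replica

print(try_read(feed))  # answers, but with the old value
print(try_read(bank))  # refuses to answer until the partition heals
```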

Latency vs Throughput

We can make one request super fast (low latency) or handle a massive number of requests (high throughput), but optimizing for one often hurts the other. Batching is the classic example: it improves throughput by spreading fixed per-call overhead across many requests, but each request now waits for its batch to fill, so individual latency goes up.
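
Here is a deliberately simple model of that batching effect. All the numbers are assumptions chosen for illustration: every call pays a fixed overhead plus a per-item cost, requests arrive at a steady rate, and a batch is sent only once it is full.

```python
def batch_stats(batch_size: int,
                overhead_ms: float = 10.0,     # assumed fixed cost per call
                per_item_ms: float = 1.0,      # assumed cost per request in the call
                arrival_gap_ms: float = 2.0):  # assumed time between arrivals
    """Return (processing capacity in requests/second, worst-case latency in ms)."""
    processing_ms = overhead_ms + batch_size * per_item_ms
    throughput_per_s = batch_size / processing_ms * 1000
    # The first request in a batch waits for the rest to arrive, then for processing.
    worst_latency_ms = (batch_size - 1) * arrival_gap_ms + processing_ms
    return throughput_per_s, worst_latency_ms


for size in (1, 10, 100):
    throughput, latency = batch_stats(size)
    print(f"batch={size:>3}: {throughput:7.1f} req/s capacity, "
          f"worst-case latency {latency:.0f} ms")
```

Under these assumptions, bigger batches keep raising capacity, but the first request in each batch waits longer and longer. Where we stop depends on which side of the spectrum our system lives on.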

Single Point of Failure (SPOF)

A SPOF is any component whose failure takes down the entire system. Every part of our design should have a backup plan:

One server? Add another behind a load balancer.
One database? Add a replica.
One data center? Deploy across multiple regions.
One load balancer? Run a standby and use active-passive failover.

The rule is simple: if it can fail, assume it will. Then plan for it.
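
To see why a backup helps so much, here is a rough sketch of the arithmetic. It assumes the copies fail independently and that a single working copy is enough to serve traffic. Both are optimistic assumptions, since shared power, networks, or bad deploys can take down several copies at once.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components, assuming independent
    failures and that one working copy is enough."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return 1 - (1 - single) ** copies


for copies in (1, 2, 3):
    print(f"{copies} server(s) at 99% each -> "
          f"{combined_availability(0.99, copies) * 100:.4f}% combined")
```

Under these assumptions, a second 99% server takes us from two nines to roughly four. Real systems usually gain less, which is why we also spread copies across racks, zones, and regions.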

Stateless vs Stateful Services

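Stateful services keep information about the current session (like “this user is logged in”) on the server itself. If that server goes down, the state is lost, and until then every request from that user has to be routed back to the same server.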

Stateless services don’t remember anything between requests. Every request carries all the information needed (like a JWT). Any server can handle any request.

Stateless services are much easier to scale: we add more servers behind a load balancer and any of them can serve the next request. That’s why we push state to external stores (databases, Redis, session stores) and keep our application servers stateless.
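
The idea of a request that carries its own state can be sketched with a signed token. This is not a real JWT implementation, just a minimal HMAC-signed payload in the same spirit. The secret, the function names, and the claim fields are all hypothetical, and a production system would use a vetted library and add expiry checks.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical shared secret; in practice every server loads it from secure config.
SECRET = b"demo-secret-shared-by-all-servers"


def issue_token(claims: dict) -> str:
    """Encode the claims and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def verify_token(token: str) -> Optional[dict]:
    """Return the claims if the signature is valid, otherwise None."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


def handle_request(server_name: str, token: str) -> str:
    """Any server can serve the request: it needs only the token and the secret."""
    claims = verify_token(token)
    if claims is None:
        return f"{server_name}: 401 unauthorized"
    return f"{server_name}: hello user {claims['user_id']}"


token = issue_token({"user_id": 42})
print(handle_request("server-a", token))
print(handle_request("server-b", token))  # a different server, same result
```

Neither server stores anything about the user between requests, so the load balancer is free to send each request wherever there is capacity.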

Key Takeaway

In simple language, there’s no “best” architecture. There’s only the right architecture for our specific requirements. A system design interview is our chance to show we understand these trade-offs and can make informed decisions — not just draw boxes on a whiteboard.