System design is all about trade-offs. There’s no perfect system — every choice we make comes with a cost. The key is understanding what we’re gaining and what we’re giving up with each decision.
Key Design Principles
Scalability
The ability to handle more load by adding resources. Two flavors:
- Vertical scaling (scale up): Bigger machine. More RAM, faster CPU. Simple but has a ceiling.
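- Horizontal scaling (scale out): More machines. Harder to implement (we need load balancing and a way to share data across machines), but the ceiling is far higher because we can keep adding machines.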
Think of it like a restaurant. Vertical scaling = get a bigger kitchen. Horizontal scaling = open more locations.
Availability
The system is up and working when users need it. Measured in “nines”:
| Availability | Downtime/Year | Downtime/Month |
|---|---|---|
| 99% (two 9s) | 3.65 days | 7.3 hours |
| 99.9% (three 9s) | 8.76 hours | 43.8 min |
| 99.99% (four 9s) | 52.6 min | 4.38 min |
| 99.999% (five 9s) | 5.26 min | 26.3 sec |
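The numbers in the table come from simple arithmetic: downtime is the fraction of time we are allowed to be down, multiplied by the length of the period. Here is a minimal sketch that reproduces them, assuming a 365-day year and a month defined as one twelfth of a year (the function name is just for illustration):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def allowed_downtime_seconds(availability_percent, period_seconds=SECONDS_PER_YEAR):
    """Return the downtime budget in seconds for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (1 - availability_percent / 100) * period_seconds


for target in (99.0, 99.9, 99.99, 99.999):
    yearly = allowed_downtime_seconds(target)
    monthly = yearly / 12  # a month taken as 1/12 of a year
    print(f"{target}%: {yearly / 3600:.2f} hours/year, {monthly / 60:.2f} min/month")
```

Each extra nine cuts the downtime budget by a factor of ten, which is why every additional nine is so much harder to reach than the last.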
Reliability
The system does what it’s supposed to do correctly. A system can be available (it’s responding) but unreliable (it’s giving wrong answers). We need both.
Performance
How fast the system responds. Two key metrics:
- Latency: Time to handle a single request, usually reported as percentiles (p50, p95, p99) because an average hides the slow outliers
- Throughput: How many requests we handle per second
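To make the percentile notation concrete, here is a small sketch that computes p50, p95 and p99 from a list of request latencies. It uses the nearest-rank method, which is one of several common definitions; the sample latencies and the 2-second measurement window are made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds, collected over a 2-second window
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 13, 900]
window_seconds = 2

print("p50:", percentile(latencies_ms, 50))   # 13
print("p95:", percentile(latencies_ms, 95))   # 900
print("p99:", percentile(latencies_ms, 99))   # 900
print("throughput:", len(latencies_ms) / window_seconds, "requests/second")
```

The median looks healthy at 13 ms, but the tail is close to a full second. That gap is the reason we look at p95 and p99 instead of a single average.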
Maintainability
Can other engineers (or future us) understand and modify this system? Simple designs beat clever ones.
The Big Trade-offs
Consistency vs Availability
This is the famous CAP theorem in disguise. In simple language: when a network partition cuts some servers off from the others, we have to choose. Do we give users potentially stale data (availability), or do we tell them “try again later” (consistency)?
- Bank account: Must be consistent. We can’t show the wrong balance.
- Social media feed: Availability wins. If a like takes 2 seconds to show up globally, nobody cares.
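The choice can be sketched with a toy replica that either refuses reads during a partition or serves whatever it last saw. This is a deliberately simplified model, not how any real database implements it; the `Replica` class, the `"CP"`/`"AP"` mode labels and the `Unavailable` exception are all invented for this example.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk stale data."""


class Replica:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False  # True while cut off from the primary

    def replicate(self, value):
        """Apply an update from the primary; updates are lost while partitioned."""
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency first: we cannot prove this value is current
            raise Unavailable("try again later")
        # Availability first: answer with what we have, possibly stale
        return self.value


bank = Replica("CP")
feed = Replica("AP")
for replica in (bank, feed):
    replica.replicate(100)      # both replicas see the first write
    replica.partitioned = True  # the network partition starts
    replica.replicate(50)       # this later write never arrives

print(feed.read())  # 100, stale but still served
try:
    bank.read()
except Unavailable as error:
    print("bank replica:", error)
```

The feed keeps answering with an outdated value, while the bank replica would rather fail the request than show a balance it cannot confirm.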
Latency vs Throughput
We can make one request super fast (low latency) or handle a massive number of requests (high throughput), but optimizing for one often hurts the other. Batching requests improves throughput but increases latency for individual requests.
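A back-of-the-envelope model shows why batching pulls the two metrics in opposite directions. It assumes every call pays a fixed overhead (say, a network round trip) plus a per-item cost. The 10 ms and 1 ms figures are made-up numbers, and the model ignores the time an item waits for its batch to fill, which would push latency even higher.

```python
def batch_stats(batch_size, overhead_ms=10.0, per_item_ms=1.0):
    """Return (latency_ms, throughput_per_second) for one batch under the toy cost model."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Every item in the batch finishes only when the whole batch finishes
    latency_ms = overhead_ms + batch_size * per_item_ms
    throughput_per_second = batch_size / latency_ms * 1000
    return latency_ms, throughput_per_second


for size in (1, 10, 100):
    latency, throughput = batch_stats(size)
    print(f"batch={size}: latency={latency:.0f} ms, throughput={throughput:.0f} items/s")
```

Under these assumptions, going from a batch of 1 to a batch of 100 raises throughput roughly tenfold because the fixed overhead is shared, but each item now waits 110 ms instead of 11 ms.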
Single Point of Failure (SPOF)
A SPOF is any component whose failure takes down the entire system. Every part of our design should have a backup plan:
- One server? Add another behind a load balancer.
- One database? Add a replica that can take over if the primary fails.
- One data center? Deploy across multiple regions.
- One load balancer? Use active-passive failover.
The rule is simple: if it can fail, assume it will. Then plan for it.
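We can put rough numbers on why redundancy helps. If a request must pass through several components in a row, their availabilities multiply. If several copies of one component sit side by side, that component is down only when every copy is down at once. The sketch below assumes failures are independent, which is optimistic in practice (a bad deploy or a shared power supply can take out all copies together), and the function names are just for illustration.

```python
import math


def serial_availability(availabilities):
    """Availability of a chain where every component must be up."""
    return math.prod(availabilities)


def parallel_availability(single, copies):
    """Availability of redundant copies, assuming independent failures."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 - (1 - single) ** copies


# One 99% server versus two 99% servers behind a load balancer
print(f"one server:  {parallel_availability(0.99, 1):.4%}")
print(f"two servers: {parallel_availability(0.99, 2):.4%}")

# A request that needs a load balancer, an app tier, and a database, each at 99.9%
print(f"three-tier chain: {serial_availability([0.999, 0.999, 0.999]):.4%}")
```

Two 99% servers together reach roughly 99.99%, while chaining three 99.9% components drops the whole path to about 99.7%. Every extra hop costs availability, and every extra copy buys some back.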
Stateless vs Stateful Services
Stateful services store information about the current session (like “this user is logged in”). If that server goes down, the state is lost.
Stateless services don’t remember anything between requests. Every request carries all the information needed (like a JWT). Any server can handle any request.
Stateless services are much easier to scale: we add more servers behind a load balancer, and any of them can serve the next request. That’s why we push state to external stores (databases, Redis, session stores) and keep our application servers stateless.
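Here is a minimal sketch of the idea behind a self-contained token: the server signs the claims, hands them to the client, and any server holding the same secret can verify them later without looking anything up. This is a simplified stand-in and not a real JWT (no header, no expiry, no standard encoding). The secret and the function names are hypothetical, and in production we would reach for an established library instead.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret-change-me"  # hypothetical key shared by all app servers


def issue_token(claims):
    """Encode the claims and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def verify_token(token):
    """Return the claims if the signature matches, otherwise None."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


token = issue_token({"user_id": 42})  # issued by "server A"
print(verify_token(token))            # verified by "server B": {'user_id': 42}
print(verify_token(token[:-1] + ("0" if token[-1] != "0" else "1")))  # tampered: None
```

Because the request itself proves who the user is, no server has to remember the session, and the load balancer is free to send each request anywhere.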
Key Takeaway
In simple language, there’s no “best” architecture. There’s only the right architecture for our specific requirements. A system design interview is our chance to show we understand these trade-offs and can make informed decisions — not just draw boxes on a whiteboard.