Load Balancers

A load balancer is a device (or software) that sits in front of our servers and distributes incoming traffic across multiple servers. Think of it like a host at a restaurant — they don’t let everyone pile into one table; they spread diners evenly across the floor.

Why We Need Load Balancers

Without a load balancer, all traffic goes to one server. That’s bad for two reasons:

  1. Single point of failure — If that server crashes, the whole system is down.
  2. Limited capacity — One server can only handle so much traffic.

A load balancer solves both problems. If one server dies, traffic goes to the others. If traffic grows, we add more servers.

Where Load Balancers Sit

Figure: Load Balancer in Action

Users → Load Balancer → Server 1, Server 2, Server 3 → Database

We can place load balancers at multiple points:

  • Between users and web servers (most common)
  • Between web servers and application servers
  • Between application servers and databases

L4 vs L7 Load Balancing

L4 (Transport Layer) — Routes based on IP address and port. Fast but dumb. It doesn’t look at the content of the request. Think of it like sorting mail by zip code.

L7 (Application Layer) — Routes based on the actual content: URL path, HTTP headers, cookies. Slower but smarter. We can send /api/* requests to API servers and /static/* to file servers.
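To make the L7 idea concrete, here is a minimal sketch of path-based routing. The pool names (`api-servers`, `file-servers`, `web-servers`) and the first-match prefix rule are assumptions made for illustration; real load balancers express this in their own configuration syntax.

```python
# Hypothetical routing table: URL path prefix -> name of a server pool.
ROUTES = [
    ("/api/", "api-servers"),
    ("/static/", "file-servers"),
]
DEFAULT_POOL = "web-servers"  # assumed fallback pool for everything else


def choose_pool(path):
    """Return the pool for a request path using first-match prefix rules."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

An L4 balancer could not make this decision, because it never reads the URL path; it only sees the IP address and port.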

Many widely used load balancers (NGINX, AWS ALB) operate at L7. NGINX can also balance raw TCP/UDP traffic at L4 through its stream module.

Load Balancing Algorithms

Algorithm | How It Works | Best For
Round Robin | Requests go to servers in order: 1, 2, 3, 1, 2, 3… | Servers with equal capacity
Weighted Round Robin | Same but some servers get more traffic (proportional to weight) | Servers with different specs
Least Connections | Send to the server with fewest active connections | Varying request durations
IP Hash | Hash the client IP to pick a server (same client always goes to same server) | When we need sticky sessions
Random | Pick a server randomly | Surprisingly effective at scale
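
As a rough illustration of the Weighted Round Robin row, the sketch below repeats each server in the rotation once per unit of weight. The server names and weights are made up, and this naive expansion sends bursts to the heavier server; it is one simple way to get the proportions right, not how any particular product implements it.

```python
import itertools


def weighted_round_robin(weights):
    """Return an endless iterator where each server appears `weight` times per cycle.

    `weights` maps a server name to a positive integer weight.
    """
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    if not expanded:
        raise ValueError("at least one server with a positive weight is required")
    return itertools.cycle(expanded)


# Hypothetical pool: "big" has three times the capacity of "small".
picker = weighted_round_robin({"big": 3, "small": 1})
first_eight = [next(picker) for _ in range(8)]
# "big" receives 6 of the first 8 requests, "small" receives 2.
```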

Round Robin is the simplest and works great when all servers are identical and requests take roughly the same time.
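
A minimal Round Robin sketch, assuming a fixed list of servers with placeholder names. The class and method names are chosen for this example only.

```python
class RoundRobin:
    """Hand out servers in a fixed repeating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._next = 0  # index of the server that gets the next request

    def pick(self):
        server = self._servers[self._next]
        # Wrap around to the first server after the last one.
        self._next = (self._next + 1) % len(self._servers)
        return server


rr = RoundRobin(["server-1", "server-2", "server-3"])
order = [rr.pick() for _ in range(6)]
# order is server-1, server-2, server-3, server-1, server-2, server-3
```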

Least Connections is better when some requests take much longer than others (like file uploads vs simple GETs).
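
The sketch below shows one way Least Connections could be tracked: keep a count of open connections per server and pick the smallest. The names are placeholders, and ties are broken by list order, which is an assumption of this example.

```python
class LeastConnections:
    """Send each new request to the server with the fewest open connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server with the lowest count, so ties
        # are broken by the order in which servers were listed.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the request finishes.
        self.active[server] -= 1


lc = LeastConnections(["server-1", "server-2"])
upload = lc.acquire()      # a long file upload lands on server-1
quick_get = lc.acquire()   # server-1 is busy, so this goes to server-2
lc.release(quick_get)      # the GET finishes quickly
next_get = lc.acquire()    # server-2 is idle again while server-1 still uploads
```

Round Robin would have sent `next_get` to the server that is still busy with the upload.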

IP Hash sends a given client to the same server on every request, as long as the client's IP and the set of servers stay the same. This is useful when session data lives on the server, though stateless servers with an external session store are usually the better design.
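
Here is a small sketch of IP Hash. It uses SHA-256 only to get a hash that is stable across processes (Python's built-in `hash()` is randomized for strings); the function name and the modulo mapping are assumptions of this example.

```python
import hashlib


def pick_by_ip(client_ip, servers):
    """Map a client IP to one server; the same IP always maps to the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a list index.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["server-1", "server-2", "server-3"]
first = pick_by_ip("203.0.113.7", servers)
second = pick_by_ip("203.0.113.7", servers)
# first and second are the same server, whichever one the hash selects.
```

With plain modulo mapping, adding or removing a server changes the result for most clients, which is one more reason to keep session state outside the servers.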

Health Checks

Load balancers periodically probe each server (for example, with an HTTP request to a health endpoint) to check that it is alive. If a server stops responding or returns an error, the load balancer removes it from the rotation. Once it passes the check again, it is added back.

LB → GET /health → Server 1: 200 OK  ✓ (keep in rotation)
LB → GET /health → Server 2: timeout ✗ (remove from rotation)
LB → GET /health → Server 3: 200 OK  ✓ (keep in rotation)
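
The check above can be sketched as a loop that keeps only the servers whose probe returns 200. The `/health` path comes from the example; the 2-second timeout, the function names, and the fake probe used for the demonstration are assumptions.

```python
import urllib.request


def http_probe(base_url, timeout=2.0):
    """Request <base_url>/health and return the HTTP status code.

    Raises an OSError subclass on timeouts, connection errors, and 4xx/5xx replies.
    """
    with urllib.request.urlopen(base_url + "/health", timeout=timeout) as response:
        return response.status


def run_health_checks(servers, probe=http_probe):
    """Return the set of servers that should stay in rotation."""
    healthy = set()
    for server in servers:
        try:
            if probe(server) == 200:
                healthy.add(server)
        except OSError:
            pass  # timeout or connection failure: leave the server out
    return healthy


def fake_probe(server):
    """Stand-in for a real probe so the example runs without a network."""
    if server == "server-2":
        raise TimeoutError("no response")
    return 200


healthy = run_health_checks(["server-1", "server-2", "server-3"], probe=fake_probe)
# healthy contains server-1 and server-3; server-2 timed out and is removed.
```

A real balancer would run this on a schedule and usually require several consecutive failures before removing a server, so that one slow response does not take it out of rotation.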

Load Balancer Redundancy

Wait: doesn't the load balancer itself become a single point of failure? It does, so we run two load balancers in an active-passive setup:

  • Active LB handles all traffic
  • Passive LB monitors the active one
  • If active goes down, passive takes over (using a floating IP or DNS failover)
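
The takeover logic in the list above can be sketched as a counter of missed heartbeats. The threshold of three misses and the class name are assumptions for illustration; the actual step of claiming the floating IP or updating DNS is only marked with a comment.

```python
class PassiveLB:
    """Passive load balancer that promotes itself after repeated missed heartbeats."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed  # assumed threshold, tune per deployment
        self.missed = 0
        self.is_active = False

    def on_heartbeat_result(self, active_responded):
        """Record one heartbeat result and return whether this node is now active."""
        if active_responded:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                # A real setup would claim the floating IP or update DNS here.
                self.is_active = True
        return self.is_active


standby = PassiveLB(max_missed=3)
states = [standby.on_heartbeat_result(ok) for ok in (True, False, False, False)]
# The standby stays passive until the third consecutive miss, then takes over.
```

Requiring several consecutive misses avoids a takeover caused by a single dropped heartbeat.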

Some setups use active-active, where both LBs handle traffic simultaneously.

Popular Load Balancers

  • NGINX — Software LB, very popular, can also be a reverse proxy and web server
  • HAProxy — High-performance software LB, used by GitHub and Stack Overflow
  • AWS ALB/NLB — Managed cloud LBs (ALB = L7, NLB = L4)
  • Caddy — Modern LB with automatic HTTPS

In simple language, a load balancer is like a traffic cop for our servers. It spreads the work evenly, routes around failures, and lets us add more servers whenever we need more capacity. It’s one of the first things we add when scaling beyond a single server.