Load Balancers - High-Level Design

A load balancer is a device (or piece of software) that sits in front of our servers and distributes incoming traffic across them. Think of it like a host at a restaurant: instead of seating everyone in one section, the host spreads diners evenly across the floor.

Why We Need Load Balancers

Without a load balancer, all traffic goes to one server. That’s bad for two reasons:

Single point of failure — If that server crashes, the whole system is down.
Limited capacity — One server can only handle so much traffic.

A load balancer solves both problems. If one server dies, traffic goes to the others. If traffic grows, we add more servers.

Where Load Balancers Sit

Load Balancer in Action

             Users
           ↓ ↓ ↓ ↓ ↓
         Load Balancer
         ↙     ↓     ↘
Server 1   Server 2   Server 3
         ↘     ↓     ↙
           Database

We can place load balancers at multiple points:

Between users and web servers (most common)
Between web servers and application servers
Between application servers and databases

L4 vs L7 Load Balancing

L4 (Transport Layer) — Routes based on IP address and port. Fast but dumb. It doesn’t look at the content of the request. Think of it like sorting mail by zip code.

L7 (Application Layer) — Routes based on the actual content: URL path, HTTP headers, cookies. Slower but smarter. We can send /api/* requests to API servers and /static/* to file servers.

Many widely used load balancers (NGINX, AWS ALB) operate at L7. Some can do both: NGINX, for example, can also balance raw TCP/UDP traffic at L4.
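To make the L7 idea concrete, here is a minimal sketch of path-based routing. The pool names and the routing table are hypothetical, chosen only to mirror the /api/* and /static/* example above; a real load balancer would express this in its own configuration rather than in application code.

```python
# Hypothetical routing table: URL path prefix -> backend pool name.
ROUTES = [
    ("/api/", "api-servers"),
    ("/static/", "file-servers"),
]
DEFAULT_POOL = "web-servers"  # assumed fallback pool for everything else


def pick_pool(path: str) -> str:
    """Return the pool for the first matching prefix, else the default pool."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


api_pool = pick_pool("/api/users")
static_pool = pick_pool("/static/logo.png")
other_pool = pick_pool("/checkout")
```

An L4 balancer could not make this decision, because the URL path only becomes visible once the HTTP request itself is parsed.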

Load Balancing Algorithms

Algorithm	How It Works	Best For
Round Robin	Requests go to servers in order: 1, 2, 3, 1, 2, 3…	Servers with equal capacity
Weighted Round Robin	Same but some servers get more traffic (proportional to weight)	Servers with different specs
Least Connections	Send to the server with fewest active connections	Varying request durations
IP Hash	Hash the client IP to pick a server (same client goes to the same server while the pool is unchanged)	When we need sticky sessions
Random	Pick a server randomly	Large pools of similar servers, where load evens out statistically

Round Robin is the simplest and works great when all servers are identical and requests take roughly the same time.
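As a rough sketch, Round Robin and a naive Weighted Round Robin can both be written as endless cycles over a server list. The server names and weights below are made up for illustration, and the weighted version simply repeats each server by its weight; production balancers usually interleave the picks more smoothly.

```python
import itertools


def round_robin(servers):
    """Cycle through the servers in order: 1, 2, 3, 1, 2, 3, ..."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Naive weighted variant: each server appears `weight` times per cycle."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


rr = round_robin(["server1", "server2", "server3"])
first_six = [next(rr) for _ in range(6)]

# Hypothetical weights: "big" should receive twice the traffic of "small".
wrr = weighted_round_robin({"big": 2, "small": 1})
first_six_weighted = [next(wrr) for _ in range(6)]
```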

Least Connections is better when some requests take much longer than others (like file uploads vs simple GETs).
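A minimal sketch of Least Connections, assuming the balancer keeps a counter of active connections per server. The class and method names are hypothetical, and ties go to whichever server was listed first.

```python
class LeastConnections:
    """Track active connections per server and pick the least busy one."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        """Pick the server with the fewest active connections and count the new one."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a request finishes so the server's count drops again."""
        self.active[server] -= 1


lb = LeastConnections(["a", "b"])
first = lb.acquire()   # both idle, tie goes to "a"
second = lb.acquire()  # "a" is busy, so "b" is chosen
lb.release("b")        # "b" finishes quickly (a simple GET, say)
third = lb.acquire()   # "a" is still busy (a long upload), so "b" again
```

The point of the counter is visible in the last step: a server stuck on a slow request keeps its count high and is skipped until it frees up.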

IP Hash sends the same client IP to the same server as long as the server pool does not change. This is useful when session data lives on the server, though we should prefer stateless servers with an external session store.
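IP Hash can be sketched in a few lines. The function name and addresses below are illustrative. The sketch uses hashlib instead of Python's built-in hash(), because string hashes from hash() are randomized per process and would not give a stable mapping across restarts.

```python
import hashlib


def pick_by_ip(client_ip, servers):
    """Map a client IP to a server with a stable hash modulo the pool size."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["server1", "server2", "server3"]
first_visit = pick_by_ip("203.0.113.7", servers)
second_visit = pick_by_ip("203.0.113.7", servers)
```

Because the index depends on len(servers), adding or removing a server remaps many clients to a different server, which is one more reason to keep session state outside the servers.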

Health Checks

Load balancers periodically probe each server (for example, with an HTTP request to a health endpoint) to check whether it is alive. If a server stops responding, the load balancer removes it from the rotation. Once it starts passing checks again, it is added back.

LB → GET /health → Server 1: 200 OK  ✓ (keep in rotation)
LB → GET /health → Server 2: timeout ✗ (remove from rotation)
LB → GET /health → Server 3: 200 OK  ✓ (keep in rotation)
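The check above can be sketched as one pass over the server list. To keep the example self-contained, the probe is passed in as a function; fake_probe is a stand-in that simulates the three results shown above, where a real probe would issue an HTTP GET to /health with a timeout.

```python
from typing import Callable, Iterable, List


def healthy_servers(servers: Iterable[str], probe: Callable[[str], int]) -> List[str]:
    """Return the servers that answered the health probe with HTTP 200."""
    in_rotation = []
    for server in servers:
        try:
            status = probe(server)
        except TimeoutError:
            continue  # no response: leave this server out of the rotation
        if status == 200:
            in_rotation.append(server)
    return in_rotation


def fake_probe(server: str) -> int:
    """Hypothetical probe: server2 times out, the others return 200."""
    if server == "server2":
        raise TimeoutError("no response")
    return 200


rotation = healthy_servers(["server1", "server2", "server3"], fake_probe)
```

In practice, balancers usually wait for several consecutive failures before removing a server, and several consecutive successes before adding it back, so that one slow response does not cause flapping.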

Load Balancer Redundancy

But doesn't this make the load balancer itself a single point of failure? It does, so we typically run two load balancers in an active-passive setup:

Active LB handles all traffic
Passive LB monitors the active one
If active goes down, passive takes over (using a floating IP or DNS failover)

Some setups use active-active, where both LBs handle traffic simultaneously.
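The active-passive handover can be sketched as a small state machine run by the passive LB. Everything here is an assumption for illustration: the class name, the heartbeat mechanism, and the threshold of missed heartbeats before the floating IP moves.

```python
class FailoverMonitor:
    """Passive LB's view: take over the floating IP after missed heartbeats."""

    def __init__(self, active, passive, max_missed=3):
        self.floating_ip_owner = active  # the LB currently receiving traffic
        self.standby = passive
        self.max_missed = max_missed     # assumed threshold, tune per setup
        self.missed = 0

    def on_heartbeat(self, received: bool) -> str:
        """Record one heartbeat interval and return the current traffic owner."""
        if received:
            self.missed = 0
        else:
            self.missed += 1
        if self.missed >= self.max_missed:
            # Promote the standby: it now owns the floating IP.
            self.floating_ip_owner, self.standby = self.standby, self.floating_ip_owner
            self.missed = 0
        return self.floating_ip_owner


monitor = FailoverMonitor("lb-a", "lb-b", max_missed=2)
owner_ok = monitor.on_heartbeat(True)            # active is healthy
owner_one_miss = monitor.on_heartbeat(False)     # one miss is tolerated
owner_after_fail = monitor.on_heartbeat(False)   # second miss triggers failover
```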

Popular Load Balancers

NGINX: Software LB, very popular, can also be a reverse proxy and web server
HAProxy: High-performance software LB, used by GitHub and Stack Overflow
AWS ALB/NLB: Managed cloud LBs (ALB = L7, NLB = L4)
Caddy: Modern web server and reverse proxy with built-in load balancing and automatic HTTPS

In simple language, a load balancer is like a traffic cop for our servers. It spreads the work evenly, routes around failures, and lets us add more servers whenever we need more capacity. It’s one of the first things we add when scaling beyond a single server.