A load balancer is a device (or software) that sits in front of our servers and distributes incoming traffic across multiple servers. Think of it like a host at a restaurant — they don’t let everyone pile into one table; they spread diners evenly across the floor.
Why We Need Load Balancers
Without a load balancer, all traffic goes to one server. That’s bad for two reasons:
- Single point of failure — If that server crashes, the whole system is down.
- Limited capacity — One server can only handle so much traffic.
A load balancer solves both problems. If one server dies, traffic goes to the others. If traffic grows, we add more servers.
Where Load Balancers Sit
We can place load balancers at multiple points:
- Between users and web servers (most common)
- Between web servers and application servers
- Between application servers and databases
L4 vs L7 Load Balancing
L4 (Transport Layer) — Routes based on IP address and port. Fast but dumb. It doesn’t look at the content of the request. Think of it like sorting mail by zip code.
L7 (Application Layer) — Routes based on the actual content: URL path, HTTP headers, cookies. Slower but smarter. We can send /api/* requests to API servers and /static/* to file servers.
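Many popular load balancers, such as NGINX and AWS ALB, operate at L7. NGINX can also balance raw TCP and UDP traffic at L4 when we don't need content-based routing.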
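To make the L7 idea concrete, here is a minimal sketch of path-based routing. The pool names, addresses, and the `choose_pool` function are made up for illustration; a real load balancer would express the same rules in its own configuration rather than in application code.

```python
# Hypothetical backend pools; names and ports are illustrative only.
ROUTES = [
    ("/api/", ["api-1:8080", "api-2:8080"]),
    ("/static/", ["files-1:8080"]),
]
DEFAULT_POOL = ["web-1:8080", "web-2:8080"]


def choose_pool(path: str) -> list[str]:
    """Return the backend pool whose URL prefix matches the request path."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    # No rule matched, so fall back to the general web servers.
    return DEFAULT_POOL


print(choose_pool("/api/users"))        # the API pool
print(choose_pool("/static/logo.png"))  # the file-server pool
```

An L4 balancer could not make this decision, because it never reads the path. It only sees the IP addresses and ports of the connection.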
Load Balancing Algorithms
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Requests go to servers in order: 1, 2, 3, 1, 2, 3… | Servers with equal capacity |
| Weighted Round Robin | Same but some servers get more traffic (proportional to weight) | Servers with different specs |
| Least Connections | Send to the server with fewest active connections | Varying request durations |
| IP Hash | Hash the client IP to pick a server (same client always goes to same server) | When we need sticky sessions |
| Random | Pick a server randomly | Large pools of similar servers, where load evens out over many requests |
Round Robin is the simplest and works great when all servers are identical and requests take roughly the same time.
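As a rough sketch, Round Robin and its weighted variant from the table can both be built on a repeating cycle. The server names and weights below are assumptions for illustration, and the weighted version uses the most naive approach (repeat each server `weight` times per cycle); production balancers often interleave the picks more smoothly.

```python
import itertools


def round_robin(servers):
    """Yield servers in a fixed repeating order: 1, 2, 3, 1, 2, 3..."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield servers in proportion to their integer weights.

    Naive expansion: a server with weight 3 appears 3 times per cycle.
    """
    expanded = [
        server
        for server, weight in weights.items()
        for _ in range(weight)
    ]
    return itertools.cycle(expanded)


rr = round_robin(["s1", "s2", "s3"])
first_six = [next(rr) for _ in range(6)]
# first_six == ["s1", "s2", "s3", "s1", "s2", "s3"]

wrr = weighted_round_robin({"big": 3, "small": 1})
first_four = [next(wrr) for _ in range(4)]
# first_four == ["big", "big", "big", "small"]
```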
Least Connections is better when some requests take much longer than others (like file uploads vs simple GETs).
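A minimal sketch of Least Connections, assuming the balancer keeps a counter of in-flight requests per server. The `LeastConnections` class and its method names are hypothetical, not taken from any particular product.

```python
class LeastConnections:
    """Pick the server with the fewest active connections."""

    def __init__(self, servers):
        # Every server starts with zero in-flight requests.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        """Choose a server for a new request and count the connection."""
        # On a tie, min() returns the first server in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a request finishes so the server's count drops."""
        self.active[server] -= 1


lb = LeastConnections(["s1", "s2"])
upload = lb.acquire()   # "s1" takes a long file upload
quick = lb.acquire()    # "s2" takes a short GET
lb.release(quick)       # the GET finishes, the upload is still running
next_pick = lb.acquire()  # "s2" again, because "s1" is still busy
```

Round Robin would have sent the third request to `s1` regardless of the upload still in progress, which is exactly the case where tracking connections helps.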
IP Hash sends the same client to the same server for as long as the server pool stays unchanged. That is useful when session data lives on the server, though stateless servers with an external session store are usually the better design.
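IP Hash can be sketched in a few lines. The function name `pick_by_ip` and the choice of SHA-256 are assumptions for illustration; the point is only that the hash must be stable, which is why the sketch avoids Python's built-in `hash()` (it is randomized per process for strings).

```python
import hashlib


def pick_by_ip(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to one server using a stable hash."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Turn the first 8 bytes into an integer and wrap it to the pool size.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["s1", "s2", "s3"]
first = pick_by_ip("203.0.113.7", servers)
second = pick_by_ip("203.0.113.7", servers)
# first == second: the same client lands on the same server.
```

Because the index depends on `len(servers)`, adding or removing a server remaps many clients to a different server, which is one more reason to keep session data outside the servers.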
Health Checks
Load balancers periodically ping each server to check if it’s alive. If a server stops responding, the load balancer removes it from the rotation. When it comes back, it gets added again.
LB → GET /health → Server 1: 200 OK ✓ (keep in rotation)
LB → GET /health → Server 2: timeout ✗ (remove from rotation)
LB → GET /health → Server 3: 200 OK ✓ (keep in rotation)
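One round of the checks above could look roughly like this. The `http_probe` and `healthy_servers` names are hypothetical, and the 2-second timeout is an assumed value. Real load balancers usually add settings such as check intervals and failure thresholds.

```python
import http.client
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if a GET on the health URL answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def healthy_servers(health_urls, probe=http_probe):
    """Run one round of checks and return the servers to keep in rotation.

    health_urls maps a server name to its health-check URL.
    """
    return [name for name, url in health_urls.items() if probe(url)]
```

Since every round starts from the full server list, a server that failed earlier rejoins the rotation as soon as its probe succeeds again. The `probe` parameter also lets us swap in a fake check for testing without touching the network.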
Load Balancer Redundancy
But now the load balancer itself is a single point of failure. What do we do? A common answer is to run two load balancers in an active-passive setup:
- Active LB handles all traffic
- Passive LB monitors the active one
- If active goes down, passive takes over (using a floating IP or DNS failover)
Some setups use active-active, where both LBs handle traffic simultaneously.
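The active-passive handover can be sketched as a small state machine. Everything here is an illustrative assumption: the `FailoverPair` class, the LB names, and the rule of promoting the passive LB after a fixed number of missed heartbeats. Actually moving the floating IP or updating DNS is out of scope for the sketch.

```python
class FailoverPair:
    """Track which of two load balancers currently handles traffic."""

    def __init__(self, active: str, passive: str, max_missed: int = 3):
        self.active = active
        self.passive = passive
        self.max_missed = max_missed  # assumed threshold, not a standard value
        self.missed = 0

    def record_heartbeat(self, alive: bool) -> str:
        """Feed in one heartbeat result and return the current active LB."""
        if alive:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                # Promote the passive LB; the old active becomes the standby.
                self.active, self.passive = self.passive, self.active
                self.missed = 0
        return self.active


pair = FailoverPair("lb-a", "lb-b", max_missed=2)
pair.record_heartbeat(True)    # lb-a is healthy and stays active
pair.record_heartbeat(False)   # one missed heartbeat, still lb-a
current = pair.record_heartbeat(False)  # second miss, lb-b takes over
```

Waiting for more than one missed heartbeat is a design choice: it avoids flipping the pair on a single dropped packet, at the cost of a slightly slower failover.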
Popular Load Balancers
- NGINX — Software LB, very popular, can also be a reverse proxy and web server
- HAProxy — High-performance software LB, used by GitHub and Stack Overflow
- AWS ALB/NLB — Managed cloud LBs (ALB = L7, NLB = L4)
- Caddy — Modern web server and reverse proxy with built-in load balancing and automatic HTTPS
Put simply, a load balancer is like a traffic cop for our servers. It spreads the work across servers, routes around failures, and lets us add more servers whenever we need more capacity. It’s one of the first things we add when scaling beyond a single server.