Load Balancing


When one server can’t handle all the traffic, we put multiple servers behind a load balancer. It distributes incoming requests across those servers so no single one gets overwhelmed. This gives us both scalability (handle more traffic) and high availability (if one server dies, others keep serving).

L4 vs L7 Load Balancing

Load balancers operate at different layers of the network stack. The two we care about are Layer 4 (transport) and Layer 7 (application).

L4: Transport Layer
  • Routes based on IP address and port
  • Can't see HTTP headers or URLs
  • Just forwards TCP/UDP connections
  • Very fast: minimal processing
  • Good for: databases, non-HTTP traffic
  • Decision: "Send this TCP connection to server B"

L7: Application Layer
  • Routes based on URL, headers, cookies
  • Can inspect HTTP request content
  • Can rewrite URLs and add headers
  • Slower: needs to parse HTTP (and, for HTTPS, terminate TLS to do so)
  • Good for: web apps, API routing
  • Decision: "Send /api/* to the API servers, /static/* to the static file servers"

Most web applications use L7 load balancing because we want to route based on URL paths, headers, or cookies.
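To make the L7 decision concrete, here is a minimal Python sketch of path-prefix routing. The route table, the pool names (`api_pool`, `static_pool`, `web_pool`) and the longest-prefix rule are illustrative assumptions, not the behavior of any particular load balancer. An L4 balancer could not run this function at all, because it never parses the request path.

```python
# Hypothetical route table: URL path prefix -> name of a backend pool.
ROUTES = [
    ("/api/", "api_pool"),
    ("/static/", "static_pool"),
]
DEFAULT_POOL = "web_pool"


def choose_pool(path: str) -> str:
    """L7 decision: pick a backend pool by inspecting the HTTP request path.

    The longest matching prefix wins, so a more specific rule can
    override a broader one.
    """
    best = None
    for prefix, pool in ROUTES:
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, pool)
    return best[1] if best else DEFAULT_POOL


if __name__ == "__main__":
    for p in ["/api/users/42", "/static/app.js", "/about"]:
        print(p, "->", choose_pool(p))
```

Requests that match no rule fall through to the default pool, which is how a catch-all `location /` behaves in practice.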

Load Balancing Algorithms

How does the load balancer decide which server gets the next request?

  • Round Robin: takes turns (server 1, server 2, server 3, repeat). Simple and fair, but ignores how busy each server is.
  • Weighted Round Robin: same idea, but some servers get more turns. Server A (weight 3) gets 3x the traffic of server B (weight 1). Useful when servers have different capacities.
  • Least Connections: sends to the server with the fewest active connections. A smart choice when requests take varying amounts of time.
  • IP Hash: hashes the client’s IP to pick a server, so the same client keeps reaching the same server as long as the server pool doesn’t change. Good for simple session persistence.
  • Random: picks a server at random. Surprisingly effective at scale, because over many requests the load evens out.
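The algorithms above can each be sketched in a few lines of Python. This is a simplified, single-threaded model with hypothetical server names. For weighted round robin it uses the "smooth" variant, which interleaves picks (A, A, B, A) instead of sending a weight-3 server three requests in a row; the long-run 3:1 ratio is the same either way.

```python
import hashlib
import itertools
import random

SERVERS = ["s1", "s2", "s3"]  # hypothetical server names


def round_robin(servers):
    """Yield servers in turn: s1, s2, s3, s1, ..."""
    return itertools.cycle(servers)


def smooth_weighted_round_robin(weights):
    """Yield servers in proportion to their weights, interleaved.

    Each round, every server's running score grows by its weight; the
    highest score wins and is then reduced by the total weight.
    """
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    while True:
        for server, weight in weights.items():
            current[server] += weight
        best = max(current, key=current.get)  # ties go to the first server listed
        current[best] -= total
        yield best


def least_connections(active):
    """Pick the server with the fewest active connections (ties go to the first listed)."""
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP to a server. Stable only while `servers` stays the same."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def random_pick(servers, rng=random):
    """Pick any server uniformly at random."""
    return rng.choice(servers)


if __name__ == "__main__":
    rr = round_robin(SERVERS)
    print([next(rr) for _ in range(4)])  # ['s1', 's2', 's3', 's1']
    wrr = smooth_weighted_round_robin({"A": 3, "B": 1})
    print([next(wrr) for _ in range(4)])  # ['A', 'A', 'B', 'A']
    print(least_connections({"s1": 4, "s2": 1, "s3": 2}))  # s2
    print(ip_hash("203.0.113.7", SERVERS) == ip_hash("203.0.113.7", SERVERS))  # True
    print(random_pick(SERVERS) in SERVERS)  # True
```

Note that `ip_hash` uses a plain modulo, so adding or removing a server reassigns many clients at once, which is why the persistence it gives is only "simple".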

Nginx Load Balancer Config

Here’s what a basic L7 load balancer looks like with Nginx.

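# nginx.conf (both blocks below go inside the http { } context)
upstream backend {
    # Least connections algorithm; server weights are still taken into account
    least_conn;

    server 10.0.0.1:3000 weight=3;   # favored ~3x over a weight=1 server
    server 10.0.0.2:3000;             # weight=1 (default)
    server 10.0.0.3:3000 backup;      # only used when the servers above are unavailable
}

server {
    listen 80;
    server_name myapp.com;

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # Pass the original client IP chain and scheme through to the app
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}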
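As a rough mental model of what this upstream block asks for, the sketch below picks among healthy primary servers by the lowest ratio of active connections to weight, and falls back to backup servers only when no primary is available. It is a simplified illustration with a hypothetical `Server` type, not nginx's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Server:
    addr: str
    weight: int = 1
    backup: bool = False
    healthy: bool = True
    active: int = 0  # currently open connections


def pick(servers):
    """Weighted least connections with a backup tier (simplified model).

    Healthy primaries are preferred; backups are considered only when no
    primary is healthy. The lowest active/weight ratio wins, so a weight=3
    server is allowed about 3x the connections of a weight=1 server.
    """
    primaries = [s for s in servers if s.healthy and not s.backup]
    candidates = primaries or [s for s in servers if s.healthy and s.backup]
    if not candidates:
        raise RuntimeError("no healthy upstream servers")
    return min(candidates, key=lambda s: s.active / s.weight)


if __name__ == "__main__":
    pool = [
        Server("10.0.0.1:3000", weight=3),
        Server("10.0.0.2:3000"),
        Server("10.0.0.3:3000", backup=True),
    ]
    # Open 8 long-lived connections and count where they land.
    for _ in range(8):
        pick(pool).active += 1
    print({s.addr: s.active for s in pool})
    # {'10.0.0.1:3000': 6, '10.0.0.2:3000': 2, '10.0.0.3:3000': 0}
```

The weight=3 server ends up holding three times as many connections as the default-weight server, and the backup stays idle until both primaries are marked unhealthy.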

Health Checks

A load balancer needs to know whether each server is alive. It finds out either by probing servers directly or by watching real traffic, and it stops sending requests to unhealthy servers until they recover.

  • Active health checks: the LB probes each server (e.g., GET /health every 10s). If 3 consecutive checks fail, the server is marked down until it passes checks again.
  • Passive health checks: the LB watches real traffic. If a server starts failing requests (connection errors, timeouts, or 5xx responses), it is temporarily pulled out of the pool.
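Both kinds of check can feed the same up/down state. This sketch assumes a `fall` threshold of 3 consecutive failures (matching the example above) and a hypothetical `rise` threshold of 2 consecutive successes before a server is trusted again; many load balancers expose both as settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HealthState:
    """Up/down state for one backend server.

    fall: consecutive failures before the server is marked down (3, as in the text).
    rise: consecutive successes before it is marked up again (assumed value).
    """
    fall: int = 3
    rise: int = 2
    up: bool = True
    fails: int = 0
    passes: int = 0

    def record(self, ok: bool) -> None:
        if ok:
            self.fails = 0
            self.passes += 1
            if not self.up and self.passes >= self.rise:
                self.up = True
        else:
            self.passes = 0
            self.fails += 1
            if self.up and self.fails >= self.fall:
                self.up = False

    def record_probe(self, status: Optional[int]) -> None:
        """Active check: status of GET /health, or None on timeout/connection error."""
        self.record(status == 200)

    def record_response(self, status: Optional[int]) -> None:
        """Passive check: outcome of a real request; 5xx or no response is a failure."""
        self.record(status is not None and status < 500)


if __name__ == "__main__":
    h = HealthState()
    for status in [200, 500, None, 503]:
        h.record_probe(status)
    print(h.up)  # False: three failures in a row
    h.record_probe(200)
    h.record_probe(200)
    print(h.up)  # True: two successes in a row
```

Requiring consecutive failures keeps a single slow probe from flapping a server out of the pool.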
# A simple health endpoint in any app
# GET /health → 200 OK means "I'm alive"
# Returns: {"status": "ok", "uptime": 12345}
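Here is one way to write that endpoint using only Python's standard library. The port (3000, matching the upstream servers in the nginx example) and reporting uptime in whole seconds are assumptions; a real service might also check its own dependencies before answering 200.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

START = time.monotonic()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        body = json.dumps(
            {"status": "ok", "uptime": int(time.monotonic() - START)}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging so frequent probes don't flood stderr


def run(port: int = 3000) -> None:
    """Serve /health on all interfaces until interrupted."""
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `run()` starts the server; a GET to `/health` then returns a 200 with a small JSON body, and any other path returns 404.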

Sticky Sessions

Sometimes we need the same user to always reach the same server (e.g., if session data is stored in server memory). This is called session affinity or sticky sessions.

Methods:

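  • Cookie-based: the LB sets a cookie (like SERVERID=web2) and uses it to route that client's later requests.
  • IP-based: route based on client IP. This breaks down with shared IPs and proxies (many users behind one NAT or corporate proxy all land on the same server) and when a client's IP changes (e.g., a phone switching networks).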
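Cookie-based affinity boils down to "honor the cookie if its server is still healthy, otherwise pick a new server and set the cookie." This sketch models that with round robin for new clients; the `StickyRouter` class is hypothetical and simply reuses the `SERVERID` cookie name from the example above.

```python
import itertools


class StickyRouter:
    """Cookie-based session affinity (simplified model, not any product's code).

    New clients are assigned round robin; returning clients go back to the
    server named in their SERVERID cookie unless that server is unhealthy,
    in which case they are reassigned and get a fresh cookie.
    """

    COOKIE = "SERVERID"

    def __init__(self, servers):
        self.healthy = set(servers)
        self._rr = itertools.cycle(servers)
        self._count = len(servers)

    def _next_healthy(self):
        # Try each server at most once per call.
        for _ in range(self._count):
            server = next(self._rr)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers")

    def route(self, cookies):
        """Return (server, cookies_to_set) for a request carrying `cookies`."""
        pinned = cookies.get(self.COOKIE)
        if pinned in self.healthy:
            return pinned, {}
        server = self._next_healthy()
        return server, {self.COOKIE: server}


if __name__ == "__main__":
    router = StickyRouter(["web1", "web2", "web3"])
    print(router.route({}))                    # ('web1', {'SERVERID': 'web1'})
    print(router.route({"SERVERID": "web2"}))  # ('web2', {})
```

If a pinned server goes down, its users are silently moved elsewhere and lose whatever session data lived in that server's memory, which is the weakness the next paragraph addresses.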

The better solution is usually to avoid sticky sessions altogether by keeping session data in a shared store like Redis, so any server can handle any request.
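The idea behind a shared store is that app servers keep no per-user state of their own. In this sketch a plain in-memory `SessionStore` class stands in for Redis (an assumption made so the example runs without a Redis server), and `AppServer` is a hypothetical stateless handler. Requests for the same session can land on either server and still see the same cart.

```python
class SessionStore:
    """Stand-in for a shared store such as Redis (here just an in-memory dict).

    In production every app server would talk to the same Redis instance;
    in this sketch the servers share one Python object instead.
    """

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = session


class AppServer:
    """Stateless app server: keeps no per-user data in its own memory."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        cart = session.get("cart", []) + [item]
        self.store.put(session_id, {**session, "cart": cart})
        return cart


if __name__ == "__main__":
    store = SessionStore()
    web1, web2 = AppServer("web1", store), AppServer("web2", store)
    web1.add_to_cart("sess-123", "book")          # first request hits web1
    print(web2.add_to_cart("sess-123", "pen"))    # ['book', 'pen']
```

Because neither server holds the cart itself, the load balancer is free to use any algorithm, and losing a server loses no session data.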

Common Load Balancing Tools

Tool                Type    Notes
Nginx               L4/L7   Very popular for web, great L7 support (L4 via its stream module)
HAProxy             L4/L7   Battle-tested, high performance
Caddy               L7      Automatic HTTPS, simple config
AWS ALB             L7      Managed, integrates with AWS services
AWS NLB             L4      Managed, ultra-low latency
GCP Load Balancer   L4/L7   Global and regional options; global ones use anycast IPs
Traefik             L4/L7   Auto-discovers containers, great for Docker/K8s

In simple terms, a load balancer is like a traffic cop standing in front of our servers. It sends each car (request) down a different road (server) so no single road gets jammed. If a road is closed (server down), it redirects traffic to the open ones.