Load Balancing - DevOps

When one server can’t handle all the traffic, we put multiple servers behind a load balancer. It distributes incoming requests across those servers so no single one gets overwhelmed. This gives us both scalability (handle more traffic) and high availability (if one server dies, others keep serving).

L4 vs L7 Load Balancing

Load balancers operate at different layers of the network stack. The two we care about are Layer 4 (transport) and Layer 7 (application).

L4 — Transport Layer

Routes based on IP + port

Can't see HTTP headers or URLs

Just forwards TCP/UDP connections

Very fast — minimal processing

Good for: databases, non-HTTP traffic

Decision: "Send this TCP connection to server B"

L7 — Application Layer

Routes based on URL, headers, cookies

Can inspect HTTP request content

Can rewrite URLs, add headers

Slower — needs to parse HTTP

Good for: web apps, API routing

Decision: "Send /api/* to the API servers, /static/* to the static file servers"

Most web applications use L7 load balancing because we want to route based on URL paths, headers, or cookies.
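To make the difference concrete, here is a minimal Python sketch of the two kinds of decision. The pool names, addresses, and the `Connection` / `HttpRequest` types are hypothetical and only for illustration; a real load balancer does this work inside the proxy, not in application code.

```python
import zlib
from dataclasses import dataclass, field

# Hypothetical server pools, for illustration only.
TCP_POOL = ["10.0.0.1", "10.0.0.2"]
API_POOL = ["10.0.1.1", "10.0.1.2"]
STATIC_POOL = ["10.0.2.1"]
DEFAULT_POOL = ["10.0.3.1"]


@dataclass
class Connection:
    # Everything an L4 balancer can see: addresses and ports.
    client_ip: str
    client_port: int
    dest_port: int


@dataclass
class HttpRequest:
    # An L7 balancer has parsed the request, so it also sees these.
    path: str
    headers: dict = field(default_factory=dict)


def l4_pick(conn: Connection) -> str:
    """Pick a server using only IP and port information."""
    key = f"{conn.client_ip}:{conn.client_port}->{conn.dest_port}".encode()
    return TCP_POOL[zlib.crc32(key) % len(TCP_POOL)]


def l7_pick_pool(req: HttpRequest) -> list:
    """Pick a pool by looking at the URL path of the request."""
    if req.path.startswith("/api/"):
        return API_POOL
    if req.path.startswith("/static/"):
        return STATIC_POOL
    return DEFAULT_POOL
```

The L4 function never looks at a path or header because it has none to look at; the L7 function can make a per-request choice, at the cost of parsing HTTP first.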

Load Balancing Algorithms

How does the load balancer decide which server gets the next request?

Round Robin — Takes turns: server 1, server 2, server 3, repeat. Simple and fair, but ignores server load.
Weighted Round Robin — Same but some servers get more turns. Server A (weight 3) gets 3x the traffic of server B (weight 1). Useful when servers have different capacities.
Least Connections — Sends to the server with the fewest active connections. Smart choice when requests take varying amounts of time.
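IP Hash — Hashes the client’s IP to pick a server. The same client goes to the same server as long as the pool does not change, which gives simple session persistence.
Random — Picks a server at random. With many requests and similarly sized servers, the load evens out statistically, so it works better than it sounds.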
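The algorithms above are small enough to sketch in a few lines of Python each. This is a simplified illustration, not how any particular load balancer implements them: the server names are hypothetical, the weighted version is the naive "repeat the server N times" form rather than a smoothed one, and CRC32 stands in for whatever hash a real product uses.

```python
import itertools
import random
import zlib

SERVERS = ["A", "B", "C"]  # hypothetical server names


def round_robin(servers):
    """Yield servers in order, forever."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield each server `weight` times per cycle (naive, not smoothed)."""
    expanded = [s for s, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def least_connections(active):
    """Return the server with the fewest active connections."""
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP to a server; stable while the server list is unchanged."""
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]


def pick_random(servers):
    """Pick any server with equal probability."""
    return random.choice(servers)


rr = round_robin(SERVERS)
print([next(rr) for _ in range(4)])                  # ['A', 'B', 'C', 'A']

wrr = weighted_round_robin({"A": 3, "B": 1})
print([next(wrr) for _ in range(4)])                 # ['A', 'A', 'A', 'B']

print(least_connections({"A": 5, "B": 2, "C": 7}))   # B
```

Note that `ip_hash` remaps most clients when a server is added or removed, which is why the persistence it gives is only "simple".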

Nginx Load Balancer Config

Here’s what a basic L7 load balancer looks like with Nginx.

# nginx.conf
upstream backend {
    # Least connections algorithm
    least_conn;

    server 10.0.0.1:3000 weight=3;   # weight is factored into least_conn, so this server takes a larger share
    server 10.0.0.2:3000;             # weight=1 (default)
    server 10.0.0.3:3000 backup;      # only used if others are down
}

server {
    listen 80;
    server_name myapp.com;

    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Health Checks

A load balancer needs to know if a server is alive. It either probes each server periodically or watches the responses to real traffic, and removes unhealthy servers from the pool.

Active health checks — The LB pings each server (e.g., GET /health every 10s). If 3 checks fail, the server is marked down.
Passive health checks — The LB watches real traffic. If a server starts returning 5xx errors, it gets pulled out.

# A simple health endpoint in any app
# GET /health → 200 OK means "I'm alive"
# Returns: {"status": "ok", "uptime": 12345}
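The bookkeeping behind "if 3 checks fail, the server is marked down" can be sketched as follows. This is a minimal illustration under stated assumptions: failures are counted consecutively, a single success brings a server back (many real load balancers require several), and `HealthTracker` and `probe` are hypothetical names, not part of any library.

```python
import http.client
import urllib.request
from dataclasses import dataclass, field


def probe(url: str, timeout: float = 2.0) -> bool:
    """Active check: True only if GET `url` answers with 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, non-2xx status, bad response.
        return False


@dataclass
class HealthTracker:
    fail_threshold: int = 3                      # the "3 checks fail" from the text
    failures: dict = field(default_factory=dict)
    down: set = field(default_factory=set)

    def record(self, server: str, ok: bool) -> None:
        """Record one check result (from a probe or from observed traffic)."""
        if ok:
            self.failures[server] = 0
            self.down.discard(server)            # assumption: one success restores it
        else:
            self.failures[server] = self.failures.get(server, 0) + 1
            if self.failures[server] >= self.fail_threshold:
                self.down.add(server)

    def healthy(self, servers):
        """Return the servers that are still in the pool."""
        return [s for s in servers if s not in self.down]
```

The same `record` call serves both styles: an active checker would feed it the result of `probe(...)` on a timer, while a passive checker would feed it `False` whenever a proxied request comes back with a 5xx status.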

Sticky Sessions

Sometimes we need the same user to always reach the same server (e.g., if session data is stored in server memory). This is called session affinity or sticky sessions.

Methods:

Cookie-based — The LB sets a cookie (like SERVERID=web2) and uses it for routing
IP-based — Route based on client IP (breaks with shared IPs / proxies)
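The cookie-based method can be sketched like this. It is a simplified illustration: the `SERVERID` cookie name comes from the example above, while the server names, the `route` function, and the round-robin fallback are assumptions made for the sketch.

```python
import itertools

SERVERS = ["web1", "web2", "web3"]      # hypothetical server names
_rotation = itertools.cycle(SERVERS)    # fallback order for unpinned clients


def route(cookies: dict, healthy: set):
    """Return (server, cookies_to_set) for one request."""
    pinned = cookies.get("SERVERID")
    if pinned in healthy:
        # Returning client whose server is still up: keep the affinity.
        return pinned, {}
    # No cookie yet, or the pinned server is gone: pick a new one and pin it.
    for _ in range(len(SERVERS)):
        candidate = next(_rotation)
        if candidate in healthy:
            return candidate, {"SERVERID": candidate}
    raise RuntimeError("no healthy servers")


print(route({}, set(SERVERS)))                      # ('web1', {'SERVERID': 'web1'})
print(route({"SERVERID": "web2"}, set(SERVERS)))    # ('web2', {})
```

When the pinned server is down, the client is moved to another server and any session data held in the old server's memory is lost for that user.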

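The better solution is usually to avoid sticky sessions altogether by keeping session data in a shared store like Redis. Any server can then handle any request, and losing a server does not lose its users' sessions.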

Common Load Balancing Tools

Tool	Type	Notes
Nginx	L4/L7	Widely used for web traffic, strong L7 support; L4 via the stream module
HAProxy	L4/L7	Battle-tested, amazing performance
Caddy	L7	Auto HTTPS, simple config
AWS ALB	L7	Managed, integrates with AWS services
AWS NLB	L4	Managed, ultra-low latency
GCP Cloud Load Balancing	L4/L7	Managed; global external load balancers use anycast IPs
Traefik	L7	Auto-discovers containers, great for Docker/K8s

In simple language, a load balancer is like a traffic cop standing in front of our servers. It sends each car (request) down a different road (server) so no single road gets jammed. If a road is closed (server down), it redirects traffic to the open ones.