Congestion Control (Slow Start, AIMD)

Congestion control is how TCP avoids overloading the network itself — the routers, links, and queues between sender and receiver. Even if the receiver has plenty of buffer, the path in between might not.

In plain terms: TCP probes how fast it can go, backs off when it sees loss, and then probes again. It does this without anyone telling it the capacity of the network.

Two Different Window Limits

The amount of unacknowledged data the sender may have in flight is bounded by the smaller of two windows:

  • rwnd (receive window): flow control, protects the receiver.
  • cwnd (congestion window): congestion control, protects the network.

in_flight ≤ min(rwnd, cwnd)
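The inequality above can be sketched as a small helper. This is only an illustration: the function name `sendable_bytes` and the byte values are made up for the example, and a real stack tracks these quantities per connection inside the kernel.

```python
def sendable_bytes(rwnd: int, cwnd: int, in_flight: int) -> int:
    """Return how many more bytes may be sent right now.

    All three arguments are in bytes. The effective window is the smaller
    of the two limits, and data that is already unacknowledged counts
    against it.
    """
    effective_window = min(rwnd, cwnd)
    return max(0, effective_window - in_flight)


# The receiver has room for 64 KiB, but cwnd is only 14,600 bytes
# (10 segments of 1,460 bytes), so cwnd is the binding limit here.
print(sendable_bytes(rwnd=65_536, cwnd=14_600, in_flight=5_840))  # 8760
```

Whichever window is smaller decides the result, which is why a large receive buffer does not help while cwnd is still small.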

The Phases

Classic TCP (Reno-style) has four phases:

  1. Slow Start — exponential ramp-up.
  2. Congestion Avoidance — linear growth (AIMD).
  3. Fast Retransmit — resend on 3 duplicate ACKs.
  4. Fast Recovery — recover quickly after fast retransmit.

Phase 1 — Slow Start

Despite the name, slow start is fast. Classic implementations start with cwnd = 1 MSS; modern Linux starts with 10 MSS, following RFC 6928. Every ACK that acknowledges new data grows cwnd by 1 MSS.

RTT 1:  cwnd = 1
RTT 2:  cwnd = 2     (doubled — got 1 ACK -> +1)
RTT 3:  cwnd = 4     (doubled again)
RTT 4:  cwnd = 8
RTT 5:  cwnd = 16
...

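Effectively, cwnd doubles every RTT, because a full window of ACKs comes back each round and every one of them adds 1 MSS. Growth continues until cwnd reaches the slow-start threshold (ssthresh) or a loss occurs.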
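To see why "+1 MSS per ACK" turns into doubling per RTT, here is a minimal sketch. It assumes every segment is delivered and ACKed individually (no delayed ACKs, no loss), and the function names are invented for this example.

```python
def slow_start_round(cwnd: int) -> int:
    """Simulate one RTT of slow start, with cwnd counted in MSS units."""
    acks = cwnd              # one ACK comes back per segment sent this round
    for _ in range(acks):
        cwnd += 1            # each ACK grows the window by 1 MSS
    return cwnd


def slow_start_trace(initial_cwnd: int, rounds: int) -> list[int]:
    """Return cwnd at the start of each of the first `rounds` RTTs."""
    trace = [initial_cwnd]
    for _ in range(rounds - 1):
        trace.append(slow_start_round(trace[-1]))
    return trace


print(slow_start_trace(1, 5))   # [1, 2, 4, 8, 16]
```

With delayed ACKs the receiver sends fewer ACKs per window, so real growth can be somewhat slower than a clean doubling.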

Phase 2 — Congestion Avoidance (AIMD)

When cwnd reaches ssthresh (slow-start threshold), we shift to AIMD — Additive Increase, Multiplicative Decrease:

  • Additive Increase: cwnd grows by 1 MSS per RTT (much slower).
  • Multiplicative Decrease: on loss, cwnd is halved.

no loss   ->  cwnd += 1 per RTT     (linear)
loss      ->  cwnd /= 2             (cliff)

This sawtooth pattern is the signature look of TCP’s bandwidth graph.
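The two AIMD rules fit in a few lines. The sketch below works in MSS units at per-RTT granularity; the starting window and the RTTs at which loss happens are hypothetical values picked to make the pattern visible.

```python
def aimd_step(cwnd: float, loss: bool) -> float:
    """Apply one RTT of AIMD to cwnd (in MSS)."""
    if loss:
        return max(1.0, cwnd / 2)   # multiplicative decrease, never below 1 MSS
    return cwnd + 1.0               # additive increase


def aimd_trace(start: float, rtts: int, loss_rtts: set[int]) -> list[float]:
    """Return cwnd after each RTT; loss_rtts holds the RTT numbers (1-based) with loss."""
    cwnd = start
    trace = []
    for rtt in range(1, rtts + 1):
        cwnd = aimd_step(cwnd, loss=rtt in loss_rtts)
        trace.append(cwnd)
    return trace


print(aimd_trace(start=10.0, rtts=12, loss_rtts={6, 11}))
# [11.0, 12.0, 13.0, 14.0, 15.0, 7.5, 8.5, 9.5, 10.5, 11.5, 5.75, 6.75]
```

Each loss undoes many RTTs of linear growth in one step, which is what produces the teeth.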

Phase 3 — Fast Retransmit

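We don’t always wait for the retransmission timer. Three duplicate ACKs are a strong signal that one segment was lost while later ones got through, because every out-of-order arrival makes the receiver repeat its ACK for the missing data. The sender resends the missing segment immediately instead of waiting for the retransmission timeout (RTO).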
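A rough sketch of the duplicate-ACK counting on the sender side follows. The class name `DupAckDetector` and the sequence numbers are invented for illustration, and the test for "duplicate" is simplified to "same ACK number as before"; real stacks apply additional conditions before counting an ACK as a duplicate.

```python
class DupAckDetector:
    """Counts repeated ACK numbers and signals when to fast-retransmit."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self) -> None:
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no: int) -> bool:
        """Return True exactly when this ACK is the third duplicate."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            return self.dup_count == self.DUP_ACK_THRESHOLD
        # A new ACK number means progress: reset the counter.
        self.last_ack = ack_no
        self.dup_count = 0
        return False


# The segment starting at byte 2000 is lost; later segments arrive and each
# one makes the receiver repeat "ACK 2000".
detector = DupAckDetector()
acks = [1000, 2000, 2000, 2000, 2000, 6000]
print([detector.on_ack(a) for a in acks])
# [False, False, False, False, True, False]
```

The first ACK 2000 is the original; the trigger fires on the third repeat after it.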

Phase 4 — Fast Recovery

After fast retransmit, instead of going back to slow start (cwnd=1), we set:

ssthresh = cwnd / 2
cwnd     = ssthresh

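…and resume in congestion avoidance. We lost a packet, not the whole network, so there is no need to start from scratch. A retransmission timeout is treated as the harsher signal: ssthresh is still set to cwnd / 2, but cwnd falls back to 1 MSS and slow start begins again.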
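The difference between the two loss signals can be sketched as two small state transitions. This is a simplification in MSS units: the names `RenoState`, `on_triple_dup_ack`, and `on_timeout` are hypothetical, the 2 MSS floor on ssthresh is taken from RFC 5681 rather than from the text above, and the temporary window inflation that Reno performs during recovery is left out.

```python
from dataclasses import dataclass


@dataclass
class RenoState:
    cwnd: float       # congestion window, in MSS
    ssthresh: float   # slow-start threshold, in MSS


def on_triple_dup_ack(state: RenoState) -> RenoState:
    """Fast retransmit + fast recovery: halve and stay in congestion avoidance."""
    ssthresh = max(state.cwnd / 2, 2.0)   # assumption: floor of 2 MSS per RFC 5681
    return RenoState(cwnd=ssthresh, ssthresh=ssthresh)


def on_timeout(state: RenoState) -> RenoState:
    """Retransmission timeout: remember half the window, restart slow start."""
    ssthresh = max(state.cwnd / 2, 2.0)
    return RenoState(cwnd=1.0, ssthresh=ssthresh)


before = RenoState(cwnd=20.0, ssthresh=64.0)
print(on_triple_dup_ack(before))  # RenoState(cwnd=10.0, ssthresh=10.0)
print(on_timeout(before))         # RenoState(cwnd=1.0, ssthresh=10.0)
```

Both paths record the same ssthresh; only the restart point for cwnd differs.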

The Sawtooth

cwnd over time (Reno)
cwnd
  │              ╱│           ╱│        ╱│
  │            ╱  │         ╱  │      ╱  │
  │          ╱    │       ╱    │    ╱    │
  │        ╱      │     ╱      │  ╱      │
  │      ╱        ▼   ╱        ▼ ╱       ▼
  │    ╱                                       (slow start)
  │  ╱
  │╱
  └─────────────────────────────────▶  time
       slow │  congestion avoidance (AIMD)
       start│  loss → halve → grow linearly
    

The slow-start phase is the steep ramp at the start. After the first loss, we drop to half and keep doing the linear-up + halve-on-loss dance forever.
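The whole picture can be reproduced with a toy simulation. Everything here is an assumption made for the sketch: the path is modelled as a fixed `capacity` in MSS, a loss is declared whenever cwnd exceeds it, every loss is handled as a fast-recovery halving, and slow start is clamped so it does not jump past ssthresh.

```python
def reno_trace(capacity: int, ssthresh: float, rtts: int,
               initial_cwnd: float = 1.0) -> list[float]:
    """Return cwnd (in MSS) at the start of each RTT for a toy Reno sender."""
    cwnd = initial_cwnd
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd > capacity:
            # Overshoot: treat as a loss detected by duplicate ACKs.
            ssthresh = max(cwnd / 2, 2.0)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: double, but do not jump past ssthresh.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: +1 MSS per RTT.
            cwnd += 1
    return trace


trace = reno_trace(capacity=20, ssthresh=16, rtts=12)
print(trace)
# [1.0, 2.0, 4.0, 8.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 10.5, 11.5]

# Crude text plot: one row per RTT, one '#' per whole MSS of cwnd.
for rtt, cwnd in enumerate(trace, start=1):
    print(f"{rtt:2d} {'#' * int(cwnd)}")
```

Running it for more RTTs shows the same linear climb and halving repeating, with slow start appearing only once at the beginning.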

cwnd vs rwnd

  • cwnd is internal to the sender. The receiver doesn’t know it.
  • rwnd is advertised by the receiver in every ACK.
  • The sender takes the minimum.

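On a high-bandwidth, low-loss network, cwnd is often the bottleneck early in a connection. Later, once cwnd has grown past the advertised rwnd, rwnd becomes the limit, and if rwnd is smaller than the path’s bandwidth-delay product (BDP) the sender cannot fill the link.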
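A quick way to check which window matters is to compare it with the bandwidth-delay product. The helper names and the example path (100 Mbit/s, 80 ms RTT, a 65,535-byte rwnd without window scaling) are hypothetical numbers chosen for illustration.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_s / 8


def max_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


rtt = 0.080                      # 80 ms round trip
link = 100e6                     # 100 Mbit/s bottleneck
rwnd = 65_535                    # largest window without TCP window scaling

print(f"BDP: {bdp_bytes(link, rtt):,.0f} bytes")
print(f"Ceiling with this rwnd: {max_throughput_bps(rwnd, rtt) / 1e6:.2f} Mbit/s")
```

On this example path the BDP is about 1 MB, so a 64 KiB receive window caps the connection at a small fraction of the link no matter how large cwnd grows.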

Algorithm Variants

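  • Reno / NewReno: the classic AIMD algorithms. Reno’s congestion control is specified in RFC 5681; NewReno’s improved fast recovery is in RFC 6582.
  • CUBIC: Linux default since ~2006. cwnd grows as a cubic function of the time since the last loss: a fast climb back toward the prior peak, a plateau near it, then cautious probing above it. Better for high-bandwidth, long-RTT links.
  • BBR (Bottleneck Bandwidth and RTT): Google, ~2016. Does not use loss as its primary signal (BBRv1 largely ignores it; later versions also react to loss and ECN marks). It models the path’s bottleneck bandwidth and minimum RTT and paces sending to match. Often beats CUBIC on lossy or deeply buffered networks.
  • Vegas: uses RTT increase (queue buildup) as a signal before loss happens. Niche.

# Linux: see and change the algorithm
sysctl net.ipv4.tcp_congestion_control
# net.ipv4.tcp_congestion_control = cubic

# List the algorithms currently available (bbr may need the tcp_bbr module loaded)
sysctl net.ipv4.tcp_available_congestion_control

sudo sysctl -w net.ipv4.tcp_congestion_control=bbr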
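The CUBIC entry in the list above describes a window that is a cubic function of time. A minimal sketch of that curve follows, using the window function and the constants C = 0.4 and β = 0.7 suggested in the CUBIC RFC (RFC 8312). It leaves out the Reno-friendly region and everything else a real implementation does, and the example value of `w_max` is made up.

```python
C = 0.4      # scaling constant suggested in RFC 8312
BETA = 0.7   # multiplicative decrease factor suggested in RFC 8312


def cubic_k(w_max: float) -> float:
    """Seconds needed to climb back to w_max after a loss."""
    return (w_max * (1 - BETA) / C) ** (1 / 3)


def cubic_window(t: float, w_max: float) -> float:
    """Window (in MSS) t seconds after the last loss; w_max is the window at that loss."""
    return C * (t - cubic_k(w_max)) ** 3 + w_max


w_max = 100.0
k = cubic_k(w_max)
for t in (0.0, k / 2, k, 1.5 * k, 2 * k):
    print(f"t = {t:5.2f} s  ->  cwnd = {cubic_window(t, w_max):6.1f} MSS")
```

At t = 0 the window is β · w_max (70 MSS here), it flattens out around w_max at t = K, and then grows faster again as it probes for new capacity. Note that CUBIC backs off to 70% of the window rather than the 50% used by Reno.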

ECN — Don’t Wait For Loss

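Explicit Congestion Notification: routers can mark packets as “I’m congested” instead of dropping them. The receiver echoes the mark back in its ACKs, and the sender reduces cwnd as it would for a loss, except that no data was actually lost and nothing has to be retransmitted. Requires ECN-aware routers and endpoints (RFC 3168).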
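The mark itself lives in the two low-order bits of the IP header’s traffic-class (formerly TOS) byte. The sketch below decodes those bits using the codepoints from RFC 3168; the `router_mark` function is a hypothetical stand-in for what a congested ECN-aware router does and is not modelled on any real router’s code.

```python
from enum import IntEnum
from typing import Optional


class Ecn(IntEnum):
    """ECN codepoints defined in RFC 3168."""
    NOT_ECT = 0b00   # sender does not support ECN
    ECT_1 = 0b01     # ECN-capable transport
    ECT_0 = 0b10     # ECN-capable transport
    CE = 0b11        # congestion experienced


def ecn_codepoint(tos_byte: int) -> Ecn:
    """Extract the ECN field from the IP TOS / traffic-class byte."""
    return Ecn(tos_byte & 0b11)


def router_mark(tos_byte: int) -> Optional[int]:
    """Hypothetical congested router: mark ECN-capable packets, drop the rest.

    Returns the new TOS byte, or None to represent a dropped packet.
    """
    if ecn_codepoint(tos_byte) is Ecn.NOT_ECT:
        return None
    return tos_byte | Ecn.CE


print(ecn_codepoint(0x02).name)   # ECT_0
print(router_mark(0x02))          # 3  (now carries CE)
print(router_mark(0x00))          # None (not ECN-capable, so it is dropped)
```

Only packets that already declare themselves ECN-capable get marked; everything else still experiences congestion as loss.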

Common Gotcha

People conflate “TCP is reliable” with “TCP is fast.” In high-loss networks, loss-based TCP can crawl because each loss event cuts cwnd sharply (by half in Reno). QUIC, which runs over UDP, paired with BBR is often faster in those conditions, even though it has to implement reliability itself.
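One way to put numbers on "TCP can crawl" is the well-known Mathis approximation for Reno-style senders, which says throughput is roughly (MSS / RTT) · (C / √p) with C ≈ 1.22 and p the loss rate. The formula is not part of the text above and only models steady-state AIMD under random loss; the MSS, RTT, and loss rates below are hypothetical.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state throughput of an AIMD sender (Mathis model)."""
    c = math.sqrt(3 / 2)                       # about 1.22
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


for p in (0.0001, 0.001, 0.01):
    mbps = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.100, loss_rate=p) / 1e6
    print(f"loss {p:.2%}: about {mbps:.1f} Mbit/s")
```

With a 100 ms RTT, 1% loss holds this model to roughly 1.4 Mbit/s regardless of how fast the link is, and throughput only improves with the square root of any reduction in loss.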

Interview Tip

The big four to remember: slow start, congestion avoidance, fast retransmit, fast recovery. Plus AIMD = additive increase, multiplicative decrease. If you can sketch the sawtooth on a whiteboard, you’ve answered 90% of TCP-congestion interview questions.