Latency, bandwidth, and throughput sound similar. They’re not. Mixing them up is the root cause of half the “why is my app slow” debates.
Definitions
- Latency — how long it takes for one packet to travel from A to B. Measured in milliseconds.
- Bandwidth — the theoretical maximum capacity of the link. Measured in bits per second.
- Throughput — the actual rate of data we observe end to end. Always ≤ bandwidth.
The Highway Analogy
Imagine a highway between two cities.
- Bandwidth = the number of lanes. More lanes = more cars can travel in parallel.
- Latency = how long it takes one car to drive from city A to city B. Fixed by speed and distance.
- Throughput = how many cars actually arrive per minute. Depends on lanes, speed limit, traffic jams, accidents.
A 16-lane highway (high bandwidth) with a 5-hour drive (high latency) still delivers a lot of cars per hour because there are many lanes: each car takes a long time to arrive, but many arrive side by side. If the lanes are jammed, though, throughput drops to a trickle.
Why High Bandwidth + High Latency ≠ High Throughput
Throughput is bounded by bandwidth, latency, and protocol overhead. TCP, for example, requires acknowledgments: the sender can have only one “window” of unacknowledged data in flight, then must wait for an ACK before sending more.
That window divided by the round-trip time gives us the maximum effective throughput, no matter how big the pipe is.
Throughput ~= Window Size / RTT
If we have 1 Gbps of bandwidth but 200 ms RTT and a 64 KB window, throughput is capped around:
64 KB / 0.2 s = 320 KB/s ~= 2.6 Mbps
That is the cap even though the link can carry 1000 Mbps. This is the “long fat pipe” problem, also known as a long fat network (LFN).
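The window arithmetic above can be sketched in a few lines of Python. This is a simplified model, not how a real TCP stack behaves: it assumes a fixed window, no loss, and takes 64 KB to mean 65,536 bytes. The function names are made up for illustration.

```python
def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


def effective_bps(window_bytes: int, rtt_s: float, link_bps: float) -> float:
    """Throughput can exceed neither the window limit nor the link capacity."""
    return min(link_bps, window_limited_bps(window_bytes, rtt_s))


if __name__ == "__main__":
    # Values from the text: 64 KB window, 200 ms RTT, 1 Gbps link.
    cap = effective_bps(64 * 1024, 0.2, 1e9)
    print(f"{cap / 1e6:.2f} Mbps")  # prints "2.62 Mbps"
```

Dropping the RTT to 1 ms in the same call moves the window limit to about 524 Mbps, which shows how strongly the cap depends on latency rather than on the link.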
Bandwidth-Delay Product (BDP)
BDP = bandwidth × RTT. It tells us how much data must be “in flight” (sent but not yet acknowledged) to keep the link full.
BDP = 1 Gbps * 100 ms = 1e9 bits/s * 0.1 s = 1e8 bits = 12.5 MB
If our TCP window is smaller than BDP, we can’t fill the pipe. Modern TCP uses window scaling (RFC 7323) to grow the window beyond the original 64 KB cap.
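As a rough sketch, the BDP calculation and the window-scale factor needed to cover it look like this. The helper names are hypothetical; the only facts taken from the TCP specification are the 65,535-byte unscaled window and the maximum shift count of 14 from RFC 7323. Real stacks pick the shift from their buffer sizes, so treat the result as a lower bound, not as what a given OS will negotiate.

```python
import math

MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit TCP window field
MAX_SHIFT = 14               # largest window-scale shift allowed by RFC 7323


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product, converted from bits to bytes."""
    return bandwidth_bps * rtt_s / 8


def min_window_scale_shift(target_window_bytes: float) -> int:
    """Smallest shift s such that 65535 << s covers the target window."""
    target = math.ceil(target_window_bytes)
    for shift in range(MAX_SHIFT + 1):
        if (MAX_UNSCALED_WINDOW << shift) >= target:
            return shift
    raise ValueError("target window exceeds what window scaling can express")


if __name__ == "__main__":
    bdp = bdp_bytes(1e9, 0.1)  # 1 Gbps link, 100 ms RTT, as in the text
    print(f"BDP: {bdp / 1e6:.1f} MB")                        # prints "BDP: 12.5 MB"
    print(f"min scale shift: {min_window_scale_shift(bdp)}")  # prints "min scale shift: 8"
```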
RTT — Round-Trip Time
The time for a packet to go to a host and come back. Measured by ping.
# Measure RTT to a server
ping -c 5 google.com
# Continuous per-hop monitoring (loss, latency, and its variation)
mtr google.com
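Once we have a handful of RTT samples (copied from ping output, for instance), they can be summarized with a short script. This is only a sketch: the sample values are invented for illustration, and "jitter" is computed here as the mean absolute difference between consecutive samples, which is one of several definitions in use.

```python
from statistics import mean


def summarize_rtt(samples_ms: list[float]) -> dict[str, float]:
    """Return min/avg/max and a simple jitter estimate for RTT samples in ms."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to estimate jitter")
    # Jitter (assumed definition): mean absolute change between consecutive samples.
    deltas = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    return {
        "min": min(samples_ms),
        "avg": mean(samples_ms),
        "max": max(samples_ms),
        "jitter": mean(deltas),
    }


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    print(summarize_rtt([20.0, 22.0, 19.0, 25.0, 21.0]))
```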
Typical RTTs:
- Same datacenter: < 1 ms
- Same city: 5-10 ms
- Cross-country (US): 50-80 ms
- Cross-continent: 100-200 ms
- Satellite (GEO): 600+ ms
Why Latency Matters Even When Bandwidth Is Huge
Loading a page often means a chain of sequential steps: DNS lookup, TCP and TLS handshakes, fetch HTML, fetch CSS, fetch JS, and so on. Each one pays at least one RTT.
A 100 ms RTT × 10 sequential requests = 1 second of pure waiting, regardless of bandwidth. That’s why CDNs (low latency) often matter more than upgrading the connection (high bandwidth).
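This is also why HTTP/2 multiplexing and HTTP/3 (over QUIC) save time: HTTP/2 sends many requests concurrently over one connection instead of queuing them, and QUIC combines the transport and TLS handshakes so a new connection needs fewer round trips.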
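A toy model makes the cost of serial round trips concrete. It assumes every request costs exactly one RTT, that requests do not depend on each other, and that transfer time is negligible. None of that holds for a real page load, so read the numbers as an illustration of the trend only. The function name and the `max_concurrent` parameter are hypothetical.

```python
import math


def waiting_time_s(num_requests: int, rtt_s: float, max_concurrent: int = 1) -> float:
    """Time spent waiting on round trips when requests go out in batches."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    rounds = math.ceil(num_requests / max_concurrent)
    return rounds * rtt_s


if __name__ == "__main__":
    rtt = 0.1  # 100 ms, as in the text
    print(f"serial:      {waiting_time_s(10, rtt):.1f} s")                     # prints "serial:      1.0 s"
    print(f"multiplexed: {waiting_time_s(10, rtt, max_concurrent=10):.1f} s")  # prints "multiplexed: 0.1 s"
```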
Latency Hierarchy (Approximate)
L1 cache          0.5 ns
L2 cache          5 ns
RAM               100 ns
SSD read          150 us
HDD seek          10 ms
Network LAN       0.5 ms
Network WAN       50-150 ms
Mobile (4G/5G)    30-50 ms
Notice the gap of roughly six orders of magnitude between RAM (100 ns) and a typical WAN round trip (50-150 ms). That’s why caching matters so much.
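The size of that gap is easy to check with a logarithm. The sketch below uses the approximate figures from the hierarchy above; the dictionary keys and the helper name are just illustrative.

```python
import math

# Approximate latencies from the hierarchy above, in seconds.
LATENCY_S = {
    "RAM": 100e-9,
    "SSD read": 150e-6,
    "Network LAN": 0.5e-3,
    "WAN (low)": 50e-3,
    "WAN (high)": 150e-3,
}


def orders_of_magnitude(slow_s: float, fast_s: float) -> float:
    """How many powers of ten separate two latencies."""
    return math.log10(slow_s / fast_s)


if __name__ == "__main__":
    for name in ("WAN (low)", "WAN (high)"):
        gap = orders_of_magnitude(LATENCY_S[name], LATENCY_S["RAM"])
        print(f"{name} vs RAM: {gap:.1f} orders of magnitude")
```

With these inputs the gap comes out between about 5.7 and 6.2, so a single WAN round trip costs on the order of a million RAM accesses.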
Measuring Throughput
# iperf3 - the standard tool for measuring real throughput
# On the server
iperf3 -s
# On the client
iperf3 -c server.example.com -t 30
iperf3 reports actual achievable throughput. Compare with the link’s advertised bandwidth — the gap is overhead, congestion, and protocol limits.
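To put a number on that gap, we can compare the measured rate with the advertised one. The sketch below assumes the test was run with iperf3's JSON output (`iperf3 -c server.example.com -t 30 -J`) and that the report has the usual TCP layout, where the receiver-side rate sits under `end.sum_received.bits_per_second`. Check this against your own output, since the layout differs for UDP tests. The sample report and the 1 Gbps link speed are invented for illustration.

```python
def received_bps(report: dict) -> float:
    """Receiver-side throughput from a parsed iperf3 JSON report (TCP layout assumed)."""
    return float(report["end"]["sum_received"]["bits_per_second"])


def utilization(report: dict, link_bps: float) -> float:
    """Fraction of the advertised bandwidth that was actually achieved."""
    return received_bps(report) / link_bps


if __name__ == "__main__":
    # Illustrative stand-in for json.load() on real iperf3 output; not a real measurement.
    sample_report = {"end": {"sum_received": {"bits_per_second": 940e6}}}
    share = utilization(sample_report, link_bps=1e9)
    print(f"achieved {share:.0%} of the advertised bandwidth")
```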
Interview Tip
If asked “would you rather have 10x bandwidth or 1/10 latency?” — for most user-facing workloads (web pages, APIs, gaming), lower latency wins. Bandwidth helps with bulk transfers (video streaming, backups). Always tie the answer to the workload.
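A back-of-the-envelope model helps when reasoning through this out loud. It treats total time as round trips × RTT plus payload size ÷ bandwidth, and ignores window limits, loss, and server time. The two workloads below (a 2 MB page needing 10 round trips, a 10 GB backup needing 1) are made-up examples, as are the 100 Mbps and 50 ms baseline figures.

```python
def transfer_time_s(size_bytes: float, round_trips: int, rtt_s: float, bandwidth_bps: float) -> float:
    """Simplified total time: latency cost plus serialization cost."""
    return round_trips * rtt_s + size_bytes * 8 / bandwidth_bps


if __name__ == "__main__":
    bandwidth, rtt = 100e6, 0.05  # hypothetical baseline: 100 Mbps, 50 ms RTT
    workloads = {
        "web page (2 MB, 10 round trips)": (2e6, 10),
        "backup (10 GB, 1 round trip)": (10e9, 1),
    }
    for name, (size, trips) in workloads.items():
        base = transfer_time_s(size, trips, rtt, bandwidth)
        more_bw = transfer_time_s(size, trips, rtt, bandwidth * 10)
        less_lat = transfer_time_s(size, trips, rtt / 10, bandwidth)
        print(f"{name}: baseline {base:.2f} s, 10x bandwidth {more_bw:.2f} s, 1/10 latency {less_lat:.2f} s")
```

Under these assumptions the page improves far more from lower latency (0.66 s to 0.21 s) than from more bandwidth (0.66 s to about 0.52 s), while the backup is the reverse: roughly 800 s drops to about 80 s with 10x bandwidth and barely moves with lower latency.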