Back-of-the-Envelope Estimation


Back-of-the-envelope estimation is quick, rough math to figure out the scale of our system. We’re not trying to be exact — we just need to know if we’re dealing with thousands or billions, megabytes or petabytes. That changes everything about our design.

Why Estimation Matters

If our system gets 10 requests per second, a single server is fine. If it gets 100,000 requests per second, we need load balancers, caching, sharding, and a whole different architecture. Estimation tells us which world we’re in.

Numbers Every Engineer Should Know

What | How Much
L1 cache reference | 0.5 ns
L2 cache reference | 7 ns
RAM reference | 100 ns
SSD random read | 150 μs
HDD seek | 10 ms
Round trip within same datacenter | 0.5 ms
Round trip CA to Netherlands | 150 ms
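To get a feel for how far apart these numbers are, it can help to express each one as a multiple of a RAM reference. The snippet below is a small sketch that uses only the figures listed above, converted to nanoseconds; the dictionary name `LATENCY_NS` and the helper `relative_to` are made up for illustration.

```python
# Latency figures from the list above, all converted to nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "L2 cache reference": 7,
    "RAM reference": 100,
    "SSD random read": 150_000,                      # 150 microseconds
    "Round trip within same datacenter": 500_000,    # 0.5 ms
    "HDD seek": 10_000_000,                          # 10 ms
    "Round trip CA to Netherlands": 150_000_000,     # 150 ms
}


def relative_to(name: str, baseline: str = "RAM reference") -> float:
    """Return how many times slower `name` is than `baseline`."""
    return LATENCY_NS[name] / LATENCY_NS[baseline]


for operation in LATENCY_NS:
    print(f"{operation:<36} {relative_to(operation):>14,.3f} x RAM reference")
```

With these figures, one HDD seek costs as much as 100,000 RAM references, which is why "does this hit disk or memory?" is usually the first question to ask about a hot path.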

And for storage:

Unit | Bytes | Practical Example
1 KB | 1,000 | A short email
1 MB | 1,000,000 | A high-res photo
1 GB | 1,000,000,000 | A movie
1 TB | 10^12 | 1,000 movies
1 PB | 10^15 | 1 million movies

Handy shortcut: There are 86,400 seconds in a day. For quick math, round to ~100,000 (10^5); this understates per-second rates by about 14%, which is well within order-of-magnitude accuracy. A 30-day month is about 2.6 million seconds.

The Four Key Calculations

1. QPS (Queries Per Second)

QPS = Daily Active Users × Queries per User / 86,400

Peak QPS = QPS × 2  (or ×3 for spiky traffic)

Example: 10 million DAU, each user makes 5 requests/day.

QPS = 10M × 5 / 86,400 ≈ 580 QPS
Peak QPS ≈ 1,160 QPS
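The same arithmetic can be sketched in a few lines of Python. The function names `average_qps` and `peak_qps` are made up for illustration, and the default peak factor of 2 is simply the multiplier from the formula above.

```python
SECONDS_PER_DAY = 86_400


def average_qps(daily_active_users: int, queries_per_user: float) -> float:
    """Average queries per second, spread evenly over one day."""
    return daily_active_users * queries_per_user / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float = 2.0) -> float:
    """Peak estimate: average QPS times a multiplier (2 here, 3 for spiky traffic)."""
    return avg_qps * peak_factor


avg = average_qps(10_000_000, 5)   # 10M DAU, 5 requests per user per day
peak = peak_qps(avg)
print(f"average ~{avg:.0f} QPS, peak ~{peak:.0f} QPS")
```

The unrounded results are about 579 and 1,157 QPS, which round to the 580 and 1,160 used above.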

2. Storage

Storage = Daily New Records × Record Size × Retention Period

Example: 100M new URLs per day, each URL record is 500 bytes, keep for 5 years.

Daily  = 100M × 500 bytes = 50 GB/day
Yearly = 50 GB × 365 ≈ 18 TB/year
5 years = ~90 TB total
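As a quick check of the numbers above, here is a minimal sketch of the storage formula in Python. The helpers `storage_bytes` and `human_readable` are illustrative names, and the sketch assumes decimal units (1 KB = 1,000 bytes), matching the storage table earlier.

```python
def storage_bytes(daily_new_records: int, record_size_bytes: int, retention_days: int) -> int:
    """Total bytes = records per day x bytes per record x days retained."""
    return daily_new_records * record_size_bytes * retention_days


def human_readable(num_bytes: float) -> str:
    """Format a byte count using decimal units (1 KB = 1,000 bytes)."""
    value = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} PB"


daily = storage_bytes(100_000_000, 500, retention_days=1)
yearly = storage_bytes(100_000_000, 500, retention_days=365)
five_years = storage_bytes(100_000_000, 500, retention_days=5 * 365)

for label, total in [("daily", daily), ("yearly", yearly), ("5 years", five_years)]:
    print(f"{label:>8}: {human_readable(total)}")
```

The exact totals are 50 GB per day, 18.25 TB per year, and 91.25 TB over five years, so the rounded figures of ~18 TB and ~90 TB lose nothing that matters for the design.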

3. Bandwidth

Incoming = QPS × Request Size
Outgoing = QPS × Response Size
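The text gives no worked example for bandwidth, so the sketch below supplies one under stated assumptions: it reuses the ~580 QPS from the QPS example, and the 1 KB request size and 10 KB response size are hypothetical values chosen only to show the arithmetic. The function names are also illustrative.

```python
def bandwidth_bytes_per_sec(qps: float, payload_bytes: int) -> float:
    """Bytes per second = queries per second x bytes per query."""
    return qps * payload_bytes


def to_megabits_per_sec(bytes_per_sec: float) -> float:
    """Convert bytes/s to megabits/s (8 bits per byte, 10^6 bits per megabit)."""
    return bytes_per_sec * 8 / 1_000_000


QPS = 580                 # average QPS from the QPS example
REQUEST_BYTES = 1_000     # assumption: ~1 KB per request
RESPONSE_BYTES = 10_000   # assumption: ~10 KB per response

incoming = bandwidth_bytes_per_sec(QPS, REQUEST_BYTES)
outgoing = bandwidth_bytes_per_sec(QPS, RESPONSE_BYTES)

print(f"incoming: {incoming / 1e6:.2f} MB/s ({to_megabits_per_sec(incoming):.1f} Mbps)")
print(f"outgoing: {outgoing / 1e6:.2f} MB/s ({to_megabits_per_sec(outgoing):.1f} Mbps)")
```

Under these assumptions the system receives about 0.58 MB/s and sends about 5.8 MB/s. Network links are usually quoted in bits per second, which is why the conversion helper multiplies by 8.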

4. Memory for Cache

We usually cache the hot data — the most frequently accessed items. A common rule of thumb is to cache 20% of daily requests (the 80/20 rule: 20% of data handles 80% of traffic).

Cache Memory = Daily Requests × 0.2 × Average Response Size
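Here is a rough sketch of that formula in Python. The 50 million daily requests come from the QPS example (10M DAU × 5 requests), while the 10 KB average response size is a hypothetical value picked for illustration; `cache_memory_bytes` is likewise an invented name.

```python
def cache_memory_bytes(
    daily_requests: int,
    avg_response_bytes: int,
    hot_fraction: float = 0.2,
) -> float:
    """Cache size = daily requests x hot fraction x average response size."""
    return daily_requests * hot_fraction * avg_response_bytes


DAILY_REQUESTS = 10_000_000 * 5   # 10M DAU x 5 requests/day, from the QPS example
AVG_RESPONSE_BYTES = 10_000       # assumption: ~10 KB per response

cache_bytes = cache_memory_bytes(DAILY_REQUESTS, AVG_RESPONSE_BYTES)
print(f"cache memory ~{cache_bytes / 1e9:.0f} GB")
```

Under these assumptions the cache needs roughly 100 GB, which tells us it will not fit comfortably on one small machine and may need to be spread across several cache nodes.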

Full Example: Estimate Twitter’s Storage

Let’s say Twitter has:

  • 300M monthly active users, 50% are daily → 150M DAU
  • Each user posts 2 tweets/day on average
  • Each tweet: 140 chars (~280 bytes) + metadata (~200 bytes) = 480 bytes, rounded to ~500 bytes
  • 10% of tweets have a photo (~500 KB average)

Tweet storage per day:

Text:  150M × 2 × 500 bytes = 150 GB/day
Photos: 150M × 2 × 0.10 × 500 KB = 15 TB/day

Per year:

Text:   ~55 TB/year
Photos: ~5.5 PB/year
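The whole estimate can be reproduced with a short script. This is only a sketch of the arithmetic above: the function name `twitter_storage_per_day` and its parameter names are invented for illustration, and percentages are passed as integers so the results stay exact.

```python
def twitter_storage_per_day(
    monthly_active_users: int,
    daily_active_percent: int,
    tweets_per_user: int,
    tweet_bytes: int,
    photo_percent: int,
    photo_bytes: int,
) -> tuple[int, int]:
    """Return (text bytes per day, photo bytes per day)."""
    dau = monthly_active_users * daily_active_percent // 100
    tweets = dau * tweets_per_user
    text = tweets * tweet_bytes
    photos = tweets * photo_percent // 100 * photo_bytes
    return text, photos


text_per_day, photos_per_day = twitter_storage_per_day(
    monthly_active_users=300_000_000,
    daily_active_percent=50,
    tweets_per_user=2,
    tweet_bytes=500,
    photo_percent=10,
    photo_bytes=500_000,   # 500 KB, decimal units
)

print(f"text:   {text_per_day / 1e9:.0f} GB/day, {text_per_day * 365 / 1e12:.2f} TB/year")
print(f"photos: {photos_per_day / 1e12:.0f} TB/day, {photos_per_day * 365 / 1e15:.3f} PB/year")
```

The exact yearly totals are 54.75 TB of text and 5.475 PB of photos, matching the rounded ~55 TB and ~5.5 PB. Photos outweigh text by a factor of 100, so media dominates the storage plan.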

That tells us we need a serious storage strategy — object storage (like S3) for media, and sharded databases for tweet metadata.

Powers of Two — Quick Reference

Power | Value | Size
2^10 | 1,024 | ~1 Thousand (1 KB)
2^20 | ~1 Million | ~1 MB
2^30 | ~1 Billion | ~1 GB
2^40 | ~1 Trillion | ~1 TB
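The "~" in this table hides a small gap between powers of two and the decimal units used earlier. The sketch below measures that gap; `drift_percent` is an illustrative helper name, not a standard function.

```python
def drift_percent(exponent: int) -> float:
    """Percent by which 2^exponent exceeds the matching power of ten.

    2^10 is compared with 10^3, 2^20 with 10^6, and so on.
    """
    binary = 2 ** exponent
    decimal = 10 ** (3 * exponent // 10)
    return (binary - decimal) / decimal * 100


for exponent, unit in [(10, "KB"), (20, "MB"), (30, "GB"), (40, "TB")]:
    print(f"2^{exponent} = {2 ** exponent:>17,}  (~1 {unit}, +{drift_percent(exponent):.1f}%)")
```

The gap grows from 2.4% at 2^10 to just under 10% at 2^40, which is small enough to ignore in an order-of-magnitude estimate.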

Tips for the Interview

  1. State assumptions clearly. “I’m assuming 100M DAU” — the interviewer can correct us.
  2. Round aggressively. Use 10^5 instead of 86,400. Nobody expects exact math.
  3. Focus on order of magnitude. The difference between 50 TB and 90 TB doesn’t change our design. The difference between 50 GB and 50 TB does.
  4. Don’t spend more than 5 minutes. Estimation supports the design; it’s not the main event.

In simple language, back-of-the-envelope estimation is about getting a feel for the scale. Are we building a bicycle or a spaceship? The math takes 5 minutes but saves us from designing something wildly over- or under-engineered.