Caching - High-Level Design

Caching is storing a copy of data in a faster location so we don’t have to fetch it from the slower source every time. Think of it like keeping a sticky note on our desk instead of walking to the filing cabinet every time we need a phone number.

For read-heavy systems, it’s often the single most impactful thing we can do for performance.

Where to Cache

Caching Layers (fastest → slowest)

Layer 1: Browser Cache
└── Static assets (JS, CSS, images). No network call at all.

Layer 2: CDN Cache
└── Cached at edge servers closest to the user.

Layer 3: Application Cache (Redis / Memcached)
└── Hot data in memory. Way faster than hitting the DB.

Layer 4: Database Cache
└── Query cache, buffer pool. The DB's own internal caching.

The closer the cache is to the user, the faster the response. A browser cache hit needs no network call at all. A CDN hit avoids the trip to our data center. A Redis hit avoids the slow database query.

Cache Hit vs Cache Miss

Cache hit — The data is in the cache. We return it immediately. Fast.
Cache miss — The data is NOT in the cache. We fetch from the source, store it in the cache, then return it. Slow (but next time it’ll be a hit).

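The hit ratio tells us how effective our cache is: it is the number of hits divided by the total number of lookups (hits + misses). If 95% of requests are cache hits, our cache is doing great. As a rough rule of thumb, below 80% we should rethink our strategy, although the right target depends on how expensive a miss is.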
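To see why the hit ratio matters so much, here is a small sketch that computes it and the resulting average latency. The function names are made up for illustration, and the 2 ms / 200 ms figures are just the illustrative Redis and database numbers used at the end of this article, not measurements.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of lookups that were served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0


def average_latency_ms(ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Expected latency per request for a given hit ratio."""
    return ratio * hit_ms + (1.0 - ratio) * miss_ms


if __name__ == "__main__":
    for hits, misses in [(950, 50), (800, 200)]:
        ratio = hit_ratio(hits, misses)
        latency = average_latency_ms(ratio, hit_ms=2.0, miss_ms=200.0)
        print(f"hit ratio {ratio:.0%}: average latency {latency:.1f} ms")
```

With these assumed numbers, dropping from a 95% to an 80% hit ratio more than triples the average latency, because the slow misses dominate the average.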

Cache Eviction Policies

The cache has limited memory. When it’s full and new data comes in, we have to kick something out. The question is: what do we evict?

Policy	How It Works	Use When
LRU (Least Recently Used)	Evict the item not accessed for the longest time	General purpose — most common choice
LFU (Least Frequently Used)	Evict the item accessed the fewest times	Hot items should stay (e.g., trending content)
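FIFO (First In, First Out)	Evict the item that was inserted earliest, regardless of how often it is used	Simple, order-based access patterns
TTL (Time To Live)	Items expire after a fixed time, even if the cache is not full; often combined with LRU or LFU	Data that goes stale (API responses, sessions)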

LRU is the default choice in most system design interviews. It’s simple and works well for most access patterns.
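Here is a minimal sketch of an LRU cache, assuming a single-threaded caller and using Python's OrderedDict to track recency. The class name and its get/put methods are our own choices for illustration, not the API of any particular library.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Oldest (least recently used) entries sit at the front.
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")     # "a" is now the most recently used
    cache.put("c", 3)  # cache is full, so "b" is evicted
    print(cache.get("b"), cache.get("a"), cache.get("c"))
```

Both get and put run in constant time on average, which is why this dict-plus-ordering structure is the usual way to build an LRU cache.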

Cache Invalidation Strategies

The hardest problem with caching: keeping the cache in sync with the database. If the database changes but the cache still has the old value, users see stale data.

Cache-Aside (Lazy Loading)

The most common pattern. The application manages the cache directly:

Read: Check cache first → miss → read from DB → write to cache → return
Write: Write to DB → delete from cache (next read will re-populate it)

Pros: Only caches what’s actually requested. Cache failure doesn’t break the system.
Cons: Every cache miss is slow, because it costs a cache lookup, a DB read, and a cache write. Data can be stale in the window between the DB write and the cache delete.
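The read and write paths of cache-aside can be sketched like this. It is a simplified illustration: two plain dicts stand in for the cache (Redis / Memcached) and the database, and the names cache, database, read, and write are hypothetical.

```python
from typing import Dict, Optional

cache: Dict[str, str] = {}                     # stand-in for Redis / Memcached
database: Dict[str, str] = {"user:1": "Ada"}   # stand-in for the source of truth


def read(key: str) -> Optional[str]:
    value = cache.get(key)
    if value is not None:
        return value                 # cache hit
    value = database.get(key)        # cache miss: read from the DB
    if value is not None:
        cache[key] = value           # populate the cache for the next read
    return value


def write(key: str, value: str) -> None:
    database[key] = value            # 1. write to the DB
    cache.pop(key, None)             # 2. delete from cache; next read re-populates


if __name__ == "__main__":
    print(read("user:1"))            # miss, loaded from the database
    print(read("user:1"))            # hit
    write("user:1", "Grace")         # invalidates the cached entry
    print(read("user:1"))            # miss again, returns the new value
```

Deleting the key on write, rather than updating it, keeps the write path simple and leaves re-population to whichever read comes next.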

Write-Through

Every write goes to the cache AND the database as part of the same operation. The write is only acknowledged once both have been updated.

Pros: Cached entries stay in sync with the database, as long as every write goes through the cache.
Cons: Higher write latency (two writes per operation). Cache may fill with data that’s never read.
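A minimal write-through sketch, again with plain dicts standing in for the real stores. The class name is hypothetical, and updating the database before the cache is one possible ordering that we assume here.

```python
from typing import Any, Dict, Optional


class WriteThroughCache:
    """Every write updates the database and the cache before returning."""

    def __init__(self, database: Dict[str, Any]) -> None:
        self._db = database                  # stand-in for the real database
        self._cache: Dict[str, Any] = {}     # stand-in for Redis / Memcached

    def write(self, key: str, value: Any) -> None:
        self._db[key] = value      # synchronous DB write (the slow part)
        self._cache[key] = value   # cache updated in the same operation

    def read(self, key: str) -> Optional[Any]:
        if key in self._cache:
            return self._cache[key]
        value = self._db.get(key)  # only happens for keys never written here
        if value is not None:
            self._cache[key] = value
        return value


if __name__ == "__main__":
    db: Dict[str, Any] = {}
    store = WriteThroughCache(db)
    store.write("cart:42", ["book"])
    print(db["cart:42"], store.read("cart:42"))
```

Note that the caller waits for both writes, which is where the extra write latency comes from.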

Write-Behind (Write-Back)

Write to the cache first, then asynchronously write to the database later.

Pros: Very fast writes. Great for write-heavy workloads.
Cons: Risk of data loss if the cache crashes before persisting to DB.
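Write-behind can be sketched as a cache plus a set of "dirty" entries that get flushed later. This is a simplified, single-threaded illustration with hypothetical names. In a real system the flush would run on a background worker or timer, which we replace with an explicit flush() call here.

```python
from typing import Any, Dict, Optional


class WriteBehindCache:
    """Writes land in the cache first and reach the database on flush()."""

    def __init__(self, database: Dict[str, Any]) -> None:
        self._db = database                  # stand-in for the real database
        self._cache: Dict[str, Any] = {}
        self._dirty: Dict[str, Any] = {}     # written but not yet persisted

    def write(self, key: str, value: Any) -> None:
        self._cache[key] = value   # fast: memory only
        self._dirty[key] = value   # repeated writes to a key are coalesced

    def read(self, key: str) -> Optional[Any]:
        if key in self._cache:
            return self._cache[key]
        return self._db.get(key)

    def flush(self) -> int:
        """Persist pending writes and return how many keys were written."""
        pending, self._dirty = self._dirty, {}
        self._db.update(pending)
        return len(pending)


if __name__ == "__main__":
    db: Dict[str, Any] = {}
    store = WriteBehindCache(db)
    store.write("views:7", 1)
    store.write("views:7", 2)
    print("before flush:", db)   # still empty: a crash now would lose the write
    store.flush()
    print("after flush:", db)
```

Anything still in the dirty set when the cache process dies is lost, which is exactly the risk listed under Cons.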

When NOT to Cache

Caching isn’t always the answer:

Frequently changing data — If data changes every second, the cache is constantly stale.
Low-traffic data — If it’s rarely accessed, the cache miss rate is high and we’re wasting memory.
Write-heavy workloads — More writes than reads means the cache is constantly being invalidated.
Data that must be perfectly consistent — Like account balances. Stale cache = real problems.

Popular Caching Tools

Redis: In-memory key-value store. Supports data structures (lists, sets, sorted sets). A very popular choice.
Memcached: Simpler than Redis, pure key-value. Can be slightly faster for simple get/set caching, depending on the workload.
Varnish: HTTP reverse proxy cache. Great for caching entire HTTP responses.

Cache in System Design Interviews

When an interviewer asks “how would you improve performance?”, caching is almost always part of the answer. Common cache use cases:

Cache database query results to reduce DB load
Cache user session data for fast authentication
Cache API responses from third-party services
Cache computed results (like feed generation or recommendations)
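For the last use case, caching computed results inside a single process, Python's standard library already ships an LRU-based memoizer. The sketch below uses functools.lru_cache. The recommendations_for function and its arithmetic are placeholders standing in for an expensive computation.

```python
from functools import lru_cache
from typing import Tuple


@lru_cache(maxsize=1024)
def recommendations_for(user_id: int) -> Tuple[int, ...]:
    """Placeholder for an expensive computation (feed, recommendations)."""
    # Hypothetical logic: a real system would query models or services here.
    return tuple((user_id * 31 + i) % 100 for i in range(3))


if __name__ == "__main__":
    recommendations_for(7)   # computed, then stored
    recommendations_for(7)   # served from the in-process cache
    print(recommendations_for.cache_info())
```

This only caches within one process, so a fleet of servers would each keep their own copy. A shared cache like Redis is the usual answer when results must be reused across machines.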

In simple language, caching trades memory for speed. We store a copy of frequently accessed data in a fast place so we don’t keep hammering the slow place. It can be the difference between a response of roughly 2 ms (Redis) and roughly 200 ms (a database query with joins), though the exact numbers depend on the system.