In a normalized database, we avoid duplicate data. That’s great for writes and data integrity. But when our app grows and reads start dominating (most apps are 90% reads, 10% writes), all those JOINs across perfectly normalized tables start crushing performance. That’s when denormalization and read-write separation come in.
What Is Denormalization?
Denormalization is intentionally adding redundant data to our database to speed up reads. We’re trading storage space and write complexity for faster queries.
In a normalized database, we’d store a user’s name in the users table only. If we need it on a post, we JOIN. In a denormalized database, we store the user’s name directly on the post too.
Normalization vs Denormalization
| Aspect | Normalized | Denormalized |
|---|---|---|
| Data duplication | None | Yes, by design |
| Read performance | Slower (JOINs) | Faster (fewer/no JOINs) |
| Write performance | Faster (update one place) | Slower (update multiple places) |
| Data consistency | Easy (single source of truth) | Harder (must sync copies) |
| Storage | Less | More |
| Best for | Write-heavy, small scale | Read-heavy, large scale |
Example: Normalizing a User Feed
Say we’re building a social feed. Normalized approach:
-- Normalized: 3 tables, need JOINs to display a post
SELECT p.content, p.created_at, u.name, u.avatar_url,
COUNT(l.id) as like_count
FROM posts p
JOIN users u ON p.user_id = u.id
LEFT JOIN likes l ON l.post_id = p.id
WHERE p.id = 123
GROUP BY p.id, u.id;
This works fine for 1,000 users. At 10 million users with millions of posts, this JOIN is painful. Denormalized approach:
-- Denormalized: everything we need is right on the posts table
SELECT content, created_at, author_name, author_avatar, like_count
FROM posts
WHERE id = 123;
One table, zero JOINs, blazing fast. The only difference is we duplicated author_name, author_avatar, and pre-computed like_count onto the posts table.
The trade-off? When a user changes their name, we need to update it on all their posts too. That’s the price of denormalization.
When to Denormalize
Denormalization makes sense when:
- Read queries are way more frequent than writes
- JOINs are becoming the bottleneck (check query plans)
- We need very low latency reads (real-time feeds, dashboards)
- The duplicated data changes infrequently (user names don’t change every minute)
Don’t denormalize when:
- Our data changes frequently and consistency is critical (bank accounts)
- We’re still small — premature optimization is the root of all evil
- We haven’t profiled our queries to confirm JOINs are the actual bottleneck
Read-Write Separation (CQRS Lite)
The idea is simple: use separate database instances for reads and writes.
The primary (master) database handles all writes. One or more read replicas handle all reads. The primary replicates data to the replicas asynchronously.
Writes → Primary DB ──replication──→ Read Replica 1
──→ Read Replica 2
──→ Read Replica 3
Reads → Load Balancer → [Replica 1, Replica 2, Replica 3]
This works because most apps are read-heavy. If 90% of our queries are reads, we just offload them to replicas and our primary DB breathes again.
Setting It Up
Most managed databases make this easy:
- AWS RDS — Click “Create read replica” and we’re done
- PostgreSQL — Streaming replication to standby servers
- MySQL — Built-in master-slave replication
Our application code needs to know which database to talk to:
# Pseudocode
def get_user(user_id):
db = read_replica() # reads go to replica
return db.query("SELECT * FROM users WHERE id = %s", user_id)
def update_user(user_id, data):
db = primary() # writes go to primary
db.execute("UPDATE users SET name = %s WHERE id = %s", data.name, user_id)
The Replication Lag Problem
Here’s the catch: replication isn’t instant. There’s a small delay (usually milliseconds, sometimes seconds) between a write hitting the primary and showing up on replicas.
This means: a user updates their profile, refreshes the page, and sees the old data because the read hit a replica that hasn’t caught up yet.
Solutions:
- Read-your-own-writes — After a write, route that user’s reads to the primary for a few seconds
- Sticky sessions — Route a user to the same replica consistently
- Accept it — For many use cases (feeds, dashboards), slight staleness is fine
Full CQRS (The Advanced Version)
CQRS (Command Query Responsibility Segregation) takes read-write separation further. Instead of just separate DB instances, we use entirely different data models for reads and writes.
- Write model — Normalized, optimized for data integrity
- Read model — Denormalized, optimized for query patterns (maybe even a different database like Elasticsearch)
Events sync data from the write model to the read model. This is powerful but complex — we only need full CQRS for systems with very different read and write patterns (think: analytics dashboards reading from a normalized transactional DB).
For most systems, simple read replicas are more than enough.
Key Takeaway
In simple language, denormalization is copying data so we read faster but write slower. Read-write separation is splitting our database so reads don’t compete with writes. Both are tools for scaling read-heavy systems. Start normalized, measure our bottlenecks, then denormalize strategically where it hurts most.