Denormalization and Read-Write Separation - High-Level Design

In a normalized database, we avoid duplicate data. That’s great for writes and data integrity. But when our app grows and reads start dominating (most apps are 90% reads, 10% writes), all those JOINs across perfectly normalized tables start crushing performance. That’s when denormalization and read-write separation come in.

What Is Denormalization?

Denormalization is intentionally adding redundant data to our database to speed up reads. We’re trading storage space and write complexity for faster queries.

In a normalized database, we’d store a user’s name in the users table only. If we need it on a post, we JOIN. In a denormalized database, we store the user’s name directly on the post too.

Normalization vs Denormalization

Aspect	Normalized	Denormalized
Data duplication	None	Yes, by design
Read performance	Slower (JOINs)	Faster (fewer/no JOINs)
Write performance	Faster (update one place)	Slower (update multiple places)
Data consistency	Easy (single source of truth)	Harder (must sync copies)
Storage	Less	More
Best for	Write-heavy, small scale	Read-heavy, large scale

Example: Normalizing a User Feed

Say we’re building a social feed. Normalized approach:

-- Normalized: 3 tables, need JOINs to display a post
SELECT p.content, p.created_at, u.name, u.avatar_url,
       COUNT(l.id) as like_count
FROM posts p
JOIN users u ON p.user_id = u.id
LEFT JOIN likes l ON l.post_id = p.id
WHERE p.id = 123
GROUP BY p.id, u.id;

This works fine for 1,000 users. At 10 million users with millions of posts, this JOIN is painful. Denormalized approach:

-- Denormalized: everything we need is right on the posts table
SELECT content, created_at, author_name, author_avatar, like_count
FROM posts
WHERE id = 123;

One table, zero JOINs, blazing fast. The only difference is we duplicated author_name, author_avatar, and pre-computed like_count onto the posts table.

The trade-off? When a user changes their name, we need to update it on all their posts too. That’s the price of denormalization.

When to Denormalize

Denormalization makes sense when:

Read queries are way more frequent than writes
JOINs are becoming the bottleneck (check query plans)
We need very low latency reads (real-time feeds, dashboards)
The duplicated data changes infrequently (user names don’t change every minute)

Don’t denormalize when:

Our data changes frequently and consistency is critical (bank accounts)
We’re still small — premature optimization is the root of all evil
We haven’t profiled our queries to confirm JOINs are the actual bottleneck

Read-Write Separation (CQRS Lite)

The idea is simple: use separate database instances for reads and writes.

The primary (master) database handles all writes. One or more read replicas handle all reads. The primary replicates data to the replicas asynchronously.

Writes → Primary DB ──replication──→ Read Replica 1
                                  ──→ Read Replica 2
                                  ──→ Read Replica 3
Reads  → Load Balancer → [Replica 1, Replica 2, Replica 3]

This works because most apps are read-heavy. If 90% of our queries are reads, we just offload them to replicas and our primary DB breathes again.

Setting It Up

Most managed databases make this easy:

AWS RDS — Click “Create read replica” and we’re done
PostgreSQL — Streaming replication to standby servers
MySQL — Built-in master-slave replication

Our application code needs to know which database to talk to:

# Pseudocode
def get_user(user_id):
    db = read_replica()  # reads go to replica
    return db.query("SELECT * FROM users WHERE id = %s", user_id)

def update_user(user_id, data):
    db = primary()  # writes go to primary
    db.execute("UPDATE users SET name = %s WHERE id = %s", data.name, user_id)

The Replication Lag Problem

Here’s the catch: replication isn’t instant. There’s a small delay (usually milliseconds, sometimes seconds) between a write hitting the primary and showing up on replicas.

This means: a user updates their profile, refreshes the page, and sees the old data because the read hit a replica that hasn’t caught up yet.

Solutions:

Read-your-own-writes — After a write, route that user’s reads to the primary for a few seconds
Sticky sessions — Route a user to the same replica consistently
Accept it — For many use cases (feeds, dashboards), slight staleness is fine

Full CQRS (The Advanced Version)

CQRS (Command Query Responsibility Segregation) takes read-write separation further. Instead of just separate DB instances, we use entirely different data models for reads and writes.

Write model — Normalized, optimized for data integrity
Read model — Denormalized, optimized for query patterns (maybe even a different database like Elasticsearch)

Events sync data from the write model to the read model. This is powerful but complex — we only need full CQRS for systems with very different read and write patterns (think: analytics dashboards reading from a normalized transactional DB).

For most systems, simple read replicas are more than enough.

Key Takeaway

In simple language, denormalization is copying data so we read faster but write slower. Read-write separation is splitting our database so reads don’t compete with writes. Both are tools for scaling read-heavy systems. Start normalized, measure our bottlenecks, then denormalize strategically where it hurts most.