We just spent an entire note learning how to normalize — remove redundancy, split tables, eliminate anomalies. And now we’re going to talk about intentionally adding redundancy back in. Welcome to denormalization.
In simple language: denormalization is the trade-off of faster reads at the cost of slower writes and more storage. We duplicate data so we don’t have to JOIN five tables for a simple query.
Why Denormalize?
Normalization is great for data integrity. But it creates a lot of tables, which means a lot of JOINs. JOINs are expensive. When our app needs to serve a dashboard that joins 8 tables with millions of rows, normalization can kill performance.
Denormalization helps when:
- Read performance matters more than write performance (most web apps)
- Query complexity is getting out of hand (too many JOINs)
- Analytics/reporting needs fast aggregations
- The data doesn’t change often (or we can tolerate slightly stale data)
The key insight: normalize first, then denormalize strategically where needed. Never start denormalized — we’ll end up with a mess.
Common Denormalization Patterns
1. Duplicated Fields
Store a frequently accessed value from a related table directly in the current table.
-- Normalized: need to JOIN to get the department name
SELECT o.id, o.total, c.name AS customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id;
-- Denormalized: customer name stored directly on the order
ALTER TABLE orders ADD COLUMN customer_name VARCHAR(100);
-- Now we can query without a JOIN
SELECT id, total, customer_name FROM orders;
-- Trade-off: if the customer changes their name, we need to
-- update it in both the customers table AND every order
2. Precomputed/Cached Counters
Instead of running COUNT(*) every time, maintain a counter column.
-- Normalized: count posts every time (slow on large tables)
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.name;
-- Denormalized: maintain a counter
ALTER TABLE users ADD COLUMN post_count INT DEFAULT 0;
-- Increment on new post
UPDATE users SET post_count = post_count + 1 WHERE id = :user_id;
-- Decrement on deleted post
UPDATE users SET post_count = post_count - 1 WHERE id = :user_id;
-- Now the count is instant — no JOIN, no aggregation
SELECT name, post_count FROM users WHERE id = 1;
This is extremely common. Think of follower counts on social media — nobody runs COUNT(*) on a 100M row followers table for every profile view.
3. Summary Tables
Pre-aggregate data into a separate table for reporting.
-- Instead of computing daily revenue from millions of orders each time:
CREATE TABLE daily_revenue (
date DATE PRIMARY KEY,
total_orders INT,
total_revenue DECIMAL(12,2),
avg_order_value DECIMAL(10,2)
);
-- Populate it nightly (or in real-time with triggers)
INSERT INTO daily_revenue (date, total_orders, total_revenue, avg_order_value)
SELECT
DATE(ordered_at),
COUNT(*),
SUM(total),
AVG(total)
FROM orders
WHERE DATE(ordered_at) = CURRENT_DATE - 1
GROUP BY DATE(ordered_at);
4. Computed Columns
Store a value that could be derived, to avoid computing it every time.
-- Instead of computing full_name every time:
ALTER TABLE employees ADD COLUMN full_name VARCHAR(200);
-- Or instead of computing age from birth_date:
-- (This is a bad example actually — age changes daily,
-- so we'd just compute it. But for things like total_price
-- on an order line item, it makes sense.)
ALTER TABLE order_items ADD COLUMN total_price DECIMAL(10,2);
-- total_price = quantity * unit_price
-- Computed once at insert time, read millions of times
5. Embedding/Flattening Relationships
Instead of a junction table, store related data directly (common in NoSQL but applicable to SQL too).
-- Instead of: orders → order_items → products (3 tables, 2 JOINs)
-- Store order items as JSON:
ALTER TABLE orders ADD COLUMN items JSONB;
-- Insert:
-- UPDATE orders SET items = '[
-- {"product_id": 1, "name": "Widget", "qty": 2, "price": 9.99},
-- {"product_id": 3, "name": "Gadget", "qty": 1, "price": 24.99}
-- ]' WHERE id = 1;
-- Query without JOINs:
SELECT id, total, items FROM orders WHERE id = 1;
The Trade-offs
When Denormalization Makes Sense
- Read-heavy workloads — 99% reads, 1% writes (most web apps)
- Analytics and reporting — dashboards that aggregate millions of rows
- Caching layers — materialized views and summary tables act as “database-level caches”
- NoSQL databases — document stores are denormalized by design (embed related data)
- Search/listing pages — product listings, search results, feeds
When It Doesn’t Make Sense
- Write-heavy workloads — every write needs to update multiple places
- Frequently changing data — keeping denormalized copies in sync becomes a nightmare
- Small datasets — JOINs on small tables are fast. Don’t optimize prematurely.
- Data integrity is critical — banking, healthcare, anything where inconsistency is unacceptable
How to Keep Denormalized Data in Sync
- Triggers — automatically update denormalized fields when source data changes
- Application logic — update both the source and the copy in the same transaction
- Background jobs — periodically refresh summary tables (materialized views)
- Event-driven — publish change events, subscribers update denormalized copies
Interview Tip
Interviewers love asking “when would you denormalize?” The answer is always: “when read performance is more important than write simplicity, and we’ve already normalized properly.” Give a concrete example like cached counters (follower count, like count) or summary tables for dashboards. Show that we understand it’s a deliberate trade-off, not laziness.