Denormalization

intermediate denormalization performance schema-design redundancy analytics

We just spent an entire note learning how to normalize — remove redundancy, split tables, eliminate anomalies. And now we’re going to talk about intentionally adding redundancy back in. Welcome to denormalization.

In simple language: denormalization is the trade-off of faster reads at the cost of slower writes and more storage. We duplicate data so we don’t have to JOIN five tables for a simple query.

Why Denormalize?

Normalization is great for data integrity. But it creates a lot of tables, which means a lot of JOINs. JOINs are expensive. When our app needs to serve a dashboard that joins 8 tables with millions of rows, normalization can kill performance.

Denormalization helps when:

  • Read performance matters more than write performance (most web apps)
  • Query complexity is getting out of hand (too many JOINs)
  • Analytics/reporting needs fast aggregations
  • The data doesn’t change often (or we can tolerate slightly stale data)

The key insight: normalize first, then denormalize strategically where needed. Never start denormalized — we’ll end up with a mess.

Common Denormalization Patterns

1. Duplicated Fields

Store a frequently accessed value from a related table directly in the current table.

-- Normalized: need to JOIN to get the department name
SELECT o.id, o.total, c.name AS customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id;

-- Denormalized: customer name stored directly on the order
ALTER TABLE orders ADD COLUMN customer_name VARCHAR(100);

-- Now we can query without a JOIN
SELECT id, total, customer_name FROM orders;

-- Trade-off: if the customer changes their name, we need to
-- update it in both the customers table AND every order

2. Precomputed/Cached Counters

Instead of running COUNT(*) every time, maintain a counter column.

-- Normalized: count posts every time (slow on large tables)
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.name;

-- Denormalized: maintain a counter
ALTER TABLE users ADD COLUMN post_count INT DEFAULT 0;

-- Increment on new post
UPDATE users SET post_count = post_count + 1 WHERE id = :user_id;

-- Decrement on deleted post
UPDATE users SET post_count = post_count - 1 WHERE id = :user_id;

-- Now the count is instant — no JOIN, no aggregation
SELECT name, post_count FROM users WHERE id = 1;

This is extremely common. Think of follower counts on social media — nobody runs COUNT(*) on a 100M row followers table for every profile view.

3. Summary Tables

Pre-aggregate data into a separate table for reporting.

-- Instead of computing daily revenue from millions of orders each time:
CREATE TABLE daily_revenue (
    date DATE PRIMARY KEY,
    total_orders INT,
    total_revenue DECIMAL(12,2),
    avg_order_value DECIMAL(10,2)
);

-- Populate it nightly (or in real-time with triggers)
INSERT INTO daily_revenue (date, total_orders, total_revenue, avg_order_value)
SELECT
    DATE(ordered_at),
    COUNT(*),
    SUM(total),
    AVG(total)
FROM orders
WHERE DATE(ordered_at) = CURRENT_DATE - 1
GROUP BY DATE(ordered_at);

4. Computed Columns

Store a value that could be derived, to avoid computing it every time.

-- Instead of computing full_name every time:
ALTER TABLE employees ADD COLUMN full_name VARCHAR(200);

-- Or instead of computing age from birth_date:
-- (This is a bad example actually — age changes daily,
-- so we'd just compute it. But for things like total_price
-- on an order line item, it makes sense.)

ALTER TABLE order_items ADD COLUMN total_price DECIMAL(10,2);
-- total_price = quantity * unit_price
-- Computed once at insert time, read millions of times

5. Embedding/Flattening Relationships

Instead of a junction table, store related data directly (common in NoSQL but applicable to SQL too).

-- Instead of: orders → order_items → products (3 tables, 2 JOINs)
-- Store order items as JSON:
ALTER TABLE orders ADD COLUMN items JSONB;

-- Insert:
-- UPDATE orders SET items = '[
--   {"product_id": 1, "name": "Widget", "qty": 2, "price": 9.99},
--   {"product_id": 3, "name": "Gadget", "qty": 1, "price": 24.99}
-- ]' WHERE id = 1;

-- Query without JOINs:
SELECT id, total, items FROM orders WHERE id = 1;

The Trade-offs

Benefits
Costs
Faster reads (fewer JOINs)
Slower writes (update multiple places)
Simpler queries
Data inconsistency risk
Great for dashboards/reports
More storage used
Reduces database load
More complex app logic to keep in sync

When Denormalization Makes Sense

  • Read-heavy workloads — 99% reads, 1% writes (most web apps)
  • Analytics and reporting — dashboards that aggregate millions of rows
  • Caching layers — materialized views and summary tables act as “database-level caches”
  • NoSQL databases — document stores are denormalized by design (embed related data)
  • Search/listing pages — product listings, search results, feeds

When It Doesn’t Make Sense

  • Write-heavy workloads — every write needs to update multiple places
  • Frequently changing data — keeping denormalized copies in sync becomes a nightmare
  • Small datasets — JOINs on small tables are fast. Don’t optimize prematurely.
  • Data integrity is critical — banking, healthcare, anything where inconsistency is unacceptable

How to Keep Denormalized Data in Sync

  1. Triggers — automatically update denormalized fields when source data changes
  2. Application logic — update both the source and the copy in the same transaction
  3. Background jobs — periodically refresh summary tables (materialized views)
  4. Event-driven — publish change events, subscribers update denormalized copies

Interview Tip

Interviewers love asking “when would you denormalize?” The answer is always: “when read performance is more important than write simplicity, and we’ve already normalized properly.” Give a concrete example like cached counters (follower count, like count) or summary tables for dashboards. Show that we understand it’s a deliberate trade-off, not laziness.