Schema design isn’t just about normalization. There are well-known patterns that solve specific problems — analytics, flexibility, deletion, and polymorphism. Let’s walk through the most important ones.
Star Schema
The star schema is the bread and butter of data warehousing and analytics. It has two types of tables:
- Fact table — stores the events/transactions (sales, clicks, orders). Contains foreign keys to dimension tables and numeric measures (amount, quantity, duration).
- Dimension tables — stores the context (who, what, where, when). Contains descriptive attributes.
The fact table sits in the center, surrounded by dimension tables — like a star.
product_id, store_id
-- Fact table: one row per sale
CREATE TABLE fact_sales (
id SERIAL PRIMARY KEY,
date_id INT REFERENCES dim_date(id),
customer_id INT REFERENCES dim_customer(id),
product_id INT REFERENCES dim_product(id),
store_id INT REFERENCES dim_store(id),
quantity INT,
amount DECIMAL(10,2),
discount DECIMAL(5,2)
);
-- Dimension table: date details (precomputed for fast queries)
CREATE TABLE dim_date (
id INT PRIMARY KEY, -- e.g., 20240315 for March 15, 2024
full_date DATE,
year INT,
quarter INT,
month INT,
month_name VARCHAR(20),
day_of_week VARCHAR(10),
is_weekend BOOLEAN
);
-- Queries are fast — just one JOIN per dimension
SELECT
d.year, d.month_name,
p.category,
SUM(f.amount) AS total_revenue
FROM fact_sales f
JOIN dim_date d ON f.date_id = d.id
JOIN dim_product p ON f.product_id = p.id
GROUP BY d.year, d.month_name, p.category;
When to use: Data warehouses, BI dashboards, analytics systems. Denormalized by design for query speed.
Snowflake Schema
A snowflake schema is like a star schema, but the dimension tables are normalized — they reference additional lookup tables.
Instead of dim_product having a category column, it has a category_id that references a separate dim_category table.
-- Star: product has category as a string
-- dim_product: id, name, category, brand
-- Snowflake: product references a category table
CREATE TABLE dim_category (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(100)
);
CREATE TABLE dim_product (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
category_id INT REFERENCES dim_category(id),
brand VARCHAR(100)
);
Star vs Snowflake: Star is simpler and faster (fewer JOINs). Snowflake saves storage and avoids redundancy. Most data warehouses prefer star for its simplicity.
EAV — Entity-Attribute-Value
The EAV pattern stores data as rows of (entity, attribute, value) triples instead of fixed columns. It’s the most flexible schema possible — we can add any attribute to any entity without changing the table structure.
-- Traditional approach: every product type needs different columns
-- Electronics: screen_size, battery, processor
-- Clothing: size, color, material
-- Books: author, isbn, pages
-- Maintaining one table per type (or one wide table) gets messy fast
-- EAV approach: three columns handle everything
CREATE TABLE product_attributes (
id SERIAL PRIMARY KEY,
product_id INT REFERENCES products(id),
attribute_name VARCHAR(100), -- "screen_size", "color", "author"
attribute_value TEXT -- everything stored as text
);
-- Insert attributes for a laptop
INSERT INTO product_attributes (product_id, attribute_name, attribute_value) VALUES
(1, 'screen_size', '15.6'),
(1, 'processor', 'M2'),
(1, 'ram', '16GB');
-- Insert attributes for a t-shirt
INSERT INTO product_attributes (product_id, attribute_name, attribute_value) VALUES
(2, 'size', 'L'),
(2, 'color', 'Blue'),
(2, 'material', 'Cotton');
The downsides of EAV are significant:
- No type safety (everything is a string)
- Can’t add constraints (can’t enforce “screen_size must be a number”)
- Queries become ugly (need to PIVOT to get a row-per-entity view)
- Performance suffers at scale
When to use: Truly dynamic attributes where the schema is unknowable upfront. Product catalogs with wildly different product types. User-defined custom fields. Consider JSONB columns in PostgreSQL as a better modern alternative.
Polymorphic Associations
When multiple tables need to relate to the same table. For example: both posts and comments can have “likes.” Instead of a separate likes table for each, we use one table with a type column.
-- Polymorphic likes table
CREATE TABLE likes (
id SERIAL PRIMARY KEY,
user_id INT REFERENCES users(id),
likeable_type VARCHAR(50) NOT NULL, -- 'post' or 'comment'
likeable_id INT NOT NULL, -- id in the respective table
created_at TIMESTAMP DEFAULT NOW()
);
-- Like a post
INSERT INTO likes (user_id, likeable_type, likeable_id) VALUES (1, 'post', 42);
-- Like a comment
INSERT INTO likes (user_id, likeable_type, likeable_id) VALUES (1, 'comment', 99);
-- Get all likes on a post
SELECT * FROM likes WHERE likeable_type = 'post' AND likeable_id = 42;
The problem: We can’t use a foreign key constraint because likeable_id could reference different tables. Referential integrity relies on application code.
Alternatives:
- Separate tables:
post_likesandcomment_likes(cleaner, FK constraints work) - Shared parent: make both posts and comments inherit from a
contenttable
Soft Deletes
Instead of actually deleting a row with DELETE, we mark it as deleted with a timestamp column. The data stays in the table but is hidden from normal queries.
-- Add a soft delete column
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP DEFAULT NULL;
-- "Delete" a user (soft)
UPDATE users SET deleted_at = NOW() WHERE id = 5;
-- All queries must filter out deleted rows
SELECT * FROM users WHERE deleted_at IS NULL;
-- Create a view for convenience
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;
-- To "undelete" — just clear the timestamp
UPDATE users SET deleted_at = NULL WHERE id = 5;
Benefits:
- Data is recoverable (undo delete)
- Audit trail (when was it deleted?)
- Referential integrity preserved (foreign keys don’t break)
- Legal/compliance requirements (retain data for N years)
Downsides:
- Every query needs the
WHERE deleted_at IS NULLfilter (easy to forget) - Table grows forever (need periodic cleanup)
- Unique constraints get tricky (two “deleted” users can have the same email)
-- Fix the unique constraint issue with a partial unique index
CREATE UNIQUE INDEX idx_users_email_active
ON users (email)
WHERE deleted_at IS NULL;
-- Only enforces uniqueness among non-deleted rows
When to Use Each Pattern
| Pattern | Best For |
|---|---|
| Star Schema | Data warehouses, analytics, BI dashboards |
| Snowflake Schema | When storage matters, highly normalized analytics |
| EAV | User-defined fields, wildly varying attributes |
| Polymorphic | Multiple entity types sharing a feature (likes, comments, tags) |
| Soft Deletes | Audit requirements, undo capability, data retention compliance |
Interview Tip
Schema design patterns come up in system design interviews. When designing a data model, mention the pattern by name — “I’d use a star schema for the analytics layer” or “I’d implement soft deletes here for audit compliance.” It shows we’ve seen real production systems and know the trade-offs of each approach.