Schema Design Patterns

intermediate schema-design star-schema snowflake-schema eav soft-delete polymorphic

Schema design isn’t just about normalization. There are well-known patterns that solve specific problems — analytics, flexibility, deletion, and polymorphism. Let’s walk through the most important ones.

Star Schema

The star schema is the bread and butter of data warehousing and analytics. It has two types of tables:

  • Fact table — stores the events/transactions (sales, clicks, orders). Contains foreign keys to dimension tables and numeric measures (amount, quantity, duration).
  • Dimension tables — stores the context (who, what, where, when). Contains descriptive attributes.

The fact table sits in the center, surrounded by dimension tables — like a star.

dim_date
year, month, quarter, day_of_week
dim_customer
name, city, segment
fact_sales
FK: date_id, customer_id,
product_id, store_id
quantity, amount, discount
dim_product
name, category, brand
dim_store
name, city, region, country
-- Fact table: one row per sale
CREATE TABLE fact_sales (
    id SERIAL PRIMARY KEY,
    date_id INT REFERENCES dim_date(id),
    customer_id INT REFERENCES dim_customer(id),
    product_id INT REFERENCES dim_product(id),
    store_id INT REFERENCES dim_store(id),
    quantity INT,
    amount DECIMAL(10,2),
    discount DECIMAL(5,2)
);

-- Dimension table: date details (precomputed for fast queries)
CREATE TABLE dim_date (
    id INT PRIMARY KEY,           -- e.g., 20240315 for March 15, 2024
    full_date DATE,
    year INT,
    quarter INT,
    month INT,
    month_name VARCHAR(20),
    day_of_week VARCHAR(10),
    is_weekend BOOLEAN
);

-- Queries are fast — just one JOIN per dimension
SELECT
    d.year, d.month_name,
    p.category,
    SUM(f.amount) AS total_revenue
FROM fact_sales f
JOIN dim_date d ON f.date_id = d.id
JOIN dim_product p ON f.product_id = p.id
GROUP BY d.year, d.month_name, p.category;

When to use: Data warehouses, BI dashboards, analytics systems. Denormalized by design for query speed.

Snowflake Schema

A snowflake schema is like a star schema, but the dimension tables are normalized — they reference additional lookup tables.

Instead of dim_product having a category column, it has a category_id that references a separate dim_category table.

-- Star: product has category as a string
-- dim_product: id, name, category, brand

-- Snowflake: product references a category table
CREATE TABLE dim_category (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    department VARCHAR(100)
);

CREATE TABLE dim_product (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    category_id INT REFERENCES dim_category(id),
    brand VARCHAR(100)
);

Star vs Snowflake: Star is simpler and faster (fewer JOINs). Snowflake saves storage and avoids redundancy. Most data warehouses prefer star for its simplicity.

EAV — Entity-Attribute-Value

The EAV pattern stores data as rows of (entity, attribute, value) triples instead of fixed columns. It’s the most flexible schema possible — we can add any attribute to any entity without changing the table structure.

-- Traditional approach: every product type needs different columns
-- Electronics: screen_size, battery, processor
-- Clothing: size, color, material
-- Books: author, isbn, pages
-- Maintaining one table per type (or one wide table) gets messy fast

-- EAV approach: three columns handle everything
CREATE TABLE product_attributes (
    id SERIAL PRIMARY KEY,
    product_id INT REFERENCES products(id),
    attribute_name VARCHAR(100),   -- "screen_size", "color", "author"
    attribute_value TEXT            -- everything stored as text
);

-- Insert attributes for a laptop
INSERT INTO product_attributes (product_id, attribute_name, attribute_value) VALUES
(1, 'screen_size', '15.6'),
(1, 'processor', 'M2'),
(1, 'ram', '16GB');

-- Insert attributes for a t-shirt
INSERT INTO product_attributes (product_id, attribute_name, attribute_value) VALUES
(2, 'size', 'L'),
(2, 'color', 'Blue'),
(2, 'material', 'Cotton');

The downsides of EAV are significant:

  • No type safety (everything is a string)
  • Can’t add constraints (can’t enforce “screen_size must be a number”)
  • Queries become ugly (need to PIVOT to get a row-per-entity view)
  • Performance suffers at scale

When to use: Truly dynamic attributes where the schema is unknowable upfront. Product catalogs with wildly different product types. User-defined custom fields. Consider JSONB columns in PostgreSQL as a better modern alternative.

Polymorphic Associations

When multiple tables need to relate to the same table. For example: both posts and comments can have “likes.” Instead of a separate likes table for each, we use one table with a type column.

-- Polymorphic likes table
CREATE TABLE likes (
    id SERIAL PRIMARY KEY,
    user_id INT REFERENCES users(id),
    likeable_type VARCHAR(50) NOT NULL,  -- 'post' or 'comment'
    likeable_id INT NOT NULL,            -- id in the respective table
    created_at TIMESTAMP DEFAULT NOW()
);

-- Like a post
INSERT INTO likes (user_id, likeable_type, likeable_id) VALUES (1, 'post', 42);

-- Like a comment
INSERT INTO likes (user_id, likeable_type, likeable_id) VALUES (1, 'comment', 99);

-- Get all likes on a post
SELECT * FROM likes WHERE likeable_type = 'post' AND likeable_id = 42;

The problem: We can’t use a foreign key constraint because likeable_id could reference different tables. Referential integrity relies on application code.

Alternatives:

  • Separate tables: post_likes and comment_likes (cleaner, FK constraints work)
  • Shared parent: make both posts and comments inherit from a content table

Soft Deletes

Instead of actually deleting a row with DELETE, we mark it as deleted with a timestamp column. The data stays in the table but is hidden from normal queries.

-- Add a soft delete column
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP DEFAULT NULL;

-- "Delete" a user (soft)
UPDATE users SET deleted_at = NOW() WHERE id = 5;

-- All queries must filter out deleted rows
SELECT * FROM users WHERE deleted_at IS NULL;

-- Create a view for convenience
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;

-- To "undelete" — just clear the timestamp
UPDATE users SET deleted_at = NULL WHERE id = 5;

Benefits:

  • Data is recoverable (undo delete)
  • Audit trail (when was it deleted?)
  • Referential integrity preserved (foreign keys don’t break)
  • Legal/compliance requirements (retain data for N years)

Downsides:

  • Every query needs the WHERE deleted_at IS NULL filter (easy to forget)
  • Table grows forever (need periodic cleanup)
  • Unique constraints get tricky (two “deleted” users can have the same email)
-- Fix the unique constraint issue with a partial unique index
CREATE UNIQUE INDEX idx_users_email_active
    ON users (email)
    WHERE deleted_at IS NULL;
-- Only enforces uniqueness among non-deleted rows

When to Use Each Pattern

PatternBest For
Star SchemaData warehouses, analytics, BI dashboards
Snowflake SchemaWhen storage matters, highly normalized analytics
EAVUser-defined fields, wildly varying attributes
PolymorphicMultiple entity types sharing a feature (likes, comments, tags)
Soft DeletesAudit requirements, undo capability, data retention compliance

Interview Tip

Schema design patterns come up in system design interviews. When designing a data model, mention the pattern by name — “I’d use a star schema for the analytics layer” or “I’d implement soft deletes here for audit compliance.” It shows we’ve seen real production systems and know the trade-offs of each approach.