Database Indexing - High-Level Design

Imagine we have a book with 1,000 pages and we need to find every mention of “caching.” We could read every single page (slow), or we could check the index at the back of the book (fast). Database indexes work exactly the same way.

What Is an Index?

An index is a separate data structure that the database maintains alongside our table. It stores a sorted reference to rows based on specific columns, so the database can jump directly to the right rows instead of scanning the entire table.

Without an index, a query like SELECT * FROM users WHERE email = 'a@b.com' has to check every row in the table. This is called a full table scan. With an index on email, the database finds the row almost instantly.

How Indexes Work

B-Tree Index (The Default)

Most databases use B-trees (balanced trees) as the default index type. A B-tree keeps data sorted and allows searches, insertions, and deletions in O(log n) time.

B-Tree Index Lookup

Root: [M]

↓ ↓

[D, H] [R, W]

↓ ↓ ↓ ↓ ↓ ↓

A,B,C E,F,G I,J,K N,O,P S,T,U X,Y,Z

Finding "F" = 3 steps (Root → [D,H] → leaf) instead of scanning 26 entries

Think of it like a phone book. We don’t read every name — we jump to the right letter section, then narrow down. That’s a B-tree.

Hash Index

A hash index uses a hash function to map keys to locations. It gives us O(1) lookups for exact matches, but it can’t handle range queries (WHERE age > 25). Only useful for equality checks.

Types of Indexes

Single-Column Index

The most basic kind. We index one column.

CREATE INDEX idx_users_email ON users(email);

Now WHERE email = 'a@b.com' is fast.

Composite Index (Multi-Column)

We can index multiple columns together. The order matters a lot.

CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

This index helps queries that filter by user_id, or by user_id AND created_at. But it does not help queries that filter only by created_at. Think of it like a phone book sorted by last name, then first name — we can’t look up just by first name.

This is called the leftmost prefix rule.

Covering Index

A covering index includes all the columns a query needs, so the database doesn’t even have to look at the actual table. It gets everything from the index itself.

-- If we frequently run this query:
SELECT email, name FROM users WHERE email = 'a@b.com';

-- This covering index handles it entirely:
CREATE INDEX idx_users_email_name ON users(email) INCLUDE (name);

Unique Index

Enforces uniqueness on a column. A primary key automatically creates a unique index.

CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

The Trade-Off

Here’s the thing nobody tells us about indexes: they aren’t free.

Indexes speed up reads but slow down writes. Every time we INSERT, UPDATE, or DELETE a row, the database also has to update every index on that table. More indexes = more work on writes.

Indexes also consume disk space. A heavily indexed table might have indexes that are larger than the table itself.

When NOT to Index

Small tables — if the table has a few hundred rows, a full scan is fine
Write-heavy tables — if we’re inserting thousands of rows per second, indexes add overhead
Low-cardinality columns — a boolean column (is_active) has only two values; an index barely helps
Columns we never query on — if we never filter or sort by a column, don’t index it

Practical Tips

-- See how the database plans to run our query
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'a@b.com';

The EXPLAIN output tells us if the database is using an index (Index Scan) or scanning the whole table (Seq Scan). If we see a sequential scan on a large table with a WHERE clause, we probably need an index.

In simple language, an index is a cheat sheet the database keeps on the side. It makes finding data fast, but we pay for it on every write. The trick is indexing the right columns — the ones we actually search and filter on.