How Indexes Work

intermediate indexes performance clustered non-clustered

An index in a database is like the index at the back of a textbook. Instead of flipping through every single page to find “B-Tree”, we look it up in the index, get the page number, and jump straight there.

Without an index, the database has to scan every row in the table to find what we’re looking for. That’s called a full table scan (or sequential scan). For a table with millions of rows, that’s painfully slow.

With an index, the database maintains a separate data structure (usually a B+ Tree) that maps column values to the physical location of the rows. So instead of checking every row, it narrows down to just the matching ones.

The Trade-off: Reads vs Writes

Here’s the thing — indexes aren’t free. Every time we INSERT, UPDATE, or DELETE a row, the database also has to update every index on that table.

Think of it like maintaining a book index. If we add a new chapter, we have to update the index too. More indexes = more maintenance work.

✓ With Index
SELECT → Fast (index lookup)
INSERT → Slower (update index too)
UPDATE → Slower (update index too)
DELETE → Slower (update index too)
✗ Without Index
SELECT → Slow (full table scan)
INSERT → Faster (no index to update)
UPDATE → Faster (no index to update)
DELETE → Faster (no index to update)

Clustered vs Non-Clustered Indexes

A clustered index determines the physical order of data on disk. Think of it like a phone book — the entries are physically sorted by last name. A table can have only one clustered index because the data can only be physically sorted one way.

In most databases, the primary key automatically creates a clustered index. In PostgreSQL, the table data isn’t automatically kept in order, but we can use CLUSTER to physically reorder it.

A non-clustered index is a separate structure that points back to the actual rows. Think of it like the index at the back of a book — it’s separate from the content, and just tells us where to look.

-- Primary key creates a clustered index automatically (in MySQL InnoDB)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,  -- clustered index on id
    email VARCHAR(255),
    name VARCHAR(100)
);

-- Non-clustered index on email
CREATE INDEX idx_users_email ON users(email);

-- Now this query uses the index instead of scanning every row
SELECT * FROM users WHERE email = 'manish@example.com';

When to Create Indexes

Good candidates for indexing:

  • Columns used in WHERE clauses frequently
  • Columns used in JOIN conditions
  • Columns used in ORDER BY or GROUP BY
  • Foreign key columns (these often don’t get indexed automatically)

When NOT to create indexes:

  • Small tables (sequential scan is already fast)
  • Columns with very low cardinality (e.g., a gender column with only 2-3 values)
  • Tables that are write-heavy with few reads
  • Columns that are rarely queried
-- Good: frequently searched column with high cardinality
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Good: composite index for a common query pattern
CREATE INDEX idx_orders_status_date ON orders(status, created_at);

-- Bad: low cardinality column
-- CREATE INDEX idx_users_is_active ON users(is_active);  -- only true/false

-- Drop an index we don't need anymore
DROP INDEX idx_orders_status_date;

Checking If an Index Is Being Used

We can use EXPLAIN to see if our query actually uses the index:

-- Check the query plan
EXPLAIN SELECT * FROM users WHERE email = 'manish@example.com';

-- If we see "Index Scan" or "Index Only Scan" — the index is working
-- If we see "Seq Scan" — the database is ignoring our index

In simple language, indexes are like shortcuts. They make finding data much faster, but we pay a small price on every write. The key is to index the columns we actually query, and not go overboard with indexes on every column.