Query Optimization - DBMS

Writing correct SQL is step one. Writing fast SQL is step two. Let’s go through the most common optimization techniques that come up in interviews and real-world projects.

1. Avoid SELECT *

SELECT * fetches every column from the table, even the ones we don’t need. This wastes I/O, memory, and network bandwidth.

-- Bad: fetches all 20 columns including a large TEXT bio column
SELECT * FROM users WHERE id = 42;

-- Good: only fetch what we need
SELECT id, name, email FROM users WHERE id = 42;

This also prevents us from using covering indexes. If our index covers (id, name, email), the second query can use an index-only scan. The first query can’t.

2. The N+1 Query Problem

This is one of the most common performance killers, especially in ORMs. It happens when we run 1 query to fetch a list, then N more queries to fetch related data for each item.

-- The N+1 problem:
-- Query 1: Get all orders
SELECT * FROM orders;  -- returns 100 orders

-- Then for EACH order, fetch the customer (100 more queries!)
SELECT * FROM customers WHERE id = 1;
SELECT * FROM customers WHERE id = 2;
SELECT * FROM customers WHERE id = 3;
-- ... 97 more queries

That’s 101 queries when we could do it in 1 or 2:

-- Fix with a JOIN (1 query)
SELECT o.*, c.name as customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id;

-- Or fix with IN (2 queries)
SELECT * FROM orders;  -- get all orders
SELECT * FROM customers WHERE id IN (1, 2, 3, ...);  -- get all needed customers at once

N+1 Problem

SELECT * FROM orders;
-- for each order:
SELECT * FROM customers WHERE id = ?;
SELECT * FROM customers WHERE id = ?;
SELECT * FROM customers WHERE id = ?;
... × 100

101 queries total

JOIN Fix

SELECT o.*, c.name
FROM orders o
JOIN customers c
ON o.customer_id = c.id;

1 query total

3. Pagination: OFFSET vs Cursor-Based

OFFSET pagination is simple but gets slower as we go deeper. The database still has to scan through all the skipped rows.

-- Page 1 (fast)
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 0;

-- Page 500 (slow — database scans 10,000 rows, throws away 9,980)
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 9980;

Cursor-based pagination (also called keyset pagination) is faster because it uses a WHERE clause to skip rows:

-- Page 1
SELECT * FROM products ORDER BY id LIMIT 20;
-- Last id on this page was 20

-- Next page (fast — uses the index!)
SELECT * FROM products WHERE id > 20 ORDER BY id LIMIT 20;
-- Last id on this page was 40

-- Next page
SELECT * FROM products WHERE id > 40 ORDER BY id LIMIT 20;

Cursor-based pagination is always fast regardless of how deep we go, because we’re using an indexed WHERE clause instead of scanning and skipping rows.

4. Use EXISTS Instead of IN for Large Subqueries

When we have a large subquery, EXISTS can be faster than IN because it stops as soon as it finds a match.

-- Slower: IN evaluates the entire subquery first
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);

-- Faster: EXISTS stops at the first match for each user
SELECT * FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.user_id = u.id AND o.total > 1000
);

This difference is most noticeable when the subquery returns a lot of rows.

5. Join Ordering Matters

The database optimizer usually picks the best join order, but sometimes it gets it wrong. As a general rule: filter early, join late. Start with the smallest result set.

-- Let the database filter first, then join
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.country = 'India'          -- filter users first (reduces rows)
  AND o.created_at > '2024-01-01'; -- then filter orders

If the optimizer isn’t choosing the right order, we can sometimes help by restructuring the query or using CTEs:

-- Use a CTE to make our intent clear
WITH indian_users AS (
    SELECT id, name FROM users WHERE country = 'India'
)
SELECT iu.name, o.total
FROM indian_users iu
JOIN orders o ON iu.id = o.user_id
WHERE o.created_at > '2024-01-01';

6. Avoid Functions on Indexed Columns

If we wrap an indexed column in a function, the database can’t use the index.

-- Bad: function on indexed column — can't use index on created_at
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- Good: rewrite as a range — uses the index
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

-- Bad: LOWER() prevents index usage
SELECT * FROM users WHERE LOWER(email) = 'manish@example.com';

-- Good: create a functional index instead (PostgreSQL)
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

7. Batch Operations

Instead of inserting or updating rows one by one, batch them:

-- Bad: 1000 individual inserts
INSERT INTO logs (message) VALUES ('event 1');
INSERT INTO logs (message) VALUES ('event 2');
-- ... 998 more

-- Good: batch insert
INSERT INTO logs (message) VALUES
    ('event 1'),
    ('event 2'),
    ('event 3'),
    -- ... up to 1000 in one statement
    ('event 1000');

8. Use EXPLAIN to Verify

After any optimization, always check the query plan:

-- Before optimization
EXPLAIN ANALYZE SELECT * FROM orders WHERE YEAR(created_at) = 2024;
-- Seq Scan (slow)

-- After optimization
EXPLAIN ANALYZE SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
-- Index Scan (fast)

In simple language, query optimization is about doing less work. Fetch fewer columns, make fewer round trips, skip fewer rows, and always let the database use its indexes. Check EXPLAIN ANALYZE before and after — the numbers don’t lie.