EXPLAIN and Query Plans

intermediate explain query-plan performance optimization

Writing a query is one thing. Knowing how the database actually executes it is another. That’s where EXPLAIN comes in — it shows us the database’s game plan for running our query.

Think of it like asking Google Maps for directions. We give it the destination (the query), and it shows us the route it’s going to take (the execution plan). If the route looks bad, we can reroute.

EXPLAIN vs EXPLAIN ANALYZE

EXPLAIN shows the estimated plan without actually running the query. It’s safe and fast.

EXPLAIN ANALYZE actually runs the query and shows the real execution time. Use this when we need accurate numbers, but be careful with INSERT/UPDATE/DELETE — it will actually execute them.

-- Just show the plan (doesn't execute)
EXPLAIN SELECT * FROM users WHERE email = 'manish@example.com';

-- Show the plan AND actually run it (shows real timing)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'manish@example.com';

-- Wrap destructive queries in a transaction to avoid side effects
BEGIN;
EXPLAIN ANALYZE DELETE FROM users WHERE created_at < '2020-01-01';
ROLLBACK;  -- undo the delete

Reading a Query Plan

Here’s what a typical PostgreSQL query plan looks like:

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'manish@example.com';

-- Output:
-- Index Scan using idx_users_email on users  (cost=0.42..8.44 rows=1 width=72)
--   (actual time=0.023..0.024 rows=1 loops=1)
--   Index Cond: (email = 'manish@example.com')
-- Planning Time: 0.087 ms
-- Execution Time: 0.043 ms

Let’s break that down:

  • Index Scan — it’s using an index (good!)
  • cost=0.42..8.44 — estimated startup cost and total cost (in arbitrary units)
  • rows=1 — estimated number of rows returned
  • actual time=0.023..0.024 — real time in milliseconds
  • loops=1 — how many times this step ran

Common Scan Types

Seq Scan
Reads every row in the table. Slow on large tables. Usually means we need an index.
Index Scan
Uses the index to find rows, then fetches the actual data from the table. Fast.
Index Only Scan
Everything needed is in the index. Doesn't touch the table at all. Fastest.
Bitmap Scan
Builds a bitmap of matching pages from the index, then reads those pages. Good for medium-selectivity queries.

Seq Scan isn’t always bad. If we’re reading a large portion of the table (say 50%+ of rows), a sequential scan can actually be faster than an index scan because sequential I/O is faster than random I/O.

Common Join Types

When queries involve multiple tables, we’ll see different join strategies:

Nested Loop    — For each row in table A, scan table B
                 Fast when inner table is small or has an index

Hash Join      — Build a hash table from one table, probe with the other
                 Good for large tables with equality joins

Merge Join     — Sort both tables, then merge them
                 Great when both inputs are already sorted

Spotting Slow Queries

Here are the red flags to look for:

1. Sequential scan on a large table

-- Bad: scanning 10 million rows
-- Seq Scan on orders (cost=0.00..1850000.00 rows=10000000)
--   Filter: (status = 'pending')

-- Fix: add an index
CREATE INDEX idx_orders_status ON orders(status);

2. Nested loop with no index on the inner table

-- Bad: for each user, scanning all orders
-- Nested Loop (cost=0.00..25000000.00)
--   -> Seq Scan on users
--   -> Seq Scan on orders  <-- no index!

-- Fix: add an index on the join column
CREATE INDEX idx_orders_user_id ON orders(user_id);

3. Large row estimate mismatch

-- Plan says rows=1, actual rows=500000
-- This means the statistics are stale
ANALYZE users;  -- update table statistics

MySQL EXPLAIN

MySQL’s EXPLAIN output looks a bit different but tells us the same story:

EXPLAIN SELECT * FROM users WHERE email = 'manish@example.com';

-- +----+------+---------------+------+---------+------+------+-------+
-- | id | type | possible_keys | key  | key_len | ref  | rows | Extra |
-- +----+------+---------------+------+---------+------+------+-------+
-- |  1 | ref  | idx_email     | idx_email | 767 | const | 1   |       |
-- +----+------+---------------+------+---------+------+------+-------+

Key columns to check:

  • type: ALL (full table scan) is bad, ref/eq_ref/const are good
  • key: which index is being used (NULL = no index)
  • rows: estimated rows to examine
  • Extra: Using index (covering index), Using filesort (needs sorting), Using temporary (temp table)

Practical Workflow

-- 1. Write the query
SELECT u.name, COUNT(o.id) as order_count
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > '2024-01-01'
GROUP BY u.name
ORDER BY order_count DESC;

-- 2. Check the plan
EXPLAIN ANALYZE <the query above>;

-- 3. If we see Seq Scans, add indexes
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

-- 4. Check the plan again — it should now show Index Scan

In simple language, EXPLAIN is our X-ray for queries. Before optimizing blindly, always check the query plan first. It tells us exactly where the database is spending time and what we need to fix.