B-Tree and B+ Tree

intermediate b-tree b-plus-tree data-structures indexes

When a database creates an index, it doesn’t just make a sorted list. It builds a tree-like structure that allows it to find any value in logarithmic time. The most common structures are B-Trees and B+ Trees.

B-Tree

A B-Tree is a self-balancing tree where each node can have multiple keys and multiple children. Unlike a binary tree (which has at most 2 children per node), a B-Tree node can have hundreds of children.

Why does that matter? Because every level of the tree means one disk read. With hundreds of children per node, we only need 3-4 levels to index millions of rows. Fewer levels = fewer disk reads = faster lookups.

Key properties of a B-Tree:

  • All leaf nodes are at the same depth (perfectly balanced)
  • Each node stores multiple keys in sorted order
  • Each node can have up to m children (where m is the tree’s “order”)
  • Data pointers can live in both internal nodes AND leaf nodes

B+ Tree

A B+ Tree is a variation of the B-Tree, and it’s what most databases actually use (PostgreSQL, MySQL InnoDB, SQLite, etc.).

The key differences from a B-Tree:

  1. All actual data lives only in leaf nodes — internal nodes just store keys for navigation
  2. Leaf nodes are linked together — they form a linked list, making range scans super efficient

Think of it like a library catalog system. The internal nodes are like signs pointing us to the right aisle. The leaf nodes are the actual shelves where the books (data) sit. And the shelves are linked together so we can walk from one to the next.

B+ Tree Structure
Root (navigation only) [30 | 60]
↙        ↓        ↘
Internal [10 | 20]
Internal [40 | 50]
Internal [70 | 80]
↓      ↓      ↓      ↓      ↓      ↓
5,8 →
10,15 →
20,25 →
30,35 →
40,45 →
...
↑ Leaf nodes contain actual data + pointers to next leaf (linked list)

Why B+ Tree Wins for Databases

Range queries are fast. If we run SELECT * FROM users WHERE age BETWEEN 20 AND 30, the database finds the leaf node for 20, then just follows the linked list pointers until it hits 30. No need to go back up and down the tree.

Internal nodes are smaller. Since internal nodes only store keys (no data), more keys fit in a single node, which means more children per node, which means a shorter tree, which means fewer disk reads.

Consistent performance. Every lookup takes the same number of steps (root → leaf), because all leaf nodes are at the same depth. No surprises.

How a Lookup Works

Let’s say we’re searching for the value 45 in our B+ Tree:

Step 1: Start at root [30 | 60]
        45 > 30 and 45 < 60 → go to middle child

Step 2: Arrive at internal node [40 | 50]
        45 > 40 and 45 < 50 → go to middle child

Step 3: Arrive at leaf node [40, 42, 45, 48]
        Found 45! Return the row pointer.

That’s just 3 disk reads to find one row among potentially millions. Compare that to scanning every single row in a table.

B-Tree vs B+ Tree Summary

FeatureB-TreeB+ Tree
Data stored inAll nodesLeaf nodes only
Leaf nodes linkedNoYes (linked list)
Range queriesSlower (tree traversal)Fast (follow leaf pointers)
Internal node sizeLarger (has data)Smaller (keys only)
Used bySome older systemsPostgreSQL, MySQL, SQLite

In simple language, B+ Trees are the backbone of database indexes. They keep data sorted, allow lookups in just a few disk reads, and make range queries efficient by linking leaf nodes together. When someone says “database index”, they almost always mean a B+ Tree under the hood.