Pagination in Elasticsearch is the classic interview gotcha. “How would you paginate to page 1000?” — the wrong answer is from: 9990, size: 10. Here’s why, and what to use instead.
from/size (the obvious one)
Looks like SQL LIMIT/OFFSET:
GET /products/_search
{
  "from": 20,
  "size": 10,
  "query": { "match_all": {} }
}
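The problem: every shard has to collect its top from + size hits, sort them locally, and send their IDs and sort values to the coordinating node, which then merges the lists from all shards and throws away the first from entries. With 5 shards and from: 9990, size: 10, each shard returns up to 10,000 entries and the coordinating node merges up to 50,000 of them just to return 10. The cost grows linearly with the offset: memory and CPU on every shard, plus network traffic and a large merge on the coordinator.
Elasticsearch caps this by default: from + size must not exceed 10,000 (the index.max_result_window setting). The limit can be changed per index, but raising it only raises the cost.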
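To put numbers on that, here is a back-of-the-envelope sketch in Python. The function name from_size_cost is made up for this post, and the figures are upper bounds: a shard holding fewer than from + size matching docs returns fewer entries.

```python
def from_size_cost(shards: int, from_: int, size: int) -> dict:
    """Rough upper bound on the entries handled to serve one from/size page."""
    per_shard = from_ + size      # each shard collects its own top from+size hits
    merged = shards * per_shard   # the coordinating node merges every shard's list
    return {"per_shard": per_shard, "merged": merged, "returned": size}


print(from_size_cost(shards=5, from_=9990, size=10))
# {'per_shard': 10000, 'merged': 50000, 'returned': 10}
```

The ratio of merged to returned entries is what makes deep offsets expensive: it keeps growing with from while the page size stays the same.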
- from/size: good up to ~10k results, jumpable pages, stateless
- scroll: snapshot of data, batch export, no live updates
- search_after: live deep pagination, recommended for users
Scroll (not recommended for user-facing pagination)
Scroll grabs a snapshot of the index and lets us page through it without re-running the query.
POST /products/_search?scroll=1m
{
  "size": 100,
  "query": { "match_all": {} }
}
The response includes a _scroll_id. Hand it back to keep paging:
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5..."
}
Use scroll for batch jobs: reindexing, exports, ML training data. Don’t use it for users — the snapshot doesn’t reflect new docs added after the scroll started, and it holds resources on the cluster for the scroll’s lifetime.
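For a batch job, the two requests above turn into a simple loop. The sketch below is one way to write it in Python, not an official client: send is a hypothetical transport function (method, path, body) that returns the parsed JSON response, so any HTTP library can be plugged in. It also clears the scroll at the end with DELETE /_search/scroll, which frees the search context instead of waiting for the keep-alive to expire.

```python
from typing import Callable, Iterator

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, dict], dict]


def scroll_all(send: Send, index: str, query: dict, size: int = 100,
               keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit for `query`, one scroll page at a time."""
    page = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": size, "query": query})
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            page = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            # The ID may change between pages, so always keep the latest one.
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Free the search context on the cluster once we are done.
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

An empty page signals the end of the result set. Because the cleanup sits in a finally block, the scroll is also cleared if the consumer stops early or an error interrupts the loop.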
search_after (the right answer for deep pagination)
search_after says: “give me the next page after this sort value”. No offset arithmetic, no window limit.
GET /products/_search
{
  "size": 10,
  "query": { "match_all": {} },
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ]
}
Every hit in the response carries a sort array with its sort values. Take the one from the last hit, say [1700000000, "p_42"], and pass it to the next request:
GET /products/_search
{
  "size": 10,
  "query": { "match_all": {} },
  "search_after": [1700000000, "p_42"],
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ]
}
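Each shard now only collects docs that sort after this point, so there is no offset to skip. Always include a tiebreaker field in the sort so pagination is deterministic when two docs share the primary sort value. _id is used here for illustration, but Elasticsearch discourages sorting on _id (it relies on fielddata, which recent versions disable by default). In practice, copy the ID into a keyword field with doc values and sort on that, or use a PIT, which adds an implicit _shard_doc tiebreaker.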
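The cursor handling is small enough to sketch in a few lines of Python. This is an illustration under assumptions, not a client library: search is a hypothetical callable that takes a request body and returns the parsed search response, and search_after_pages is a name invented here.

```python
import copy
from typing import Callable, Iterator

# Hypothetical: search(body) -> parsed JSON response of a search request.
Search = Callable[[dict], dict]


def search_after_pages(search: Search, body: dict) -> Iterator[dict]:
    """Yield hits page by page, using the last hit's sort values as the cursor."""
    body = copy.deepcopy(body)  # do not mutate the caller's request
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return  # an empty page means we reached the end
        yield from hits
        # The sort array of the last hit becomes the cursor for the next page.
        body["search_after"] = hits[-1]["sort"]
```

Note that the query and the sort stay identical across requests; only search_after changes. For a user-facing API, the same idea applies one page at a time: return the last hit's sort array to the client as an opaque "next page" token.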
Point in Time (PIT) — search_after’s best friend
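Plain search_after runs each request against the live index, so docs indexed, updated, or deleted between requests can shift the ordering and cause duplicated or skipped results. To freeze the view, open a PIT: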
POST /products/_pit?keep_alive=1m
# { "id": "46ToAwMDaWR5..." }
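Then send searches to GET /_search with no index in the path and pass the ID in the request body as pit.id (the PIT already pins the index). Combined with search_after, this is the modern, scalable way to deep-paginate. When you are done, close the PIT with DELETE /_pit so it stops holding resources on the cluster.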
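As a rough sketch of what those request bodies look like, here is a small Python helper. pit_search_body is a hypothetical name, and the helper only builds the JSON body for GET /_search; sending it is left to whatever HTTP client is in use.

```python
from typing import Optional


def pit_search_body(pit_id: str, query: dict, sort: list, size: int = 10,
                    search_after: Optional[list] = None,
                    keep_alive: str = "1m") -> dict:
    """Build the body for GET /_search (no index in the path) against an open PIT."""
    body = {
        "size": size,
        "query": query,
        # Passing keep_alive here extends the PIT's lifetime on each request.
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": sort,
    }
    if search_after is not None:
        # Must be the complete sort array of the previous page's last hit.
        body["search_after"] = search_after
    return body


first_page = pit_search_body("46ToAwMDaWR5...", {"match_all": {}},
                             [{"created_at": "desc"}])
```

With a PIT, the sort array on each hit also contains the implicit _shard_doc tiebreaker value, so copy the whole array into search_after rather than rebuilding it by hand. Search responses also include a PIT ID, and it is safest to use the most recently returned one for the next request.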
TL;DR
- Users browsing pages 1–10? → from/size.
- Batch export of millions of docs? → scroll (or PIT + search_after).
- Live deep pagination, infinite scroll? → search_after + PIT.
- Never raise max_result_window to “fix” deep pagination. That’s treating the symptom.