Refresh, Flush & Near-Real-Time Search

When we index a doc, it is not immediately searchable. Elasticsearch is “near real-time”: by default a new doc becomes visible to search up to about 1 second after it is indexed. To understand why, we need to walk through how a write actually lands on disk.

The journey of a single doc

In simple language: every indexed doc takes three steps before it’s safe and searchable.

Write path timeline
1. Index request arrives
Doc written to in-memory buffer + appended to translog on disk (fsync per request by default).
↓ every 1s (refresh)
2. Refresh
Buffer is written to a new Lucene segment in OS filesystem cache. Now searchable.
↓ when the translog reaches 512 MB (default threshold)
3. Flush
fsync segments to durable disk, clear translog. Now crash-safe even without translog replay.
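
The three steps above can be sketched as a toy model. This is not Elasticsearch code: the class name ToyShard and all of its fields are made up for illustration, and the model ignores replicas, deletes, and merges. It only shows which step makes a doc searchable and which step makes it durable.

```python
class ToyShard:
    """Toy model of one shard's write path (illustrative only)."""

    def __init__(self):
        self.buffer = []              # in-memory indexing buffer, lost on crash
        self.translog = []            # on-disk operation log, survives a crash
        self.cached_segments = []     # segments in OS cache: searchable, not fsynced
        self.committed_segments = []  # segments made durable by a flush

    def index(self, doc):
        # Step 1: the doc goes to the buffer and to the translog.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Step 2: the buffer becomes a new searchable segment.
        if self.buffer:
            self.cached_segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Step 3: everything becomes durable and the translog is cleared.
        # Simplification: the toy writes the buffer out first via refresh().
        self.refresh()
        self.committed_segments.extend(self.cached_segments)
        self.cached_segments.clear()
        self.translog.clear()

    def search(self):
        # Only docs that live in a segment are visible to search.
        segments = self.committed_segments + self.cached_segments
        return [doc for segment in segments for doc in segment]

    def crash_and_recover(self):
        # A crash loses the buffer and the un-fsynced segments...
        self.buffer.clear()
        self.cached_segments.clear()
        # ...and recovery replays the translog to bring those docs back.
        self.buffer.extend(self.translog)
        self.refresh()
```

In this model a doc indexed but not yet refreshed is missing from search(), and a doc that was never flushed still survives crash_and_recover() because the translog kept it.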

Refresh — what makes a doc searchable

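A refresh writes the in-memory buffer out as a new Lucene segment and opens it for search. The default index.refresh_interval is 1s, applied per shard. Since 7.0, when the interval is left at its default, a shard that has not received a search request for 30 seconds (index.search.idle.after) skips scheduled refreshes until the next search arrives.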

PUT /products/_settings
{
  "index.refresh_interval": "30s"
}

The catch: refreshes are not free. Each one writes a new segment, and lots of tiny segments slow down search (more segments = more lookups per query). For write-heavy workloads (logs, metrics), raising refresh_interval to 30s, or setting it to -1 to disable scheduled refreshes, is often a large throughput win.
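
To put a rough number on the segment argument, here is a small back-of-the-envelope helper. It is only an upper bound under stated assumptions: every scheduled refresh finds new docs to write, and no merges run in the meantime. The function name is made up for this sketch.

```python
def max_new_segments_per_hour(refresh_interval_s: float) -> int:
    """Upper bound on segments one shard can create per hour, before merging.

    Assumes every scheduled refresh has new docs to write out.
    """
    if refresh_interval_s <= 0:
        raise ValueError("refresh_interval_s must be positive")
    return int(3600 // refresh_interval_s)
```

Under those assumptions a 1s interval allows up to 3600 new segments per shard per hour, while 30s allows up to 120, which is far less work for the merge scheduler to clean up.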

You can also force a refresh on demand — useful in tests:

POST /products/_refresh

Or per-request:

POST /products/_doc?refresh=wait_for
{ "name": "Macbook" }

refresh=wait_for waits for the next scheduled refresh. refresh=true forces one immediately (slow, don’t use in hot paths).
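
For test helpers, the per-request form can be wrapped in a small function. The sketch below only builds the HTTP request with the standard library and leaves sending it to the caller. The base URL, the function name, and the absence of authentication are assumptions for illustration, not part of the original example.

```python
import json
import urllib.parse
import urllib.request

# Values the refresh query parameter accepts on index requests.
VALID_REFRESH = {"true", "false", "wait_for"}


def build_index_request(base_url, index, doc, refresh="false"):
    """Build (but do not send) a POST /<index>/_doc?refresh=... request."""
    if refresh not in VALID_REFRESH:
        raise ValueError(f"refresh must be one of {sorted(VALID_REFRESH)}")
    query = urllib.parse.urlencode({"refresh": refresh})
    url = f"{base_url}/{index}/_doc?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (assumes a local, unsecured node; not executed here):
# req = build_index_request("http://localhost:9200", "products",
#                           {"name": "Macbook"}, refresh="wait_for")
# urllib.request.urlopen(req)
```

Rejecting unknown values up front keeps a typo such as refresh=waitfor from turning into a confusing error in the middle of a test run.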

Translog — the safety net

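Refresh writes to OS cache, not durable disk. If the node crashes before the next flush, those un-fsynced segments are gone, along with anything still sitting in the in-memory buffer. That’s where the translog (transaction log) saves us.

Every write is appended to the translog before the client gets its ack, and with the default durability it is fsynced too. On crash recovery, Elasticsearch replays the translog operations recorded since the last flush to rebuild what was lost.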
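
Replay itself is conceptually simple. The following is a minimal sketch, not the real recovery code: the operation format (dicts with seq_no, op, id, source) is invented for illustration, and a plain dict stands in for the committed index.

```python
def replay_translog(committed_docs, translog_ops, last_committed_seq_no):
    """Rebuild doc state by re-applying operations newer than the last commit.

    committed_docs: dict of doc id -> source, as of the last flush.
    translog_ops: list of dicts with keys "seq_no", "op", "id", and
                  (for index operations) "source". Hypothetical format.
    """
    docs = dict(committed_docs)  # never mutate the caller's copy
    for op in sorted(translog_ops, key=lambda o: o["seq_no"]):
        if op["seq_no"] <= last_committed_seq_no:
            continue  # already covered by the last commit
        if op["op"] == "index":
            docs[op["id"]] = op["source"]
        elif op["op"] == "delete":
            docs.pop(op["id"], None)
        else:
            raise ValueError(f"unknown operation: {op['op']!r}")
    return docs
```

Applying operations in sequence order means that a later update or delete of the same id wins, which is what makes replay safe to run from the last commit point forward.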

Default durability:

"index.translog.durability": "request",   // fsync on every request (safe, default)
"index.translog.sync_interval": "5s",     // alt: fsync every 5s if "async"
"index.translog.flush_threshold_size": "512mb"

Setting durability to async means the translog is fsynced every sync_interval (5s by default) instead of on every request: faster writes, but a crash can lose up to that interval’s worth of acknowledged writes. Use only for logs/metrics where loss is OK.
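
The size of that loss window is easy to estimate. This is a rough sketch that assumes a steady indexing rate and a crash at the worst possible moment; the function name and the example rate are hypothetical.

```python
def worst_case_lost_docs(docs_per_second, sync_interval_s, durability="async"):
    """Rough upper bound on acknowledged docs lost in a single crash."""
    if durability == "request":
        return 0  # every acknowledged write was already fsynced
    if durability != "async":
        raise ValueError("durability must be 'request' or 'async'")
    return int(docs_per_second * sync_interval_s)
```

For example, at a hypothetical 20,000 docs per second with the default 5s interval, the bound works out to 100,000 acknowledged docs, which is why async is usually reserved for data that can be re-sent or safely dropped.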

Flush — making it truly durable

A flush does three things:

  1. fsync all current segments to disk
  2. Write a new Lucene commit point
  3. Truncate the translog (we no longer need it for those docs)

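Flushes happen automatically, most visibly when the translog reaches index.translog.flush_threshold_size (512 MB by default). Very old releases also flushed on a 30-minute timer; current ones decide mainly from the size of the unflushed translog. You rarely call flush manually, but you might before an upgrade or snapshot: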

POST /products/_flush

Segment merges — the cleanup crew

Every refresh creates a new tiny segment. Tiny segments are bad for search. Elasticsearch (via Lucene) merges small segments into larger ones in the background. Merges also physically delete docs marked as deleted (an update is a delete + insert — the old version sticks around until merge).

Tuning merges is rarely worth it; the default scheduler is good. The thing to remember: merges cost CPU and IO. If your nodes are under-provisioned, merges will steal resources from indexing and search.
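
The delete-then-merge behavior described above can be shown with a toy model. This is a sketch only: the Entry type and the merge function are made up for illustration, and real Lucene segments are immutable index files rather than Python lists.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    doc_id: str
    source: dict
    live: bool  # False once the doc was deleted or superseded by an update


def merge(segments):
    """Combine several segments into one, dropping entries marked as deleted."""
    return [entry for segment in segments for entry in segment if entry.live]


# An update to doc "1" leaves a dead entry in the old segment
# and a live entry in the newer one.
older = [Entry("1", {"price": 100}, live=False), Entry("2", {"price": 50}, live=True)]
newer = [Entry("1", {"price": 90}, live=True)]
merged = merge([older, newer])
```

After the merge, the stale version of doc "1" is physically gone and only the two live entries remain, which is where the disk space from updates and deletes is actually reclaimed.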

Putting it together: write-heavy tuning

For a logging pipeline doing 100k docs/sec:

PUT /logs-2026-05/_settings
{
  "index.refresh_interval": "30s",
  "index.translog.durability": "async",
  "index.translog.sync_interval": "10s",
  "index.number_of_replicas": 0
}

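Once the bulk load is done, flip back, including translog durability, so the index is no longer running without replicas or per-request fsync:

PUT /logs-2026-05/_settings
{
  "index.refresh_interval": "1s",
  "index.translog.durability": "request",
  "index.number_of_replicas": 1
}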

This is a classic interview answer: “I’d relax or disable refresh and drop replicas during bulk load, then restore them afterwards. Refresh interval and translog durability are the two big knobs for write throughput vs search freshness vs durability.”
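
One way to make sure the settings are always restored is to wrap the bulk load in a context manager. The sketch below is an assumption-heavy illustration: put_settings is a hypothetical callable (index name, settings dict) that you would back with your own HTTP client, and the restore values include resetting translog durability to request.

```python
from contextlib import contextmanager

# Settings applied for the duration of the bulk load.
BULK_SETTINGS = {
    "index.refresh_interval": "30s",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "10s",
    "index.number_of_replicas": 0,
}

# Settings restored afterwards, whether or not the load succeeded.
RESTORE_SETTINGS = {
    "index.refresh_interval": "1s",
    "index.translog.durability": "request",
    "index.number_of_replicas": 1,
}


@contextmanager
def bulk_load_mode(put_settings, index):
    """Apply write-heavy settings, then always restore the safe ones.

    put_settings is a caller-supplied function (hypothetical) that sends
    PUT /<index>/_settings with the given dict as the body.
    """
    put_settings(index, BULK_SETTINGS)
    try:
        yield
    finally:
        put_settings(index, RESTORE_SETTINGS)


# Usage sketch (my_put_settings and run_bulk_load are placeholders):
# with bulk_load_mode(my_put_settings, "logs-2026-05"):
#     run_bulk_load()
```

The finally block is the point of the design: if the load fails halfway, the index does not stay stuck with zero replicas and async durability.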