When we get a document back from ES, it’s not just our JSON — it’s wrapped in metadata. Knowing what each field means saves a lot of confusion.
Here’s a real response:
{
  "_index": "products",
  "_id": "abc123",
  "_version": 3,
  "_seq_no": 42,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "title": "Sony WH-1000XM5",
    "price": 399,
    "category": "audio"
  }
}
Let’s break it down.
The metadata fields (prefixed with _)
| Field | What it is |
|---|---|
| _index | Which index this doc lives in |
| _id | Unique identifier within the index |
| _source | The actual JSON we sent in |
| _version | Counter that increments every time the doc changes (create, update, delete) |
| _seq_no | Per-shard sequence number assigned to each write; used with _primary_term for optimistic concurrency control |
| _primary_term | Counter that bumps when a new primary is elected |
| _score | Relevance score (only on search results) |
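To make the split between metadata and payload concrete, here is a minimal Python sketch that parses the sample response from earlier and separates the two. The helper name split_response is ours, not part of any ES client, and it assumes the response shape of a GET-by-ID call.

```python
import json

# Sample GET response, copied from the example earlier in this post.
RESPONSE = """
{
  "_index": "products",
  "_id": "abc123",
  "_version": 3,
  "_seq_no": 42,
  "_primary_term": 1,
  "found": true,
  "_source": {"title": "Sony WH-1000XM5", "price": 399, "category": "audio"}
}
"""


def split_response(raw):
    """Return (metadata, source) from a GET-by-ID response body."""
    body = json.loads(raw)
    # A missing doc comes back with "found": false and no _source.
    source = body.get("_source", {}) if body.get("found") else {}
    # Every key prefixed with "_" except _source is metadata.
    metadata = {k: v for k, v in body.items() if k.startswith("_") and k != "_source"}
    return metadata, source


metadata, source = split_response(RESPONSE)
print(metadata)
print(source["title"])
```

Note that found is not prefixed with an underscore: it belongs to the GET response envelope, not to the document's metadata.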
_id — the document ID
We can provide our own (PUT /products/_doc/abc123) or let ES auto-generate one (POST /products/_doc). Auto-generated IDs are URL-safe Base64 strings like Z6X3kYwBq8....
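The _id must be unique within the index. By default it also decides which shard the doc lands on, roughly hash(_routing) % number_of_primary_shards, where the routing value defaults to the _id unless we pass a custom one.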
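The routing rule is easy to play with in a few lines of Python. This is only a sketch: zlib.crc32 is a stand-in for the hash function ES actually uses, and pick_shard is a made-up name, so the shard numbers printed here will not match a real cluster. The point is the shape of the calculation.

```python
import zlib


def pick_shard(routing, num_primary_shards):
    """Toy version of hash(routing) % shards.

    zlib.crc32 is a stand-in hash chosen for this sketch; Elasticsearch
    uses its own hash function, so real shard numbers will differ.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# The same ID always maps to the same shard for a given shard count.
for doc_id in ["abc123", "sony-xm5"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 5))

# Change the shard count and the same ID may map somewhere else.
print(pick_shard("abc123", 5), pick_shard("abc123", 6))
```

Because the result depends on the number of primary shards, a lookup by ID only works if that number is the same as it was when the doc was indexed.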
_source — the field that matters most
This is our original JSON, stored verbatim. By default, ES stores it so we can retrieve the doc as we sent it. You CAN disable _source to save disk, but then you can’t:
- Reindex into a new index or mapping
- Use the update or update-by-query APIs
- Use highlighting on the fly
In simple language: don’t disable _source unless you really know what you’re doing.
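Why does the update API need _source? A partial update is really a read-merge-write cycle: fetch the stored JSON, merge the change in, index the result as a whole new doc. Here is a toy Python model of that cycle. The function name and the shallow merge are simplifications for this flat example, not ES internals.

```python
def partial_update(stored_source, partial_doc):
    """Toy model of a partial update: read _source, merge, re-index the whole doc."""
    if stored_source is None:
        # Stands in for an index where _source is disabled.
        raise ValueError("no _source stored, nothing to merge the change into")
    merged = dict(stored_source)   # copy, so the stored doc is not mutated
    merged.update(partial_doc)     # shallow merge is enough for this flat example
    return merged


stored = {"title": "Sony WH-1000XM5", "price": 399, "category": "audio"}
print(partial_update(stored, {"price": 379}))

try:
    partial_update(None, {"price": 379})
except ValueError as err:
    print("update failed:", err)
```

With no stored _source there is nothing to merge into, which is why the update path stops working.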
_version and concurrency control
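Every write bumps _version and gives the doc a new _seq_no. To prevent lost updates, we tell ES to apply a write only if the doc is still at the _seq_no and _primary_term we last read: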
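# PUT replaces the whole doc, so send every field, not just the changed one
PUT /products/_doc/abc123?if_seq_no=42&if_primary_term=1
{
  "title": "Sony WH-1000XM5 (Updated)",
  "price": 399,
  "category": "audio"
}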
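If someone else modified the doc in the meantime, its _seq_no has moved on and ES rejects the write with a 409 version conflict. That’s optimistic concurrency control, like SQL’s UPDATE ... WHERE version = ?. The usual reaction is to re-read the doc and retry.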
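Note: older releases compared _version directly for this (the ?version= parameter). Since 7.x, internally versioned docs use _seq_no + _primary_term instead, because the pair stays unambiguous across primary failovers. The version parameter still exists for external versioning (version_type=external).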
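To see the conflict happen without a cluster, here is a small in-memory sketch in Python. ToyIndex and VersionConflict are invented for this post and model a single shard only; they mimic the if_seq_no / if_primary_term check, not the real ES implementation.

```python
class VersionConflict(Exception):
    """Stands in for the 409 conflict a real cluster would return."""


class ToyIndex:
    """In-memory, single-shard model of seq_no-based concurrency control."""

    def __init__(self):
        self.docs = {}         # doc_id -> metadata plus _source
        self.next_seq_no = 0   # one counter per shard; we model one shard
        self.primary_term = 1

    def get(self, doc_id):
        return self.docs[doc_id]

    def put(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None:
            # Conditional write: the doc must still be where the caller last saw it.
            if (current is None
                    or current["_seq_no"] != if_seq_no
                    or current["_primary_term"] != if_primary_term):
                raise VersionConflict(doc_id)
        self.docs[doc_id] = {
            "_seq_no": self.next_seq_no,
            "_primary_term": self.primary_term,
            "_version": current["_version"] + 1 if current else 1,
            "_source": source,
        }
        self.next_seq_no += 1
        return self.docs[doc_id]


index = ToyIndex()
index.put("abc123", {"title": "Sony WH-1000XM5", "price": 399})
seen = index.get("abc123")  # two writers both read the doc at seq_no 0

# Writer A gets there first and succeeds.
index.put("abc123", {**seen["_source"], "price": 379},
          if_seq_no=seen["_seq_no"], if_primary_term=seen["_primary_term"])

# Writer B still holds the old seq_no, so its write is rejected.
try:
    index.put("abc123", {**seen["_source"], "price": 349},
              if_seq_no=seen["_seq_no"], if_primary_term=seen["_primary_term"])
    outcome = "written"
except VersionConflict:
    outcome = "conflict"
print(outcome)  # conflict
```

Writer B's price never lands. In a real application B would re-read the doc, reapply its change on top of A's, and try again with the fresh _seq_no.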
_type — the ghost of versions past
You might see old tutorials with URLs like /products/product/abc123. That product was the type, a sub-grouping within an index (think tables within a database).
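Types are dead. From 6.0 a new index could hold only one type, 7.x deprecated type names in request URLs, and 8.0 removed them. Every index now has a single implicit type, and _doc is the fixed path segment that sits where the type name used to go: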
# Old (don't do this)
PUT /products/product/abc123
# New
PUT /products/_doc/abc123
Why did they kill it? Lucene stores all fields from all types in the same underlying index, so a name field in one type and a name field in another are the same field underneath and have to share one data type. Two types that disagreed caused chaos. Easier to just say “one index, one schema.”
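The clash is easy to reproduce with a toy model. In this Python sketch, one dictionary plays the role of the index-wide field table that every type writes into. register_fields, FieldConflict, and the supplier type are all invented for illustration.

```python
class FieldConflict(Exception):
    """Raised when two types want different data types for one field name."""


def register_fields(shared_fields, type_name, mapping):
    """Add one type's fields to the index-wide field table."""
    for field, data_type in mapping.items():
        existing = shared_fields.get(field)
        if existing is not None and existing != data_type:
            raise FieldConflict(
                f"{field!r} is already {existing}, {type_name} wants {data_type}"
            )
        shared_fields[field] = data_type


shared = {}
register_fields(shared, "product", {"name": "text", "price": "integer"})

# A second type that reuses "name" with another data type cannot be added.
try:
    register_fields(shared, "supplier", {"name": "integer"})
    clash = None
except FieldConflict as err:
    clash = str(err)
print(clash)
```

With one schema per index, the field table and the schema are the same thing, so this class of conflict cannot occur between types.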
Putting it together
# Index a doc with our own ID
curl -X PUT "localhost:9200/products/_doc/sony-xm5" -H "Content-Type: application/json" -d '
{
"title": "Sony WH-1000XM5",
"price": 399
}'
# Get it back
curl "localhost:9200/products/_doc/sony-xm5"
# Response
{
"_index": "products",
"_id": "sony-xm5",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"found": true,
"_source": { "title": "Sony WH-1000XM5", "price": 399 }
}
When you see found: true and your data in _source, you’ve got it. Everything else is plumbing.
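If we would rather do the same round trip from Python, a standard-library sketch could look like the following. It assumes the same unsecured local node on localhost:9200 as the curl commands. The helper names are ours, and nothing is sent until send() is called.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: same local node as the curl commands


def index_request(index, doc_id, doc):
    """Build the PUT /<index>/_doc/<id> request (does not send it)."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_request(index, doc_id):
    """Build the GET /<index>/_doc/<id> request (does not send it)."""
    return urllib.request.Request(f"{BASE_URL}/{index}/_doc/{doc_id}", method="GET")


def send(request):
    """Send a request and decode the JSON body; needs a reachable cluster."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


put = index_request("products", "sony-xm5", {"title": "Sony WH-1000XM5", "price": 399})
get = get_request("products", "sony-xm5")
print(put.get_method(), put.full_url)
print(get.get_method(), get.full_url)
```

With a node running, send(put) followed by send(get)["_source"] should hand back the doc we indexed. Keep in mind that urlopen raises urllib.error.HTTPError on a 404, so a missing doc surfaces as an exception rather than as found: false.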