Cluster, Node, Index, Document


Before we can do anything useful, we need the vocabulary. Elasticsearch (ES) has four main nouns that nest, roughly, like Russian dolls.

CLUSTER (e.g. "prod-search")
  NODE 1
    Index: products
      { doc1, doc2, doc3... }
    Index: logs-2026
      { doc1, doc2... }
  NODE 2
    Index: users
      { doc1, doc2... }
  NODE 3
    replicas...

Going inside-out:

Document

A document is a single JSON object — the smallest unit of data in ES. One product, one log line, one user. That’s it.

{
  "_id": "abc123",
  "_index": "products",
  "_source": {
    "title": "Sony WH-1000XM5",
    "price": 399,
    "in_stock": true
  }
}

Every doc has an _id that is unique within its index (we provide it or ES generates one) and lives in exactly one index. The actual data is inside _source.
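To make the _id rule concrete, here is a minimal Python sketch of how an indexing request could be addressed. It assumes the standard document endpoints (PUT /<index>/_doc/<id> when we choose the ID, POST /<index>/_doc when ES generates one). The helper name index_request is hypothetical, and it only builds the request; nothing is sent to a cluster.

```python
import json
from typing import Optional, Tuple
from urllib.parse import quote


def index_request(index: str, source: dict, doc_id: Optional[str] = None) -> Tuple[str, str, str]:
    """Build (method, path, body) for indexing one document.

    Hypothetical helper for illustration only: it assembles the request
    but does not talk to Elasticsearch.
    """
    if doc_id is not None:
        # We supply the _id ourselves: PUT to an explicit document path.
        method, path = "PUT", f"/{index}/_doc/{quote(doc_id, safe='')}"
    else:
        # No _id given: POST and let Elasticsearch generate one.
        method, path = "POST", f"/{index}/_doc"
    # The body holds only the document's fields, i.e. what ends up in _source.
    return method, path, json.dumps(source)


product = {"title": "Sony WH-1000XM5", "price": 399, "in_stock": True}
print(index_request("products", product, doc_id="abc123")[:2])
print(index_request("products", product)[:2])
```

Note that _id and _index never appear in the request body: the index and ID come from the path, and the fields we send become _source.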

Index

An index is a collection of documents with similar shape. Think of it like a table in SQL — but it’s just a logical grouping. We’d have a products index, a users index, a logs-2026-05-26 index.

Two rules of thumb:

  • Group similar documents into one index (all products together).
  • For time-series data (logs, events), use one index per day or week. Dropping old data then means deleting whole indices, which is far cheaper than deleting individual documents.
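The time-series rule can be sketched in a few lines of Python. This is an illustration under assumptions, not a retention tool: the function names, the logs prefix, and the 7-day window are all made up for the example, and the code only computes index names without contacting a cluster.

```python
from datetime import date, timedelta
from typing import Iterable, List


def daily_index(prefix: str, day: date) -> str:
    """Name of the index that holds one day's data, e.g. logs-2026-05-26."""
    return f"{prefix}-{day:%Y-%m-%d}"


def expired_indices(names: Iterable[str], prefix: str, today: date, keep_days: int) -> List[str]:
    """Return the daily indices whose date is older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix + "-"):
            continue  # belongs to some other index family
        try:
            day = date.fromisoformat(name[len(prefix) + 1:])
        except ValueError:
            continue  # suffix is not a YYYY-MM-DD date, so not a daily index
        if day < cutoff:
            expired.append(name)
    return expired


existing = ["logs-2026-05-26", "logs-2026-05-20", "logs-2026-04-01", "products"]
print(daily_index("logs", date(2026, 5, 26)))
print(expired_indices(existing, "logs", today=date(2026, 5, 26), keep_days=7))
```

Each name returned by expired_indices could then be removed with a single DELETE /<index> request, which is the "easy drop" the rule of thumb is after.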

Node

A node is a single running Elasticsearch process — basically one machine (or one container). Nodes hold shards (pieces of indices) and do the actual searching.

Nodes have roles:

  • Master-eligible — can be elected cluster master (manages cluster state)
  • Data — stores shards, runs queries
  • Ingest — preprocesses documents before indexing
  • Coordinating — routes requests (every node does this by default)

In small setups, one node does everything. In production, we separate them out.
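One way to see which roles each node has is the node.role column of GET _cat/nodes, which prints one letter per role. The sketch below decodes the letters for the roles listed above. The node names and role strings in the sample are invented, and the letter table is deliberately partial: ES defines more roles than the four covered here, so anything else is reported as "other".

```python
from typing import List

# Partial table: only the roles discussed in this section.
ROLE_LETTERS = {
    "m": "master-eligible",
    "d": "data",
    "i": "ingest",
}


def decode_roles(letters: str) -> List[str]:
    """Translate a node.role string such as 'dim' into role names."""
    if letters.strip("-") == "":
        # A node with no roles at all still routes requests.
        return ["coordinating-only"]
    return [ROLE_LETTERS.get(ch, f"other({ch})") for ch in letters if ch != "-"]


# Invented sample: one do-everything node, one data node, one coordinator.
sample_nodes = {"node-1": "dim", "node-2": "d", "node-3": "-"}
for name, letters in sample_nodes.items():
    print(name, decode_roles(letters))
```

A small cluster would typically show every node looking like node-1; a separated production layout shows distinct role strings per node.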

Cluster

A cluster is a group of nodes that work together under one name (e.g. prod-search). They discover each other, share cluster state, and rebalance shards when nodes join or leave.

The master-eligible nodes elect one of themselves as master. The master manages metadata (which shards live where, what mappings exist) and coordinates cluster-wide changes. If the master dies, the remaining master-eligible nodes elect a new one.

# Check cluster health (quote the URL so the shell does not interpret "?")
curl "localhost:9200/_cluster/health?pretty"

# Response (trimmed to the key fields)
{
  "cluster_name": "prod-search",
  "status": "green",
  "number_of_nodes": 3,
  "active_primary_shards": 12,
  "active_shards": 24
}

Status colors:

  • Green — all primaries and replicas are assigned. We’re good.
  • Yellow — primaries OK, some replicas missing. Still works, less safe.
  • Red — at least one primary is unassigned. Some data is unreachable. Bad day.
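The three colors boil down to two questions: is any primary unassigned, and if not, is any replica unassigned? Here is a minimal Python sketch of that decision. It is a simplified model of the rules above, not how ES computes the status internally, and the function names are hypothetical.

```python
def cluster_status(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Map unassigned shard counts to a health color (simplified model)."""
    if unassigned_primaries > 0:
        return "red"      # some data cannot be reached at all
    if unassigned_replicas > 0:
        return "yellow"   # all data is reachable, but redundancy is reduced
    return "green"        # every primary and every replica is assigned


def replica_count(health: dict) -> int:
    """Active replica shards = all active shards minus active primaries."""
    return health["active_shards"] - health["active_primary_shards"]


# Fields taken from the health response shown earlier.
sample = {"status": "green", "active_primary_shards": 12, "active_shards": 24}
print(cluster_status(0, 0), replica_count(sample))
```

Red always wins: a cluster with one unassigned primary is red no matter how its replicas are doing.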

The hierarchy in one line

cluster > node > index > shard > document

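We’ll get to shards next. That’s where the distributed magic happens, and where this picture gets one refinement: an index is not confined to a single node, because its shards can be spread across several.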