You CAN just throw documents at ES and let it auto-create the index. But that’s a great way to end up with 1 shard, dynamic mapping nightmares, and 3 AM pages. Let’s do it properly.
The two halves of index config
Every index has two configuration blocks:
- settings: physical/operational stuff. How many shards? How many replicas? Refresh interval? Custom analyzers?
- mappings: schema. What fields exist? What types are they? How should text be analyzed?
In simple language: settings is “how the index runs”, mappings is “what the data looks like.”
settings:
- number_of_replicas (mutable)
- refresh_interval
- analysis (analyzers, tokenizers)
- max_result_window
- codec (compression)
mappings:
- field data types (text, keyword, long...)
- which analyzer to use per text field
- multi-fields (text + keyword)
- dynamic mapping rules
A real index creation request
Let’s build a products index from scratch:
PUT /products
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"refresh_interval": "5s",
"analysis": {
"analyzer": {
"product_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "asciifolding", "stop"]
}
}
}
},
"mappings": {
"properties": {
"title": { "type": "text", "analyzer": "product_analyzer" },
"sku": { "type": "keyword" },
"price": { "type": "scaled_float", "scaling_factor": 100 },
"in_stock": { "type": "boolean" },
"created_at": { "type": "date" },
"category": { "type": "keyword" },
"description": {
"type": "text",
"fields": {
"keyword": { "type": "keyword", "ignore_above": 256 }
}
}
}
}
}
That’s a production-ready starting point. Now let’s unpack the important settings.
number_of_shards
How many primary shards to split the index across. Set this at creation time; you can't change it in place later. (Well, the shrink and split APIs can do it, but they build a new index and come with restrictions, so plan not to need them.)
Default since ES 7.0: 1 (older versions defaulted to 5). Good for small indices, bad for anything that'll grow past 50 GB.
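To make the 50 GB rule of thumb concrete, here is a minimal Python sketch of the arithmetic. The function name `suggest_primary_shards` and the `target_shard_gb` parameter are made up for illustration, and the 50 GB ceiling is just the guideline from this note, not a hard limit enforced by ES.

```python
import math


def suggest_primary_shards(expected_index_gb: float, target_shard_gb: float = 50.0) -> int:
    """Smallest primary shard count that keeps each shard at or under target_shard_gb.

    Hypothetical helper: the 50 GB default is a rule of thumb, not an ES limit.
    """
    if expected_index_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be non-negative and the target must be positive")
    # An index always has at least one primary shard.
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


print(suggest_primary_shards(20))   # 1
print(suggest_primary_shards(120))  # 3
```

Size for where the index is expected to end up, not where it starts, since the count is fixed once the index exists.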
number_of_replicas
How many copies of each primary to maintain. Default: 1. Bump it up for high-read workloads:
PUT /products/_settings
{ "index": { "number_of_replicas": 2 } }
Setting it to 0 saves disk but loses fault tolerance. Useful during bulk imports — set to 0 to speed up writes, then bump back to 1 when done.
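The disk trade-off is easy to put numbers on. A small sketch, assuming every replica is a full copy of its primary (the helper names are hypothetical):

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Primaries plus all of their replica copies."""
    return primaries * (1 + replicas)


def total_disk_gb(primary_data_gb: float, replicas: int) -> float:
    """Approximate disk footprint: each replica stores the primary data again."""
    return primary_data_gb * (1 + replicas)


# The products index above: 3 primaries, 1 replica.
print(total_shard_copies(3, 1))  # 6
# After bumping number_of_replicas to 2.
print(total_shard_copies(3, 2))  # 9
# 40 GB of primary data with 1 replica.
print(total_disk_gb(40, 1))      # 80
```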
refresh_interval
How often new documents become searchable. Default: 1 second. That means after we PUT a doc, there’s up to a 1-second lag before it shows up in searches. ES is near real-time, not real-time.
For bulk indexing, crank it up:
PUT /products/_settings
{ "index": { "refresh_interval": "30s" } }
This trades search freshness for write throughput. Set to -1 to disable refreshing entirely during bulk loads.
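Putting the replica and refresh tricks together, a bulk load usually gets bracketed by two settings updates. The sketch below only builds the two requests as text so they can be pasted into a console; `settings_request` is a hypothetical helper, and the restore values (1 replica, "5s") are the ones from the products index above.

```python
import json


def settings_request(index: str, **index_settings) -> str:
    """Render a console-style update-settings request for the given index."""
    body = {"index": index_settings}
    return f"PUT /{index}/_settings\n{json.dumps(body)}"


# Before the bulk load: no replicas, no periodic refresh.
before_bulk = settings_request("products", number_of_replicas=0, refresh_interval="-1")
# After the bulk load: restore the values the index was created with.
after_bulk = settings_request("products", number_of_replicas=1, refresh_interval="5s")

print(before_bulk)
print(after_bulk)
```

The second request is the one that gets forgotten. Until it runs, the index has no replicas and new documents never show up in searches.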
analysis
This is where we define custom analyzers (we cover analyzers in detail in note 9). You declare them in settings, then reference them by name in mappings.
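Because mappings reference analyzers by name only, a typo in the name is an easy mistake to make. Here is a rough pre-flight check in Python, under a few assumptions: the request body uses the flat "analysis" layout shown above, and `BUILTIN_ANALYZERS` is a deliberately partial list (language analyzers such as english are left out). All names in the snippet are hypothetical helpers, not part of any ES client.

```python
import copy

# Partial list of built-in analyzers; language analyzers are not included.
BUILTIN_ANALYZERS = {"standard", "simple", "whitespace", "stop", "keyword", "pattern", "fingerprint"}


def referenced_analyzers(properties: dict) -> set:
    """Collect analyzer names used by fields, including sub-objects and multi-fields."""
    names = set()
    for field in properties.values():
        for key in ("analyzer", "search_analyzer"):
            if key in field:
                names.add(field[key])
        names |= referenced_analyzers(field.get("properties", {}))
        names |= referenced_analyzers(field.get("fields", {}))
    return names


def undefined_analyzers(index_body: dict) -> set:
    """Analyzer names used in mappings but neither defined in settings nor built in."""
    defined = set(index_body.get("settings", {}).get("analysis", {}).get("analyzer", {}))
    used = referenced_analyzers(index_body.get("mappings", {}).get("properties", {}))
    return used - defined - BUILTIN_ANALYZERS


products = {
    "settings": {"analysis": {"analyzer": {"product_analyzer": {
        "type": "custom", "tokenizer": "standard",
        "filter": ["lowercase", "asciifolding", "stop"]}}}},
    "mappings": {"properties": {
        "title": {"type": "text", "analyzer": "product_analyzer"},
        "description": {"type": "text",
                        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
    }},
}

# Same body, but with a misspelled analyzer name on the title field.
broken = copy.deepcopy(products)
broken["mappings"]["properties"]["title"]["analyzer"] = "prodcut_analyzer"

print(undefined_analyzers(products))  # set()
print(undefined_analyzers(broken))    # {'prodcut_analyzer'}
```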
What you CAN change later
Index settings split into “static” (set at creation time or on a closed index) vs “dynamic” (change anytime on a live index). Examples of dynamic ones:
- number_of_replicas
- refresh_interval
- max_result_window
- blocks.read_only
Static ones take more work: analysis settings (custom analyzers) can only be changed while the index is closed, and number_of_shards can't be changed in place at all, so that one means reindexing (or shrink/split).
A common pattern: close → update → reopen
POST /products/_close
PUT /products/_settings
{ ... static changes ... }
POST /products/_open
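Adding a new analyzer mid-flight? You’ll need this dance, and the index is unavailable for reads and writes while it’s closed. Documents that are already indexed also keep the tokens they were analyzed with, so a new analyzer doesn’t change them. Better yet, plan analyzers upfront or reindex into a new index with the right settings.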
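The one thing to get right in a script is reopening the index even when the update is rejected. A minimal sketch, assuming you inject your own `send(method, path, body)` function (a hypothetical callable wrapping whatever HTTP client you use); `sku_analyzer` is an invented analyzer name used only for the demo.

```python
from typing import Callable, Optional

# Hypothetical transport: send(method, path, body) performs one HTTP call.
Send = Callable[[str, str, Optional[dict]], None]


def update_static_settings(send: Send, index: str, settings: dict) -> None:
    """Close the index, apply static settings, and always reopen it."""
    send("POST", f"/{index}/_close", None)
    try:
        send("PUT", f"/{index}/_settings", settings)
    finally:
        # Reopen even if the update fails, so the index is not left closed.
        send("POST", f"/{index}/_open", None)


# Dry run with a recorder in place of a real HTTP client.
calls = []


def record(method: str, path: str, body: Optional[dict]) -> None:
    calls.append((method, path))


new_analysis = {"analysis": {"analyzer": {"sku_analyzer": {
    "type": "custom", "tokenizer": "keyword", "filter": ["lowercase"]}}}}

update_static_settings(record, "products", new_analysis)
for method, path in calls:
    print(method, path)
```

Injecting `send` keeps the ordering logic testable without a cluster. In real use it would wrap your HTTP client of choice and raise on non-2xx responses.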
Quick checks
# What's the current config?
GET /products/_settings
GET /products/_mapping
# Stats
GET /products/_stats
The 30-second summary: create your index explicitly with the right number of shards, mappings, and analyzers from day one. Letting ES auto-create everything almost always bites later.