Index Creation & Settings - Elasticsearch

You CAN just throw documents at ES and let it auto-create the index. But that’s a great way to end up with 1 shard, dynamic mapping nightmares, and 3 AM pages. Let’s do it properly.

The two halves of index config

Every index has two configuration blocks:

- settings: physical/operational stuff. How many shards? How many replicas? Refresh interval? Custom analyzers?
- mappings: schema. What fields exist? What types are they? How should text be analyzed?

In simple language: settings is “how the index runs”, mappings is “what the data looks like.”

settings

- number_of_shards (immutable)
- number_of_replicas (mutable)
- refresh_interval
- analysis (analyzers, tokenizers)
- max_result_window
- codec (compression)

mappings

- field names
- field data types (text, keyword, long...)
- which analyzer to use per text field
- multi-fields (text + keyword)
- dynamic mapping rules

A real index creation request

Let’s build a products index from scratch:

PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "5s",
    "analysis": {
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "stop"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title":       { "type": "text", "analyzer": "product_analyzer" },
      "sku":         { "type": "keyword" },
      "price":       { "type": "scaled_float", "scaling_factor": 100 },
      "in_stock":    { "type": "boolean" },
      "created_at":  { "type": "date" },
      "category":    { "type": "keyword" },
      "description": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      }
    }
  }
}

That’s a production-ready starting point. Now let’s unpack the important settings.
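If you'd rather create the index from a script than from the console, the same request can be sent with nothing but the Python standard library. This is a minimal sketch, assuming an unsecured cluster at http://localhost:9200 (adjust the URL and add auth for your setup). `build_create_request` and `create_index` are hypothetical helper names, and the body is trimmed to a couple of fields for brevity.

```python
import json
import urllib.request

# Assumption: local, unsecured cluster. Real clusters usually need TLS and auth.
ES_URL = "http://localhost:9200"

# Trimmed version of the products body above (settings + mappings).
PRODUCTS_BODY = {
    "settings": {"number_of_shards": 3, "number_of_replicas": 1},
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "sku": {"type": "keyword"},
        }
    },
}


def build_create_request(index, body, base_url=ES_URL):
    """Build (but do not send) the PUT /<index> request."""
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_index(index, body, base_url=ES_URL):
    """Send the request; urlopen raises HTTPError on a 4xx/5xx response."""
    with urllib.request.urlopen(build_create_request(index, body, base_url)) as resp:
        return json.loads(resp.read())


# Nothing is sent here; we only inspect the request that would go out.
request = build_create_request("products", PRODUCTS_BODY)
print(request.get_method(), request.full_url)
```

Keeping "build" and "send" separate makes it easy to check what will go over the wire before pointing the script at a real cluster.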

`number_of_shards`

How many primary shards to split the index across. Set this at creation time; you can’t change it in place later. (You can shrink or split the index into a new one, with restrictions, but plan not to.)

Default since ES 7.0: 1 (it was 5 before that). Good for small indices, bad for anything that’ll grow past 50 GB.
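To turn that into a number, a rough back-of-the-envelope calculation helps. The sketch below assumes you aim for at most about 50 GB per primary shard, reusing the threshold mentioned above. It is a rule of thumb, not an ES limit, and `estimate_primary_shards` is a hypothetical helper.

```python
import math

# Assumption: ~50 GB per primary shard, taken from the threshold in the text.
TARGET_SHARD_GB = 50


def estimate_primary_shards(expected_index_gb, target_shard_gb=TARGET_SHARD_GB):
    """Return a primary shard count that keeps each shard at or under the target size."""
    if expected_index_gb <= 0:
        raise ValueError("expected_index_gb must be positive")
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


print(estimate_primary_shards(20))   # small index: one shard is enough
print(estimate_primary_shards(140))  # ceil(140 / 50) primaries
```

Size for where the index will be in a year or two, not where it is today, since the count is fixed at creation.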

`number_of_replicas`

How many copies of each primary to maintain. Default: 1. Bump it up for high-read workloads:

PUT /products/_settings
{ "index": { "number_of_replicas": 2 } }

Setting it to 0 saves disk but loses fault tolerance. Useful during bulk imports — set to 0 to speed up writes, then bump back to 1 when done.
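The disk and node cost of replicas is simple arithmetic: every replica is a full extra copy of each primary, and copies of the same shard are never placed on the same node. A small sketch using the 3 primaries of the products index above (helper names are made up for illustration):

```python
def total_shard_copies(primaries, replicas):
    """Primaries plus all replica copies the cluster has to store."""
    return primaries * (1 + replicas)


def min_nodes_for_all_copies(replicas):
    """Each copy of a shard needs its own node, so one node per copy."""
    return replicas + 1


for replicas in (0, 1, 2):
    print(
        f"replicas={replicas}: "
        f"{total_shard_copies(3, replicas)} shard copies, "
        f"needs >= {min_nodes_for_all_copies(replicas)} data nodes"
    )
```

With fewer data nodes than that, the extra replicas simply stay unassigned and the index sits in yellow health.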

`refresh_interval`

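How often newly indexed documents become searchable. Default: 1 second. That means after you index a doc, there’s up to a 1-second lag before it shows up in searches. (If you never set the value explicitly, shards that haven’t received a search for 30 seconds stop refreshing in the background until the next search arrives.) ES is near real-time, not real-time.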

For bulk indexing, crank it up:

PUT /products/_settings
{ "index": { "refresh_interval": "30s" } }

This trades search freshness for write throughput. Set it to -1 to disable automatic refreshes entirely during bulk loads, and restore it when the load finishes.
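The bulk-load advice for replicas and refresh usually travels as a pair: relax both before the load, put both back after. Here is a sketch that only builds the two `_settings` bodies, assuming you want to return to 1 replica and a 1-second refresh afterwards (both values are parameters, and the function names are hypothetical):

```python
import json


def bulk_load_settings():
    """Settings to apply before a big import: no replicas, no automatic refresh."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def restore_settings(replicas=1, refresh_interval="1s"):
    """Settings to apply once the import is done."""
    return {"index": {"number_of_replicas": replicas, "refresh_interval": refresh_interval}}


# Print the two requests in console form.
for label, body in (("before", bulk_load_settings()), ("after", restore_settings())):
    print(f"# {label} the bulk load")
    print("PUT /products/_settings")
    print(json.dumps(body))
```

Both settings are dynamic, so neither step needs the index to be closed.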

`analysis`

This is where we define custom analyzers (we cover analyzers in detail in note 9). You declare them in settings, then reference them by name in mappings.
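Because the link between the two halves is just a name, a typo in the mapping only surfaces when ES rejects the request. A small pre-flight check can catch it earlier. This sketch assumes the flat `settings.analysis.analyzer` layout used in the products example and a short, deliberately incomplete list of built-in analyzers. `undeclared_analyzers` is a hypothetical helper.

```python
# Assumption: a partial list of built-ins; language analyzers etc. are not covered.
BUILT_IN_ANALYZERS = {
    "standard", "simple", "whitespace", "stop", "keyword", "pattern", "fingerprint",
}


def undeclared_analyzers(body):
    """Map field path -> analyzer name for every reference that is not declared."""
    declared = set(body.get("settings", {}).get("analysis", {}).get("analyzer", {}))
    missing = {}

    def walk(properties, prefix=""):
        for name, spec in properties.items():
            path = prefix + name
            for key in ("analyzer", "search_analyzer"):
                ref = spec.get(key)
                if ref and ref not in declared and ref not in BUILT_IN_ANALYZERS:
                    missing[path] = ref
            # Recurse into object sub-fields and multi-fields.
            walk(spec.get("properties", {}), path + ".")
            walk(spec.get("fields", {}), path + ".")

    walk(body.get("mappings", {}).get("properties", {}))
    return missing


body = {
    "settings": {"analysis": {"analyzer": {"product_analyzer": {"type": "custom", "tokenizer": "standard"}}}},
    "mappings": {"properties": {
        "title": {"type": "text", "analyzer": "product_analyzer"},
        "summary": {"type": "text", "analyzer": "product_analyser"},  # typo on purpose
    }},
}
print(undeclared_analyzers(body))
```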

What you CAN change later

Index settings split into “static” (set at creation time, or on a closed index) vs “dynamic” (change anytime on a live index). Examples of dynamic ones:

- number_of_replicas
- refresh_interval
- max_result_window
- blocks.read_only

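Static ones behave differently from each other. Analysis settings (custom analyzers) can be updated while the index is closed. number_of_shards can’t be updated even on a closed index: changing it means reindexing into a new index, or using shrink/split. Also note that a new analyzer only affects documents indexed after the change; existing documents are not re-analyzed.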

A common pattern: close → update → reopen

POST /products/_close
PUT /products/_settings
{ ... static changes ... }
POST /products/_open

Adding a new analyzer mid-flight? You’ll need this dance. Better yet, plan analyzers upfront or reindex into a new index with the right settings.
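The decision "can I just PUT this, or do I need the close/open dance?" can be sketched as a tiny planner. It assumes the only dynamic settings are the four named in the list above (the real list is much longer) and treats everything else as static. `plan_settings_update` is a hypothetical helper that returns console-style request lines without sending anything.

```python
import json

# Assumption: just the dynamic settings named in the text; ES has many more.
DYNAMIC_SETTINGS = {
    "number_of_replicas", "refresh_interval", "max_result_window", "blocks.read_only",
}


def plan_settings_update(index, changes):
    """Return the ordered console requests needed to apply `changes`."""
    if "number_of_shards" in changes:
        # Not updatable in place, even on a closed index.
        raise ValueError("number_of_shards needs a reindex, split, or shrink")
    update = f"PUT /{index}/_settings {json.dumps({'index': changes})}"
    if set(changes) <= DYNAMIC_SETTINGS:
        return [update]
    return [f"POST /{index}/_close", update, f"POST /{index}/_open"]


print(plan_settings_update("products", {"number_of_replicas": 2}))
print(plan_settings_update(
    "products",
    {"analysis": {"analyzer": {"sku_analyzer": {"type": "custom", "tokenizer": "keyword"}}}},
))
```

Keep in mind that a closed index serves neither reads nor writes, so on a production index the middle step is downtime.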

Quick checks

# What's the current config?
GET /products/_settings
GET /products/_mapping

# Stats
GET /products/_stats
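One thing that trips people up when scripting these checks: `GET /products/_settings` nests everything under the index name, and numeric values come back as strings. The sketch below parses a hand-written sample shaped like that response (trimmed to three keys, not captured from a real cluster). `summarize_settings` is a hypothetical helper.

```python
import json

# Hand-written sample, trimmed to the interesting keys; a real response also
# carries fields such as uuid, creation_date and version.
SAMPLE_RESPONSE = json.loads("""
{
  "products": {
    "settings": {
      "index": {
        "number_of_shards": "3",
        "number_of_replicas": "1",
        "refresh_interval": "5s"
      }
    }
  }
}
""")


def summarize_settings(response):
    """Pull shard, replica and refresh settings out of a _settings response."""
    summary = {}
    for index_name, payload in response.items():
        index_settings = payload["settings"]["index"]
        summary[index_name] = {
            "shards": int(index_settings["number_of_shards"]),
            "replicas": int(index_settings["number_of_replicas"]),
            # Only present when it was set explicitly.
            "refresh_interval": index_settings.get("refresh_interval", "default"),
        }
    return summary


print(summarize_settings(SAMPLE_RESPONSE))
```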

The 30-second summary: create your index explicitly with the right number of shards, mappings, and analyzers from day one. Letting ES auto-create everything almost always bites later.