Field Data Types - Elasticsearch

If an interviewer asks one ES question, it’s “what’s the difference between text and keyword?” Let’s nail that first, then sweep through the other types.

text vs keyword — THE question

In simple language:

text is for human-readable content you want to search inside. ES analyzes it (lowercase, tokenize, etc.) and builds an inverted index of the words.
keyword is for exact-match identifiers and structured tags. ES stores the whole string as one token.

"Sony WH-1000XM5 Wireless Headphones" → indexed as...

type: "text"

→ analyzer runs:

["sony", "wh", "1000xm5", "wireless", "headphones"]

Good for: full-text search, "match" queries.
Can't (by default): sort or aggregate (that needs fielddata, which is disabled for text fields), or exact-match the whole string.

type: "keyword"

→ stored as one token:

["Sony WH-1000XM5 Wireless Headphones"]

Good for: exact match, sorting, aggregations, term filters.
Can't: match individual words inside the string (short of slow wildcard queries).

The classic gotcha: you index a product title as keyword, then can’t figure out why match: "wireless" returns nothing. The index holds one giant token for the whole title, not the individual words inside it.
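Here's a toy Python model of the two indexing paths and the gotcha. It's a sketch under a big assumption: the hypothetical `analyze()` just lowercases and splits on non-alphanumerics, which happens to reproduce the tokens above but is not ES's real standard analyzer. Each field is modeled as a set of indexed terms.

```python
import re

def analyze(value: str) -> list[str]:
    """Toy 'text' analysis: lowercase, then split on anything that isn't a-z or 0-9."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]

def index_keyword(value: str) -> list[str]:
    """Toy 'keyword' indexing: the whole string becomes a single term."""
    return [value]

def match(terms: set[str], query: str) -> bool:
    # A match query on a text field analyzes the query the same way, then looks up each token
    return any(tok in terms for tok in analyze(query))

def term(terms: set[str], value: str) -> bool:
    # Verbatim lookup with no analysis; on a keyword field this is effectively what
    # a match query does too, since keyword fields have no analyzer
    return value in terms

title = "Sony WH-1000XM5 Wireless Headphones"
text_terms = set(analyze(title))
keyword_terms = set(index_keyword(title))

print(analyze(title))                   # ['sony', 'wh', '1000xm5', 'wireless', 'headphones']
print(match(text_terms, "wireless"))    # True
print(term(keyword_terms, "wireless"))  # False: the only term is the whole title
print(term(keyword_terms, title))       # True
```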

The multi-field pattern (use this)

You usually want both. ES supports multi-fields — index the same data two ways:

PUT /products
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      }
    }
  }
}

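Now title is searchable (full-text) and title.keyword is sortable and aggregatable (exact). The ignore_above: 256 setting means strings longer than 256 characters are skipped in title.keyword; they're still indexed in title and kept in _source. This pattern is so common that dynamic mapping produces it automatically for any new string field, unless date detection maps the string as a date instead.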

// Full-text search
GET /products/_search
{ "query": { "match": { "title": "wireless" } } }

// Exact match / sort
GET /products/_search
{
  "query": { "term": { "title.keyword": "Sony WH-1000XM5 Wireless Headphones" } },
  "sort": [{ "title.keyword": "asc" }]
}
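To see what ignore_above does to the multi-field, here's a toy Python model (the `index_title` helper is hypothetical, and the text analysis is the same lowercase-and-split simplification, not ES's real analyzer). One source value feeds two index fields, and the keyword copy is dropped when the string is too long.

```python
import re

IGNORE_ABOVE = 256  # mirrors "ignore_above": 256 in the mapping above

def index_title(value: str) -> dict[str, list[str]]:
    """Toy model of the title multi-field: one source value, two index fields."""
    return {
        # title: analyzed words (simplified to lowercase + split on non-alphanumerics)
        "title": [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t],
        # title.keyword: the whole string as one term, skipped if it's too long
        "title.keyword": [value] if len(value) <= IGNORE_ABOVE else [],
    }

short = index_title("Sony WH-1000XM5 Wireless Headphones")
print(short["title.keyword"])    # ['Sony WH-1000XM5 Wireless Headphones']

too_long = index_title("wireless " * 40)  # 360 characters
print(too_long["title.keyword"])  # [] (nothing to sort or aggregate on for this doc)
print(too_long["title"][:2])      # ['wireless', 'wireless'] (still full-text searchable)
```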

Numeric types

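Pick the smallest integer type that fits: it makes indexing and searching a bit more efficient. (Disk usage doesn't change much either way, since ES optimizes storage based on the actual values stored.)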

Type	Range / notes
`byte`	-128 to 127
`short`	-32,768 to 32,767
`integer`	-2^31 to 2^31-1 (about ±2.1 billion)
`long`	-2^63 to 2^63-1 (default for JSON whole numbers)
`float`	32-bit IEEE 754 float (default for JSON decimals)
`double`	64-bit IEEE 754 float
`half_float`	16-bit IEEE 754 float (when precision doesn’t matter)
`scaled_float`	float stored as a long, e.g. price × 100
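If you want to apply "smallest type that fits" mechanically, here's a hypothetical Python helper (`smallest_integer_type` is an illustrative name, not an ES API) that checks a known min/max against each signed range from the table.

```python
# Signed ranges of ES's whole-number types (two's complement, 8/16/32/64 bits)
INTEGER_TYPES = [("byte", 8), ("short", 16), ("integer", 32), ("long", 64)]

def smallest_integer_type(min_value: int, max_value: int) -> str:
    """Return the narrowest integer type whose range covers [min_value, max_value]."""
    for name, bits in INTEGER_TYPES:
        lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if lo <= min_value and max_value <= hi:
            return name
    raise ValueError("values do not fit in a signed 64-bit long")

print(smallest_integer_type(1, 5))        # byte     (e.g. star ratings)
print(smallest_integer_type(1900, 2100))  # short    (e.g. years)
print(smallest_integer_type(0, 40_000))   # integer  (just over short's 32,767)
```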

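For money, prefer scaled_float with scaling_factor: 100. ES multiplies each value by 100 and rounds it to the nearest long, so 19.99 is stored as exactly 1999 and you sidestep binary floating-point rounding errors.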
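A quick Python sketch of why this helps. The hypothetical `to_scaled` / `from_scaled` helpers mirror scaled_float's multiply-and-round idea; they aren't ES code. Rounding (not truncating) matters, because 19.99 itself isn't exactly representable as a binary float.

```python
SCALING_FACTOR = 100  # the "scaling_factor": 100 suggested above

def to_scaled(price: float) -> int:
    """Index-time step of scaled_float: multiply by the factor, round to the nearest integer."""
    return round(price * SCALING_FACTOR)

def from_scaled(stored: int) -> float:
    """Turn the stored long back into a price."""
    return stored / SCALING_FACTOR

print(19.99 * 100)       # 1998.9999999999998 (19.99 has no exact binary representation)
print(to_scaled(19.99))  # 1999 (truncating with int() would give 1998)

prices = [0.10, 0.20]
print(sum(prices) == 0.30)        # False: summed as floats, this is 0.30000000000000004
cents = sum(to_scaled(p) for p in prices)
print(cents, from_scaled(cents))  # 30 0.3
```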

Date

ES dates are flexible. A date field accepts:

ISO 8601 strings: "2026-05-26T10:30:00Z"
epoch millis: 1779791400000 (the same instant)
Custom formats via the format parameter

"created_at": {
  "type": "date",
  "format": "strict_date_optional_time||epoch_millis"
}

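That format string is also the default for date fields, so you only need format when your data uses something else. Internally, ES converts every date to UTC and stores it as a long (epoch ms). Use date_nanos if you need nanosecond precision (for traces, etc.); it stores nanoseconds since the epoch, which limits it to dates between 1970 and 2262.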
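Here's a small Python sketch of that normalization: the hypothetical `to_epoch_millis` turns an ISO 8601 string into UTC epoch milliseconds, and the last line shows where a signed 64-bit count of nanoseconds (date_nanos) runs out. It illustrates the arithmetic only, not ES's actual parser.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(iso: str) -> int:
    """Convert an ISO 8601 string to UTC epoch milliseconds, like a stored date value."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11, so normalize it first
    dt = datetime.fromisoformat(iso.replace("Z", "+00:00"))
    return (dt - EPOCH) // timedelta(milliseconds=1)

print(to_epoch_millis("2026-05-26T10:30:00Z"))       # 1779791400000
print(to_epoch_millis("2026-05-26T16:00:00+05:30"))  # 1779791400000 (same instant, different offset)

# date_nanos: nanoseconds since the epoch in a signed 64-bit long caps out in 2262
max_nanos = 2**63 - 1
print(EPOCH + timedelta(microseconds=max_nanos // 1000))  # 2262-04-11 23:47:16.854775+00:00
```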

Boolean

Easy: true / false. It also accepts the strings "true" and "false", and treats an empty string "" as false.
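Those coercion rules fit in a few lines. This hypothetical `coerce_boolean` is a toy model, not ES code; in this sketch anything outside the listed values raises an error, so check your version's docs for exactly what it rejects.

```python
def coerce_boolean(value: object) -> bool:
    """Toy model of how a boolean field reads incoming JSON values (hypothetical helper)."""
    if value is True or value == "true":
        return True
    if value is False or value in ("false", ""):
        return False
    # Anything else is rejected in this model rather than guessed at
    raise ValueError(f"not a boolean: {value!r}")

for v in [True, "true", False, "false", ""]:
    print(repr(v), "->", coerce_boolean(v))
```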

IP

A dedicated type for IPv4 and IPv6 addresses. Supports CIDR queries:

"client_ip": { "type": "ip" }

GET /logs/_search
{ "query": { "term": { "client_ip": "192.168.1.0/24" } } }
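The CIDR term query matches every address inside the block. Python's standard ipaddress module shows which addresses that covers; this is just the set logic, not how ES executes the query internally, and the sample IPs are made up.

```python
import ipaddress

network = ipaddress.ip_network("192.168.1.0/24")

client_ips = ["192.168.1.7", "192.168.1.255", "192.168.2.1", "10.0.0.5"]
matches = [ip for ip in client_ips if ipaddress.ip_address(ip) in network]
print(matches)  # ['192.168.1.7', '192.168.1.255']

# The same idea works for IPv6 blocks
v6 = ipaddress.ip_network("2001:db8::/32")
print(ipaddress.ip_address("2001:db8::1") in v6)  # True
```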

Object — implicit nesting

Any JSON object inside a document is mapped as the object type by default:

{
  "user": {
    "name": "Manish",
    "country": "IN"
  }
}

ES flattens this internally into the dotted fields user.name and user.country, and that’s how you query it: user.name.
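A minimal Python sketch of that flattening (the `flatten` helper is hypothetical and ignores arrays for now): walk nested dicts and join keys with dots.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested JSON objects into dotted field paths, as ES does for object fields."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))  # recurse into the inner object
        else:
            flat[path] = value
    return flat

doc = {"user": {"name": "Manish", "country": "IN"}}
print(flatten(doc))  # {'user.name': 'Manish', 'user.country': 'IN'}
```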

Object vs nested — the array trap

Here’s a sneaky one. Object types don’t preserve relationships between array items. Watch this:

{
  "comments": [
    { "author": "alice", "text": "great" },
    { "author": "bob",   "text": "terrible" }
  ]
}

Internally ES flattens this to:

comments.author: ["alice", "bob"]
comments.text:   ["great", "terrible"]

The fact that “alice” said “great” is lost. A query like “author=bob AND text=great” would match this document! Wrong, but ES doesn’t know.
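The false positive is easy to reproduce in a toy Python model. The hypothetical `flatten_array` collects one value list per sub-field, which is exactly what throws the pairing away; text analysis is skipped here because every value is a single word.

```python
def flatten_array(doc: dict, field: str) -> dict[str, list[str]]:
    """Flatten an array of objects the way an object field is indexed:
    one list of values per sub-field, with the per-item pairing lost."""
    flat: dict[str, list[str]] = {}
    for item in doc[field]:
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat

post = {
    "comments": [
        {"author": "alice", "text": "great"},
        {"author": "bob", "text": "terrible"},
    ]
}

flat = flatten_array(post, "comments")
print(flat)  # {'comments.author': ['alice', 'bob'], 'comments.text': ['great', 'terrible']}

# author=bob AND text=great, checked against the flattened fields
false_positive = "bob" in flat["comments.author"] and "great" in flat["comments.text"]
print(false_positive)  # True, even though bob never wrote "great"
```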

The fix is the nested type. Each object in the array gets indexed as a hidden separate doc, and you query with a nested query:

"comments": {
  "type": "nested",
  "properties": {
    "author": { "type": "keyword" },
    "text":   { "type": "text" }
  }
}

GET /posts/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "term":  { "comments.author": "bob" } },
            { "match": { "comments.text": "great" } }
          ]
        }
      }
    }
  }
}

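Trade-off: each nested object is indexed as its own hidden document, so a post with 50 comments becomes 51 documents. That makes nested fields heavier on disk and slower to query, so only use them when array-item relationships actually matter.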
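For contrast, here's a toy Python model of the nested query above (the `nested_match` helper is hypothetical and hard-codes the author/text fields). Because each comment is its own hidden document, both must clauses have to hold within a single array item. It also counts the documents behind one post.

```python
post = {
    "comments": [
        {"author": "alice", "text": "great"},
        {"author": "bob", "text": "terrible"},
    ]
}

def nested_match(doc: dict, path: str, author: str, word: str) -> bool:
    """Toy nested query: the bool 'must' clauses must all hold within ONE array item."""
    return any(
        item["author"] == author and word in item["text"].lower().split()
        for item in doc[path]
    )

print(nested_match(post, "comments", "bob", "great"))    # False: the pairing is preserved
print(nested_match(post, "comments", "alice", "great"))  # True

# Cost: one root document plus one hidden document per nested object
print(1 + len(post["comments"]))  # 3
```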

Quick reference

text — searchable prose
keyword — exact-match strings, sort, agg
long / integer / scaled_float — numbers
date — timestamps
boolean — true/false
ip — IP addresses
object — implicit, flat nested fields
nested — when arrays of objects need to stay linked
geo_point — lat/lon (covered separately)

Pick types deliberately, and you’ll dodge most “why doesn’t my query work?” issues.