Field Data Types

If an interviewer asks one ES question, it’s “what’s the difference between text and keyword?” Let’s nail that first, then sweep through the other types.

text vs keyword — THE question

In simple language:

  • text is for human-readable content you want to search inside. ES analyzes it (lowercase, tokenize, etc.) and builds an inverted index of the words.
  • keyword is for exact-match identifiers and structured tags. ES stores the whole string as one token.

Take "Sony WH-1000XM5 Wireless Headphones" and index it both ways:

type: "text"
  → analyzer runs: ["sony", "wh", "1000xm5", "wireless", "headphones"]
  Good for: full-text search, "match" queries.
  Can't: sort, aggregate, or exact-match by default.

type: "keyword"
  → stored as one token: ["Sony WH-1000XM5 Wireless Headphones"]
  Good for: exact match, sorting, aggregations, term filters.
  Can't: search inside the string.

The classic gotcha: you index a product title as keyword, then can’t figure out why match: "wireless" returns nothing. Because the index has one giant token, not the words inside it.
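To make the gotcha concrete, here is a small Python sketch of the two indexing behaviours. It is only a rough stand-in: analyze_text is a hypothetical helper that approximates the default analyzer with a regex and a lowercase step, and the real analyzer handles many more cases.

```python
import re

def analyze_text(value):
    """Rough approximation of a text field: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]

def analyze_keyword(value):
    """A keyword field keeps the whole string as a single token."""
    return [value]

title = "Sony WH-1000XM5 Wireless Headphones"
text_tokens = analyze_text(title)
keyword_tokens = analyze_keyword(title)

print(text_tokens)                    # ['sony', 'wh', '1000xm5', 'wireless', 'headphones']
print("wireless" in text_tokens)      # True: the word is its own token
print("wireless" in keyword_tokens)   # False: the only token is the full title
```

Searching for "wireless" only succeeds where "wireless" exists as a token, which is why the keyword-only mapping comes back empty.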

The multi-field pattern (use this)

You usually want both. ES supports multi-fields — index the same data two ways:

PUT /products
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      }
    }
  }
}

Now title is searchable (full-text) and title.keyword is sortable/aggregatable (exact). This is so common that dynamic mapping does it automatically for any string it doesn’t detect as a date (or as a number, when numeric detection is enabled).

// Full-text search
GET /products/_search
{ "query": { "match": { "title": "wireless" } } }

// Exact match / sort
GET /products/_search
{
  "query": { "term": { "title.keyword": "Sony WH-1000XM5 Wireless Headphones" } },
  "sort": [{ "title.keyword": "asc" }]
}
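The mapping above sets ignore_above to 256 on the keyword sub-field without saying what it does: strings longer than that limit are skipped for title.keyword, while the text field still gets them. A minimal Python sketch of that rule, where keyword_subfield_tokens is a hypothetical helper and the length check is done in characters:

```python
IGNORE_ABOVE = 256  # same value as in the mapping above

def keyword_subfield_tokens(value, ignore_above=IGNORE_ABOVE):
    """Return the single token title.keyword would hold, or [] if the value is too long."""
    if len(value) > ignore_above:
        return []          # too long: not indexed in the keyword sub-field
    return [value]         # short enough: one exact token

short_title = "Sony WH-1000XM5 Wireless Headphones"
long_title = "x" * 300

print(keyword_subfield_tokens(short_title))  # ['Sony WH-1000XM5 Wireless Headphones']
print(keyword_subfield_tokens(long_title))   # []
```

The practical consequence is that very long values quietly drop out of exact-match filters, sorts, and aggregations on title.keyword.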

Numeric types

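Pick the smallest one that fits: it makes indexing and searching more efficient. For the integer types, disk usage depends on the values actually stored rather than the declared type, so the saving there is in speed, not storage.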

Type          Range / precision
byte          -128 to 127
short         -32,768 to 32,767
integer       about ±2.1 billion (-2^31 to 2^31-1)
long          -2^63 to 2^63-1 (dynamic-mapping default for whole numbers)
float         32-bit float
double        64-bit float
scaled_float  decimal stored as a long after scaling, e.g. price × 100
half_float    16-bit float (when precision doesn’t matter)

For money, prefer scaled_float with scaling_factor: 100. Avoids float weirdness.
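Here is a rough Python sketch of the idea behind scaled_float, assuming a scaling factor of 100. The helpers to_scaled and from_scaled are hypothetical names for illustration, not part of any ES client; the point is only that the stored value is a whole number of cents.

```python
SCALING_FACTOR = 100  # assumption: two decimal places, as in scaling_factor: 100

def to_scaled(value, factor=SCALING_FACTOR):
    """Multiply by the factor and round to the nearest whole number for storage."""
    return round(value * factor)

def from_scaled(stored, factor=SCALING_FACTOR):
    """Turn the stored whole number back into a decimal value."""
    return stored / factor

print(0.1 + 0.2 == 0.3)        # False: classic binary floating-point surprise
print(to_scaled(19.99))        # 1999
print(to_scaled(0.1) + to_scaled(0.2) == to_scaled(0.3))  # True: integer cents add up exactly
print(from_scaled(1999))       # 19.99
```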

Date

ES dates are flexible. A date field accepts:

  • ISO 8601 strings: "2026-05-26T10:30:00Z"
  • epoch millis: 1748254200000
  • Custom formats via the format parameter

"created_at": {
  "type": "date",
  "format": "strict_date_optional_time||epoch_millis"
}

Internally stored as a long (epoch ms). Use date_nanos if you need nanosecond precision (for traces, etc.).
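To see what "stored as a long (epoch ms)" means in practice, this Python sketch converts the ISO 8601 example to epoch milliseconds and back. The function names are made up for illustration, and it only handles UTC timestamps with a trailing "Z" or an explicit offset.

```python
from datetime import datetime, timezone

def iso_to_epoch_millis(iso_string):
    """Parse an ISO 8601 timestamp and return milliseconds since the Unix epoch."""
    # Older Python versions reject a trailing "Z", so spell out the UTC offset.
    parsed = datetime.fromisoformat(iso_string.replace("Z", "+00:00"))
    return int(parsed.timestamp() * 1000)

def epoch_millis_to_iso(millis):
    """Render epoch milliseconds as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).isoformat()

millis = iso_to_epoch_millis("2026-05-26T10:30:00Z")
print(millis)                       # 1779791400000
print(epoch_millis_to_iso(millis))  # 2026-05-26T10:30:00+00:00
```

Both input forms describe the same instant, which is why a single date field can accept either one when the format lists both.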

Boolean

Easy. true / false. Also accepts "true" / "false" as strings.

IP

A dedicated type for IPv4 and IPv6 addresses. Supports CIDR queries:

"client_ip": { "type": "ip" }

GET /logs/_search
{ "query": { "term": { "client_ip": "192.168.1.0/24" } } }
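If CIDR notation is unfamiliar, this Python sketch shows which addresses a block like 192.168.1.0/24 covers, using the standard-library ipaddress module. It only illustrates the matching rule; ip_in_cidr is a hypothetical helper and says nothing about how ES implements the query.

```python
import ipaddress

def ip_in_cidr(ip, cidr):
    """Return True if the address falls inside the CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)

print(ip_in_cidr("192.168.1.42", "192.168.1.0/24"))  # True: same /24 block
print(ip_in_cidr("192.168.2.7", "192.168.1.0/24"))   # False: different third octet
print(ip_in_cidr("2001:db8::1", "2001:db8::/32"))    # True: IPv6 works the same way
```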

Object — implicit nesting

Any nested JSON is an object type by default:

{
  "user": {
    "name": "Manish",
    "country": "IN"
  }
}

ES flattens this internally to user.name and user.country, so you can query the field as user.name.
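The flattening can be pictured with a few lines of Python. This is a simplified model with a hypothetical flatten helper; it only handles nested dictionaries, not arrays.

```python
def flatten(doc, prefix=""):
    """Collapse nested dictionaries into a single level of dotted field names."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into sub-objects, extending the dotted path.
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat

doc = {"user": {"name": "Manish", "country": "IN"}}
flat_doc = flatten(doc)
print(flat_doc)  # {'user.name': 'Manish', 'user.country': 'IN'}
```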

Object vs nested — the array trap

Here’s a sneaky one. Object types don’t preserve relationships between array items. Watch this:

{
  "comments": [
    { "author": "alice", "text": "great" },
    { "author": "bob",   "text": "terrible" }
  ]
}

Internally ES flattens this to:

comments.author: ["alice", "bob"]
comments.text:   ["great", "terrible"]

The fact that “alice” said “great” is lost. A query like “author=bob AND text=great” would match this document! Wrong, but ES doesn’t know.
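The false match is easy to reproduce in plain Python. The sketch below models the flattening with a hypothetical flatten_object_array helper, then checks "author=bob AND text=great" against the flattened values, the way a query on an object field effectively does.

```python
def flatten_object_array(field, items):
    """Merge an array of objects into one list of values per dotted field name."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat

comments = [
    {"author": "alice", "text": "great"},
    {"author": "bob", "text": "terrible"},
]
flat = flatten_object_array("comments", comments)
print(flat)  # {'comments.author': ['alice', 'bob'], 'comments.text': ['great', 'terrible']}

# Each condition is checked against the whole list, not against one comment.
false_match = "bob" in flat["comments.author"] and "great" in flat["comments.text"]
print(false_match)  # True, even though bob never said "great"
```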

The fix is the nested type. Each object in the array gets indexed as a hidden separate doc, and you query with a nested query:

"comments": {
  "type": "nested",
  "properties": {
    "author": { "type": "keyword" },
    "text":   { "type": "text" }
  }
}

GET /posts/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "term":  { "comments.author": "bob" } },
            { "match": { "comments.text": "great" } }
          ]
        }
      }
    }
  }
}
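Conceptually, the nested query evaluates both conditions against one comment at a time. A minimal Python sketch of that behaviour, with nested_match as a hypothetical helper and a naive whitespace split standing in for text analysis:

```python
def nested_match(items, author, word):
    """True only if a single item has the given author AND contains the word."""
    return any(
        item["author"] == author and word in item["text"].lower().split()
        for item in items
    )

comments = [
    {"author": "alice", "text": "great"},
    {"author": "bob", "text": "terrible"},
]

print(nested_match(comments, "bob", "great"))    # False: no single comment satisfies both
print(nested_match(comments, "alice", "great"))  # True: alice's comment satisfies both
```

So with the nested mapping, the query above would not match this document, which is the correct answer.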

Trade-off: nested fields are heavier on disk and slightly slower to query. Only use them when array-item relationships actually matter.

Quick reference

  • text — searchable prose
  • keyword — exact-match strings, sort, agg
  • long / integer / scaled_float — numbers
  • date — timestamps
  • boolean — true/false
  • ip — IP addresses
  • object — implicit, flat nested fields
  • nested — when arrays of objects need to stay linked
  • geo_point — lat/lon (covered separately)

Pick types deliberately, and you’ll dodge most “why doesn’t my query work” issues.