If an interviewer asks one ES question, it’s “what’s the difference between text and keyword?” Let’s nail that first, then sweep through the other types.
text vs keyword — THE question
In simple terms:
- text is for human-readable content you want to search inside. ES analyzes it (lowercases, tokenizes, etc.) and builds an inverted index of the individual words. Can’t: sort, aggregate, or exact-match on the whole value by default.
- keyword is for exact-match identifiers and structured tags. ES stores the whole string as one token. Can’t: search inside the string.
The classic gotcha: you index a product title as keyword, then can’t figure out why a match query for "wireless" returns nothing. The index holds one giant token (the whole title), not the words inside it.
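A toy Python model makes the gotcha concrete. This is a sketch, not how Lucene actually works: analyze() is a hypothetical stand-in for the standard analyzer (it only lowercases and splits on non-alphanumeric characters), and match() just checks whether any query term appears among a field’s indexed tokens, roughly like a match query with the default OR operator.

```python
import re

def analyze(value: str) -> list:
    """Hypothetical stand-in for text analysis: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]

def index_as_text(value: str) -> set:
    # text field: one inverted-index entry per analyzed word
    return set(analyze(value))

def index_as_keyword(value: str) -> set:
    # keyword field: the whole string as a single, unmodified token
    return {value}

def match(indexed_tokens: set, query: str, analyze_query: bool) -> bool:
    # match query: the query goes through the field's analysis (none for a plain keyword),
    # and any matching term is enough (default OR operator)
    terms = analyze(query) if analyze_query else [query]
    return any(term in indexed_tokens for term in terms)

title = "Sony WH-1000XM5 Wireless Headphones"
print(sorted(index_as_text(title)))  # ['1000xm5', 'headphones', 'sony', 'wh', 'wireless']
print(match(index_as_text(title), "wireless", analyze_query=True))      # True
print(match(index_as_keyword(title), "wireless", analyze_query=False))  # False
print(match(index_as_keyword(title), title, analyze_query=False))       # True: only the exact string hits
```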
The multi-field pattern (use this)
You usually want both. ES supports multi-fields — index the same data two ways:
PUT /products
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      }
    }
  }
}
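Now title is searchable (full-text) and title.keyword is sortable and aggregatable (exact). Because of ignore_above: 256, strings longer than 256 characters are skipped in title.keyword, though they stay searchable through title. This pattern is so common that dynamic mapping applies it automatically to every new string field (text with a keyword sub-field capped at ignore_above: 256), unless the value looks like a date.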
// Full-text search
GET /products/_search
{ "query": { "match": { "title": "wireless" } } }
// Exact match / sort
GET /products/_search
{
  "query": { "term": { "title.keyword": "Sony WH-1000XM5 Wireless Headphones" } },
  "sort": [{ "title.keyword": "asc" }]
}
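Here’s a rough Python picture of what that multi-field mapping does at index time, including the ignore_above guard. It’s an illustrative sketch: index_multi_field is a hypothetical helper, and the lowercase-and-split step is a crude stand-in for real text analysis.

```python
import re

IGNORE_ABOVE = 256  # mirrors "ignore_above": 256 in the mapping

def index_multi_field(value: str) -> dict:
    """Hypothetical toy: index one string two ways, like title and title.keyword."""
    # "title" (text): lowercase and split into words (rough stand-in for real analysis)
    indexed = {"title": [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]}
    # "title.keyword" (keyword): the whole string as one token, skipped when it's too long
    indexed["title.keyword"] = [value] if len(value) <= IGNORE_ABOVE else []
    return indexed

long_title = " ".join(["wireless"] * 40)  # 359 characters
docs = ["Sony WH-1000XM5 Wireless Headphones", "Apple AirPods Pro", long_title]
indexed = [index_multi_field(d) for d in docs]

# Sorting works off the single exact token in the keyword sub-field
print(sorted(tok for doc in indexed for tok in doc["title.keyword"]))
# The long title is still full-text searchable, but has no title.keyword value
print("wireless" in indexed[2]["title"], indexed[2]["title.keyword"])
```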
Numeric types
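Pick the smallest type that fits. For byte, short, integer, and long, that mainly makes indexing and searching more efficient; disk usage barely changes, because ES already optimizes storage based on the actual values stored.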
| Type | Range / precision |
|---|---|
| byte | -128 to 127 |
| short | -32,768 to 32,767 |
| integer | -2,147,483,648 to 2,147,483,647 (~±2.1 billion) |
| long | -2^63 to 2^63-1 (dynamic mapping’s default for whole numbers) |
| float | 32-bit (single-precision) float |
| double | 64-bit (double-precision) float |
| scaled_float | float stored as a long, e.g. price × 100 |
| half_float | 16-bit float (when precision doesn’t matter) |
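If you want to sanity-check a choice, here’s a small Python sketch that picks the smallest integer type covering some sample values, using the signed ranges from the table. smallest_integer_type is a hypothetical helper for illustration, not part of any Elasticsearch client.

```python
# Signed integer ranges from the table (byte/short/integer/long)
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]

def smallest_integer_type(values: list) -> str:
    """Hypothetical helper: return the smallest ES integer type whose range holds every value."""
    lo, hi = min(values), max(values)
    for name, type_min, type_max in INTEGER_TYPES:
        if type_min <= lo and hi <= type_max:
            return name
    raise ValueError("values fall outside the long range")

print(smallest_integer_type([0, 42, 127]))     # byte
print(smallest_integer_type([-40_000, 10]))    # integer
print(smallest_integer_type([3_000_000_000]))  # long
```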
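For money, prefer scaled_float with scaling_factor: 100. Each value is multiplied by 100 and stored as a whole number (cents), which sidesteps most binary floating-point rounding surprises.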
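Here’s the arithmetic behind that advice as a Python sketch. It assumes the scaled_float behavior of multiplying by scaling_factor and rounding to the nearest whole number at index time; to_scaled and from_scaled are hypothetical helpers, and Python’s round() may break exact ties differently than ES does.

```python
# Why raw binary floats are awkward for money
print(0.1 + 0.2 == 0.3)  # False

SCALING_FACTOR = 100  # mirrors "scaling_factor": 100

def to_scaled(price: float) -> int:
    # Index time: multiply by the factor and round to the nearest whole number (cents here)
    return round(price * SCALING_FACTOR)

def from_scaled(stored: int) -> float:
    # Read side: divide the factor back out
    return stored / SCALING_FACTOR

cents = [to_scaled(p) for p in (0.10, 0.20, 19.99)]
print(cents)                               # [10, 20, 1999]
print(sum(cents[:2]) == to_scaled(0.30))   # True: integer cents add exactly
print(from_scaled(1999))                   # 19.99
```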
Date
ES dates are flexible. A date field accepts:
- ISO 8601 strings: "2026-05-26T10:30:00Z"
- epoch millis: 1748254200000
- custom formats via the format parameter
"created_at": {
  "type": "date",
  "format": "strict_date_optional_time||epoch_millis"
}
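Internally, dates are converted to UTC and stored as a long (epoch milliseconds). The format shown above is also the default, so you only need to set it when you want something different. Use date_nanos if you need nanosecond precision (for traces, etc.).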
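To picture the normalization, here’s a rough Python analogue that turns either input style into the epoch-millisecond long a date field stores. parse_date_to_epoch_millis is a hypothetical helper, and Python’s datetime.fromisoformat is not the same parser as ES’s strict_date_optional_time, so treat this as an illustration only.

```python
from datetime import datetime, timezone
from typing import Union

def parse_date_to_epoch_millis(value: Union[str, int]) -> int:
    """Hypothetical toy: normalize an ISO 8601 string or epoch millis to epoch milliseconds,
    the long that a date field stores. Invalid input raises ValueError."""
    if isinstance(value, int) or value.isdigit():
        return int(value)  # epoch millis, as a number or an all-digit string
    # fromisoformat only accepts a trailing "Z" from Python 3.11 on, so normalize it
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # no offset given: treat as UTC
    # convert to UTC epoch milliseconds
    return int(dt.timestamp() * 1000)

print(parse_date_to_epoch_millis("2026-05-26T10:30:00Z"))       # 1779791400000
print(parse_date_to_epoch_millis("2026-05-26T16:00:00+05:30"))  # 1779791400000 (same instant)
print(parse_date_to_epoch_millis(1748254200000))                # 1748254200000
```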
Boolean
Easy: true / false. It also accepts the strings "true" and "false", and an empty string "" counts as false.
IP
A dedicated type for IPv4 and IPv6 addresses. Supports CIDR queries:
"client_ip": { "type": "ip" }
GET /logs/_search
{ "query": { "term": { "client_ip": "192.168.1.0/24" } } }
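A CIDR term is essentially a range check on the address. Python’s standard ipaddress module can model which client_ip values that query would match; term_matches_cidr is a hypothetical helper for illustration, not an Elasticsearch API.

```python
import ipaddress

def term_matches_cidr(field_value: str, cidr: str) -> bool:
    """Hypothetical toy of a CIDR term query on an ip field:
    a document matches if its address falls inside the network."""
    return ipaddress.ip_address(field_value) in ipaddress.ip_network(cidr)

for ip in ["192.168.1.7", "192.168.1.255", "192.168.2.1", "10.0.0.5"]:
    print(ip, term_matches_cidr(ip, "192.168.1.0/24"))

# IPv6 addresses and prefixes work the same way
print(term_matches_cidr("2001:db8::1", "2001:db8::/32"))  # True
```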
Object — implicit nesting
Any JSON object inside a document is mapped as the object type by default:
{
  "user": {
    "name": "Manish",
    "country": "IN"
  }
}
ES flattens this internally into the fields user.name and user.country, so you query it as user.name.
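A few lines of Python show the naming rule. flatten is a hypothetical helper that models the dotted field paths an object mapping produces; it isn’t ES code, just the same rule applied recursively.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Hypothetical toy: turn nested JSON objects into dotted field paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))  # recurse into inner objects
        else:
            flat[path] = value
    return flat

doc = {"user": {"name": "Manish", "country": "IN"}}
print(flatten(doc))  # {'user.name': 'Manish', 'user.country': 'IN'}
```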
Object vs nested — the array trap
Here’s a sneaky one. Object types don’t remember which values came from the same array item. Watch this:
{
  "comments": [
    { "author": "alice", "text": "great" },
    { "author": "bob", "text": "terrible" }
  ]
}
Internally ES flattens this to:
comments.author: ["alice", "bob"]
comments.text: ["great", "terrible"]
The fact that “alice” said “great” is lost. A query like “author=bob AND text=great” would match this document! Wrong, but ES doesn’t know.
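Here’s that failure reproduced with a toy Python model. flatten_object_array is a hypothetical helper that mimics how an object-typed array collapses into per-field value lists, and the AND check runs against those lists the way a bool query over the flattened fields would.

```python
from collections import defaultdict

doc = {
    "comments": [
        {"author": "alice", "text": "great"},
        {"author": "bob", "text": "terrible"},
    ]
}

def flatten_object_array(field: str, items: list) -> dict:
    """Hypothetical toy: each sub-field collects all values, forgetting which item they came from."""
    flat = defaultdict(list)
    for item in items:
        for key, value in item.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)

flat = flatten_object_array("comments", doc["comments"])
print(flat)  # {'comments.author': ['alice', 'bob'], 'comments.text': ['great', 'terrible']}

# "author=bob AND text=great": each condition is checked against the whole value list
matches = "bob" in flat["comments.author"] and "great" in flat["comments.text"]
print(matches)  # True, a false positive
```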
The fix is the nested type. Each object in the array gets indexed as a hidden separate doc, and you query with a nested query:
"comments": {
  "type": "nested",
  "properties": {
    "author": { "type": "keyword" },
    "text": { "type": "text" }
  }
}
GET /posts/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "term": { "comments.author": "bob" } },
            { "match": { "comments.text": "great" } }
          ]
        }
      }
    }
  }
}
Trade-off: nested fields are heavier on disk and slightly slower to query. Only use them when array-item relationships actually matter.
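For contrast, here’s a toy Python model of nested semantics: each array item is treated like its own hidden doc, so every condition must hold inside the same item. nested_match is a hypothetical helper, and its word check is a crude lowercase-and-split stand-in for the match query’s analysis. The last line hints at the storage side of the trade-off: one root doc plus one hidden doc per comment.

```python
doc = {
    "comments": [
        {"author": "alice", "text": "great"},
        {"author": "bob", "text": "terrible"},
    ]
}

def nested_match(items: list, author: str, word: str) -> bool:
    """Hypothetical toy of a nested bool query: true only if ONE item satisfies every condition."""
    return any(
        item["author"] == author                   # like the term on comments.author
        and word in item["text"].lower().split()   # crude stand-in for the match on comments.text
        for item in items
    )

print(nested_match(doc["comments"], "bob", "great"))    # False: bob never said "great"
print(nested_match(doc["comments"], "alice", "great"))  # True

# Storage side: 1 root doc + 1 hidden doc per nested comment
print(1 + len(doc["comments"]))  # 3
```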
Quick reference
- text: searchable prose
- keyword: exact-match strings, sorting, aggregations
- long / integer / scaled_float: numbers
- date: timestamps
- boolean: true/false
- ip: IPv4/IPv6 addresses
- object: the implicit default for JSON objects; inner fields are flattened into dotted paths
- nested: when arrays of objects need their fields to stay linked
- geo_point: lat/lon (covered separately)
Pick types deliberately, and you’ll dodge 90% of “why doesn’t my query work” issues.