Full-Text Search

advanced full-text-search tsvector gin-index elasticsearch

Searching for text inside a database sounds simple until we try to do it at scale. Let’s start with the obvious approach and see why it falls apart.

Why LIKE Doesn’t Scale

The LIKE operator is fine for small tables, but it has serious problems:

-- Prefix match — can use a B-Tree index
SELECT * FROM articles WHERE title LIKE 'postgres%';

-- Contains match — CANNOT use an index, full table scan
SELECT * FROM articles WHERE title LIKE '%postgres%';

-- Case-insensitive — even worse
SELECT * FROM articles WHERE LOWER(title) LIKE '%postgres%';

The moment we put % at the beginning, the database has to scan every single row. On a table with millions of articles, that’s unusable. Plus, LIKE doesn’t understand language — it can’t match “running” when we search for “run”.

PostgreSQL has a built-in full-text search engine. The core concepts are:

  • tsvector — a processed version of text, broken into lexemes (normalized words)
  • tsquery — a search query with operators like & (AND), | (OR), ! (NOT)
  • GIN index — the index type that makes full-text search fast
-- Convert text to tsvector
SELECT to_tsvector('english', 'The quick brown foxes jumped over lazy dogs');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

-- Notice: "foxes" → "fox", "jumped" → "jump", "lazy" → "lazi"
-- Stop words like "the", "over" are removed

The text processing normalizes words (stemming), removes stop words (“the”, “a”, “is”), and records the position of each word. This is what makes it smarter than LIKE.

Setting It Up

-- Add a tsvector column (or compute it on the fly)
ALTER TABLE articles ADD COLUMN search_vector tsvector;

-- Populate it from title and body
UPDATE articles SET search_vector =
    to_tsvector('english', title || ' ' || body);

-- Create a GIN index on it
CREATE INDEX idx_articles_search ON articles USING gin(search_vector);

-- Keep it updated with a trigger
CREATE TRIGGER articles_search_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
    tsvector_update_trigger(search_vector, 'pg_catalog.english', title, body);

Querying

-- Basic search
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'database & optimization');

-- Search with ranking (most relevant first)
SELECT title, ts_rank(search_vector, query) as rank
FROM articles, to_tsquery('english', 'database & optimization') query
WHERE search_vector @@ query
ORDER BY rank DESC;

-- Phrase search (words must be adjacent)
SELECT title FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'query optimization');

-- Prefix matching (autocomplete)
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'post:*');
-- Matches "postgres", "postgresql", "posting", etc.
Full-Text Search Pipeline
Raw Text
"The foxes jumped"
Tokenize
["The","foxes","jumped"]
Remove Stop Words
["foxes","jumped"]
Stem
["fox","jump"]
tsvector
'fox':1 'jump':2

MySQL’s approach is different but simpler to set up:

-- Create a FULLTEXT index
CREATE FULLTEXT INDEX idx_articles_ft ON articles(title, body);

-- Natural language mode (default — ranked by relevance)
SELECT title, MATCH(title, body) AGAINST('database optimization') as score
FROM articles
WHERE MATCH(title, body) AGAINST('database optimization');

-- Boolean mode (AND, OR, NOT operators)
SELECT title FROM articles
WHERE MATCH(title, body) AGAINST('+database -mysql +optimization' IN BOOLEAN MODE);
-- + means must include, - means must exclude

-- With query expansion (finds related terms)
SELECT title FROM articles
WHERE MATCH(title, body) AGAINST('database' WITH QUERY EXPANSION);

When to Use a Dedicated Search Engine

Built-in full-text search works great for many cases. But we should reach for a dedicated search engine like Elasticsearch or Meilisearch when:

  • We need typo tolerance (“databse” should match “database”)
  • We need faceted search (filter by category, price range, etc. with counts)
  • We need instant search / autocomplete with sub-10ms latency
  • Our search spans multiple tables or even multiple databases
  • We need custom ranking algorithms (boost newer content, popular items, etc.)
  • We’re dealing with very high search volume (thousands of queries per second)

For a blog or documentation site? PostgreSQL’s full-text search is more than enough. For an e-commerce product search or a search-heavy app? Consider Elasticsearch or Meilisearch.

Quick Comparison

FeatureLIKEPostgreSQL FTSMySQL FULLTEXTElasticsearch
StemmingNoYesYesYes
RankingNoYesYesYes (advanced)
Typo toleranceNoNoNoYes
Faceted searchNoNoNoYes
Index supportPrefix onlyGINFULLTEXTInverted index
Setup complexityNoneLowLowHigh

In simple language, if we need to search text in a database, step up from LIKE to the database’s built-in full-text search. It understands language, it’s indexable, and it ranks results. Only bring in Elasticsearch when we need features the database can’t provide.