Full-Text Search - DBMS

Searching for text inside a database sounds simple until we try to do it at scale. Let’s start with the obvious approach and see why it falls apart.

Why LIKE Doesn’t Scale

The LIKE operator is fine for small tables, but it has serious problems:

-- Prefix match — can use a B-Tree index
SELECT * FROM articles WHERE title LIKE 'postgres%';

-- Contains match — CANNOT use an index, full table scan
SELECT * FROM articles WHERE title LIKE '%postgres%';

-- Case-insensitive — even worse
SELECT * FROM articles WHERE LOWER(title) LIKE '%postgres%';

The moment we put % at the beginning, the database has to scan every single row. On a table with millions of articles, that’s unusable. Plus, LIKE doesn’t understand language — it can’t match “running” when we search for “run”.

PostgreSQL Full-Text Search

PostgreSQL has a built-in full-text search engine. The core concepts are:

tsvector — a processed version of text, broken into lexemes (normalized words)
tsquery — a search query with operators like & (AND), | (OR), ! (NOT)
GIN index — the index type that makes full-text search fast

-- Convert text to tsvector
SELECT to_tsvector('english', 'The quick brown foxes jumped over lazy dogs');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

-- Notice: "foxes" → "fox", "jumped" → "jump", "lazy" → "lazi"
-- Stop words like "the", "over" are removed

The text processing normalizes words (stemming), removes stop words (“the”, “a”, “is”), and records the position of each word. This is what makes it smarter than LIKE.

Setting It Up

-- Add a tsvector column (or compute it on the fly)
ALTER TABLE articles ADD COLUMN search_vector tsvector;

-- Populate it from title and body
UPDATE articles SET search_vector =
    to_tsvector('english', title || ' ' || body);

-- Create a GIN index on it
CREATE INDEX idx_articles_search ON articles USING gin(search_vector);

-- Keep it updated with a trigger
CREATE TRIGGER articles_search_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
    tsvector_update_trigger(search_vector, 'pg_catalog.english', title, body);

Querying

-- Basic search
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'database & optimization');

-- Search with ranking (most relevant first)
SELECT title, ts_rank(search_vector, query) as rank
FROM articles, to_tsquery('english', 'database & optimization') query
WHERE search_vector @@ query
ORDER BY rank DESC;

-- Phrase search (words must be adjacent)
SELECT title FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'query optimization');

-- Prefix matching (autocomplete)
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'post:*');
-- Matches "postgres", "postgresql", "posting", etc.

Full-Text Search Pipeline

Raw Text
"The foxes jumped"

→

Tokenize
["The","foxes","jumped"]

→

Remove Stop Words
["foxes","jumped"]

→

Stem
["fox","jump"]

→

tsvector
'fox':1 'jump':2

MySQL Full-Text Search

MySQL’s approach is different but simpler to set up:

-- Create a FULLTEXT index
CREATE FULLTEXT INDEX idx_articles_ft ON articles(title, body);

-- Natural language mode (default — ranked by relevance)
SELECT title, MATCH(title, body) AGAINST('database optimization') as score
FROM articles
WHERE MATCH(title, body) AGAINST('database optimization');

-- Boolean mode (AND, OR, NOT operators)
SELECT title FROM articles
WHERE MATCH(title, body) AGAINST('+database -mysql +optimization' IN BOOLEAN MODE);
-- + means must include, - means must exclude

-- With query expansion (finds related terms)
SELECT title FROM articles
WHERE MATCH(title, body) AGAINST('database' WITH QUERY EXPANSION);

When to Use a Dedicated Search Engine

Built-in full-text search works great for many cases. But we should reach for a dedicated search engine like Elasticsearch or Meilisearch when:

We need typo tolerance (“databse” should match “database”)
We need faceted search (filter by category, price range, etc. with counts)
We need instant search / autocomplete with sub-10ms latency
Our search spans multiple tables or even multiple databases
We need custom ranking algorithms (boost newer content, popular items, etc.)
We’re dealing with very high search volume (thousands of queries per second)

For a blog or documentation site? PostgreSQL’s full-text search is more than enough. For an e-commerce product search or a search-heavy app? Consider Elasticsearch or Meilisearch.

Quick Comparison

Feature	LIKE	PostgreSQL FTS	MySQL FULLTEXT	Elasticsearch
Stemming	No	Yes	Yes	Yes
Ranking	No	Yes	Yes	Yes (advanced)
Typo tolerance	No	No	No	Yes
Faceted search	No	No	No	Yes
Index support	Prefix only	GIN	FULLTEXT	Inverted index
Setup complexity	None	Low	Low	High

In simple language, if we need to search text in a database, step up from LIKE to the database’s built-in full-text search. It understands language, it’s indexable, and it ranks results. Only bring in Elasticsearch when we need features the database can’t provide.