Choosing the Right Database - DBMS

One of the most common interview questions (and real-world decisions) is: “Which database should we use?” The answer is always “it depends.” But we can break it down into a practical framework.

There’s no single best database. Each type was built to solve a specific problem. Our job is to match the problem to the tool.

The Decision Framework

Ask these four questions:

1. What Does Our Data Look Like?

Structured/Tabular

Fixed schema, rows and columns → Relational (PostgreSQL, MySQL)

Semi-structured

Nested objects, variable fields → Document (MongoDB)

Simple lookups

Key → value pairs, caches → Key-Value (Redis)

Highly connected

Relationships matter most → Graph (Neo4j)

Time-series/Logs

Append-heavy, sorted by time → Wide-Column (Cassandra) or TimescaleDB

2. What Are Our Access Patterns?

OLTP (lots of small reads/writes) — relational or document databases
OLAP (complex analytics, aggregations) — columnar databases (ClickHouse, BigQuery, Redshift)
Full-text search — Elasticsearch, Meilisearch, or PostgreSQL with GIN indexes
Graph traversal — graph databases
Simple lookups by key — key-value stores

3. What Consistency Do We Need?

Strong consistency (every read sees the latest write) — PostgreSQL, MySQL, MongoDB (with write concern)
Eventual consistency (reads might be slightly stale) — Cassandra, DynamoDB (default), Redis replicas

If we’re building a banking system, we need strong consistency. If we’re building a social media feed, eventual consistency is fine — nobody notices if a like count is off by 1 for a second.

4. How Big Will We Get?

Vertical scaling (bigger server) — PostgreSQL, MySQL. Works well up to several TB and thousands of connections.
Horizontal scaling (more servers) — Cassandra, DynamoDB, MongoDB (with sharding). Needed for truly massive scale.

Most applications never need horizontal scaling. PostgreSQL on a decent server can handle millions of users. Don’t over-engineer.

Real-World Examples

E-commerce Platform

Primary database: PostgreSQL
  - Products, orders, users, payments
  - Need ACID transactions for orders and payments
  - Complex queries (search by category, price range, filters)

Cache layer: Redis
  - Product catalog cache
  - Session storage
  - Cart data
  - Rate limiting

Search: Elasticsearch
  - Product search with typo tolerance
  - Faceted filtering (brand, price, rating)

Primary database: PostgreSQL or MySQL
  - User profiles, posts, follows
  - Need strong consistency for auth and profiles

Activity feed: Redis (sorted sets)
  - Pre-computed feed per user
  - Fast reads for infinite scroll

Media metadata: MongoDB
  - Variable metadata per media type (photo vs video vs story)

Friend suggestions: Neo4j or graph queries
  - "People you may know" (friends of friends)

IoT / Monitoring System

Time-series data: Cassandra or TimescaleDB
  - Millions of sensor readings per second
  - Need massive write throughput

Recent data / alerts: Redis
  - Current sensor values
  - Alert thresholds and triggers

Configuration: PostgreSQL
  - Device management
  - User accounts
  - Alert rules

Real-time Chat (like Discord)

Messages: Cassandra
  - Billions of messages
  - Partition by channel, cluster by timestamp
  - Fast writes, fast range reads

Presence (online/offline): Redis
  - Sub-millisecond lookups
  - Pub/sub for real-time updates

User/server data: PostgreSQL
  - User accounts, server settings, permissions
  - Need ACID for role changes and permissions

The “Just Use PostgreSQL” Rule

Here’s a pragmatic take: start with PostgreSQL for almost everything. It handles JSON (like MongoDB), has full-text search (like Elasticsearch for simpler cases), supports pub/sub, and can handle more scale than most apps will ever need.

Add specialized databases only when PostgreSQL demonstrably can’t handle a specific requirement:

Need sub-millisecond caching? Add Redis.
Need to search millions of documents with typo tolerance? Add Elasticsearch.
Need to write millions of events per second? Add Cassandra or Kafka.
Need complex graph traversals at scale? Add Neo4j.

Don’t start with 5 databases on day one. Start with one and add others when the need is proven.

Quick Comparison

Database	Type	Best For	Consistency	Scale
PostgreSQL	Relational	General purpose, OLTP	Strong	Vertical
MySQL	Relational	Web apps, read-heavy	Strong	Vertical
MongoDB	Document	Variable schemas, rapid dev	Tunable	Horizontal
Redis	Key-Value	Caching, sessions, real-time	Eventual*	Horizontal
Cassandra	Wide-Column	Time-series, high write throughput	Tunable	Horizontal
Neo4j	Graph	Relationship-heavy queries	Strong	Vertical
Elasticsearch	Search engine	Full-text search, analytics	Eventual	Horizontal
DynamoDB	Key-Value/Doc	Serverless, auto-scaling	Tunable	Horizontal
ClickHouse	Columnar	Analytics, OLAP	Eventual	Horizontal

*Redis can be configured for strong consistency with Redis Sentinel/Cluster

In simple language, the right database depends on what we’re building. Start with PostgreSQL (it can do 80% of things well), add Redis for caching, and only bring in specialized databases when we have a proven need. The worst mistake is using 5 different databases on day one because they sound cool. Keep it simple until complexity is required.