Choosing the Right Database

intermediate database-selection sql-vs-nosql architecture decision-framework

One of the most common interview questions (and real-world decisions) is: “Which database should we use?” The answer is always “it depends.” But we can break it down into a practical framework.

There’s no single best database. Each type was built to solve a specific problem. Our job is to match the problem to the tool.

The Decision Framework

Ask these four questions:

1. What Does Our Data Look Like?

Structured/Tabular
Fixed schema, rows and columns → Relational (PostgreSQL, MySQL)
Semi-structured
Nested objects, variable fields → Document (MongoDB)
Simple lookups
Key → value pairs, caches → Key-Value (Redis)
Highly connected
Relationships matter most → Graph (Neo4j)
Time-series/Logs
Append-heavy, sorted by time → Wide-Column (Cassandra) or TimescaleDB

2. What Are Our Access Patterns?

  • OLTP (lots of small reads/writes) — relational or document databases
  • OLAP (complex analytics, aggregations) — columnar databases (ClickHouse, BigQuery, Redshift)
  • Full-text search — Elasticsearch, Meilisearch, or PostgreSQL with GIN indexes
  • Graph traversal — graph databases
  • Simple lookups by key — key-value stores

3. What Consistency Do We Need?

  • Strong consistency (every read sees the latest write) — PostgreSQL, MySQL, MongoDB (with write concern)
  • Eventual consistency (reads might be slightly stale) — Cassandra, DynamoDB (default), Redis replicas

If we’re building a banking system, we need strong consistency. If we’re building a social media feed, eventual consistency is fine — nobody notices if a like count is off by 1 for a second.

4. How Big Will We Get?

  • Vertical scaling (bigger server) — PostgreSQL, MySQL. Works well up to several TB and thousands of connections.
  • Horizontal scaling (more servers) — Cassandra, DynamoDB, MongoDB (with sharding). Needed for truly massive scale.

Most applications never need horizontal scaling. PostgreSQL on a decent server can handle millions of users. Don’t over-engineer.

Real-World Examples

E-commerce Platform

Primary database: PostgreSQL
  - Products, orders, users, payments
  - Need ACID transactions for orders and payments
  - Complex queries (search by category, price range, filters)

Cache layer: Redis
  - Product catalog cache
  - Session storage
  - Cart data
  - Rate limiting

Search: Elasticsearch
  - Product search with typo tolerance
  - Faceted filtering (brand, price, rating)

Social Media App

Primary database: PostgreSQL or MySQL
  - User profiles, posts, follows
  - Need strong consistency for auth and profiles

Activity feed: Redis (sorted sets)
  - Pre-computed feed per user
  - Fast reads for infinite scroll

Media metadata: MongoDB
  - Variable metadata per media type (photo vs video vs story)

Friend suggestions: Neo4j or graph queries
  - "People you may know" (friends of friends)

IoT / Monitoring System

Time-series data: Cassandra or TimescaleDB
  - Millions of sensor readings per second
  - Need massive write throughput

Recent data / alerts: Redis
  - Current sensor values
  - Alert thresholds and triggers

Configuration: PostgreSQL
  - Device management
  - User accounts
  - Alert rules

Real-time Chat (like Discord)

Messages: Cassandra
  - Billions of messages
  - Partition by channel, cluster by timestamp
  - Fast writes, fast range reads

Presence (online/offline): Redis
  - Sub-millisecond lookups
  - Pub/sub for real-time updates

User/server data: PostgreSQL
  - User accounts, server settings, permissions
  - Need ACID for role changes and permissions

The “Just Use PostgreSQL” Rule

Here’s a pragmatic take: start with PostgreSQL for almost everything. It handles JSON (like MongoDB), has full-text search (like Elasticsearch for simpler cases), supports pub/sub, and can handle more scale than most apps will ever need.

Add specialized databases only when PostgreSQL demonstrably can’t handle a specific requirement:

  • Need sub-millisecond caching? Add Redis.
  • Need to search millions of documents with typo tolerance? Add Elasticsearch.
  • Need to write millions of events per second? Add Cassandra or Kafka.
  • Need complex graph traversals at scale? Add Neo4j.

Don’t start with 5 databases on day one. Start with one and add others when the need is proven.

Quick Comparison

DatabaseTypeBest ForConsistencyScale
PostgreSQLRelationalGeneral purpose, OLTPStrongVertical
MySQLRelationalWeb apps, read-heavyStrongVertical
MongoDBDocumentVariable schemas, rapid devTunableHorizontal
RedisKey-ValueCaching, sessions, real-timeEventual*Horizontal
CassandraWide-ColumnTime-series, high write throughputTunableHorizontal
Neo4jGraphRelationship-heavy queriesStrongVertical
ElasticsearchSearch engineFull-text search, analyticsEventualHorizontal
DynamoDBKey-Value/DocServerless, auto-scalingTunableHorizontal
ClickHouseColumnarAnalytics, OLAPEventualHorizontal

*Redis can be configured for strong consistency with Redis Sentinel/Cluster

In simple language, the right database depends on what we’re building. Start with PostgreSQL (it can do 80% of things well), add Redis for caching, and only bring in specialized databases when we have a proven need. The worst mistake is using 5 different databases on day one because they sound cool. Keep it simple until complexity is required.

References