DBMS — Quick Summary
Quick revision: every topic, key terms, and mnemonics for DBMS.
This is a quick revision doc covering all 47 topics in dbms. Open the linked notes if you want depth.
Database Fundamentals
What Is a DBMS?
What it is. Software between our app and disk that handles store/retrieve/update/delete with concurrency, integrity, security, and crash recovery.
Key terms.
- RDBMS — tables + SQL. Postgres, MySQL, SQLite.
- NoSQL — document, key-value, wide-column, graph.
- NewSQL — SQL + horizontal scaling. CockroachDB, Spanner.
- Internals — Parser → Optimizer → Execution Engine → Storage Engine → Transaction Manager → Buffer Manager.
Remember. Files fail at concurrency, integrity, relationships, security, crash recovery. DBMS solves all of these.
Types of Databases
What it is. Seven main categories matched to data shape and access pattern.
Key terms.
- Relational — Postgres, MySQL. Structured + relationships.
- Document — MongoDB, CouchDB. Flexible JSON.
- Key-Value — Redis, DynamoDB, Memcached. Fast lookups.
- Wide-Column — Cassandra, HBase. Massive write throughput.
- Graph — Neo4j, Neptune. Relationship-heavy data.
- Columnar — ClickHouse, BigQuery. Analytics.
- Time-Series — InfluxDB, TimescaleDB, Prometheus. Timestamped points.
Remember. Polyglot persistence — most production stacks use multiple types. Pick by access pattern, not hype.
SQL vs NoSQL
What it is. SQL = fixed schema, vertical scaling, ACID. NoSQL = flexible schema, horizontal scaling, BASE.
Key terms.
- Vertical scaling — bigger machine.
- Horizontal scaling — more machines.
jsonb— Postgres’s NoSQL escape hatch.- Hybrid — MongoDB has ACID since 4.0; CockroachDB/YugabyteDB do SQL + horizontal.
Remember. For 90% of apps, PostgreSQL is more than enough. Never say “NoSQL is better because it scales” — talk specific trade-offs for the specific case.
ACID Properties
What it is. Four guarantees that make transactions reliable.
Key terms.
- A — Atomicity — all-or-nothing. Crash mid-transaction → rollback.
- C — Consistency — valid state to valid state. Constraints enforced.
- I — Isolation — concurrent txns don’t see each other’s work.
- D — Durability — committed = permanent (via Write-Ahead Log).
Remember. ACID = Atomicity, Consistency, Isolation, Durability. Always use the bank transfer example. Full ACID: Postgres, MySQL InnoDB, SQLite. Document-level: MongoDB. Not ACID: Cassandra, DynamoDB, Redis.
BASE Properties
What it is. NoSQL alternative to ACID. Trade strict consistency for availability.
Key terms.
- B — Basically Available — always responds.
- S — Soft state — state changes over time without input (background sync).
- E — Eventually consistent — replicas converge given time.
- Tunable consistency — Cassandra: ONE / QUORUM / ALL.
Remember. “Likes count off by 1 for a second” = BASE in action. Use BASE for social feeds, counters, browsing. Use ACID for money, bookings, auth.
CAP Theorem
What it is. In a distributed system, during a network partition, pick C or A. P isn’t optional.
Key terms.
- C — Consistency. Every read sees latest write.
- A — Availability. Every request gets a response.
- P — Partition tolerance. System keeps working when nodes can’t talk.
- CP — MongoDB (majority), HBase, etcd, Zookeeper.
- AP — Cassandra, DynamoDB, CouchDB.
- CA — single-node only (PostgreSQL on one box).
- PACELC — extends CAP: Else (no partition), pick Latency or Consistency.
Remember. CAP = Consistency / Availability / Partition tolerance. Trade-off only kicks in during a partition. Mention PACELC for bonus points.
SQL & Querying
DDL, DML & DCL
What it is. Four SQL command categories.
Key terms.
- DDL — Definition. CREATE, ALTER, DROP, TRUNCATE. Auto-committed.
- DML — Manipulation. SELECT, INSERT, UPDATE, DELETE. Transactional.
- DCL — Control. GRANT, REVOKE.
- TCL — Transaction. BEGIN, COMMIT, ROLLBACK, SAVEPOINT.
Remember. DROP vs TRUNCATE vs DELETE. DROP = remove table+data. TRUNCATE = empty rows (DDL, fast, no rollback, resets auto-increment). DELETE = remove specific rows (DML, slow, rollback-able, fires triggers).
Joins
What it is. Combine data from two+ tables on a related column.
Key terms.
- INNER JOIN — only matching rows from both.
- LEFT JOIN — all left + matching right (NULLs for missing).
- RIGHT JOIN — mirror of LEFT (rarely used in practice).
- FULL OUTER JOIN — all from both, NULLs where missing.
- CROSS JOIN — cartesian product.
- SELF JOIN — table joined to itself (org charts, hierarchies).
Remember. “Find missing data” = LEFT JOIN + WHERE right_col IS NULL. Always join on indexed columns. Filter early, join late.
Subqueries & CTEs
What it is. Subquery = query inside a query. CTE = named subquery via WITH.
Key terms.
- Scalar subquery — single value.
- Row subquery — single row.
- Table subquery — full result set in FROM.
- Correlated — references outer query, runs once per row. Slow.
- Non-correlated — runs once independently.
- EXISTS vs IN — EXISTS short-circuits; IN materializes the list.
- Recursive CTE —
WITH RECURSIVE, base case + recursive case. For org charts, dates, trees.
Remember. Use CTEs for readability. Use EXISTS over IN for large subqueries. Watch out: NOT IN with NULLs returns nothing.
Aggregations & GROUP BY
What it is. Squash rows into summaries. GROUP BY splits into buckets first.
Key terms.
- Aggregates — COUNT, SUM, AVG, MIN, MAX.
- COUNT(*) counts all rows. COUNT(col) ignores NULLs. COUNT(DISTINCT col) counts unique.
- WHERE filters rows before grouping. HAVING filters groups after.
- Golden rule — every SELECT column must be in GROUP BY or inside an aggregate.
Remember. SQL execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. WHERE can’t use aggregates because it runs before GROUP BY.
Window Functions
What it is. Compute across rows without collapsing them. Uses OVER().
Key terms.
- PARTITION BY — divides rows into windows (keeps all rows, unlike GROUP BY).
- ROW_NUMBER — unique sequential, no ties.
- RANK — ties get same rank, skips numbers.
- DENSE_RANK — ties get same rank, no gaps.
- LAG / LEAD — previous / next row value.
- Running totals —
SUM(amount) OVER (ORDER BY date). - Frame clause —
ROWS BETWEEN N PRECEDING AND CURRENT ROWfor moving averages.
Remember. Top N per group = ROW_NUMBER() OVER (PARTITION BY group ORDER BY ...) AS rn then WHERE rn <= N. Classic interview pattern.
Views & Materialized Views
What it is. View = saved query (recipe). Materialized view = stored result (meal-prepped).
Key terms.
- VIEW — virtual, recomputes every time, always current.
- MATERIALIZED VIEW — physical, fast reads, stale until refresh.
- REFRESH MATERIALIZED VIEW [CONCURRENTLY] — needs unique index for concurrent.
- WITH CHECK OPTION — prevents inserts that violate the view’s filter.
Remember. View for abstraction/security. Materialized view for expensive analytics queries. Use CONCURRENTLY to avoid blocking reads during refresh.
Stored Procedures & Triggers
What it is. Server-side logic in the database.
Key terms.
- Procedure — called via
CALL, no return value, can manage transactions. - Function — returns a value, callable from SELECT.
- Trigger — auto-fires on INSERT/UPDATE/DELETE. BEFORE can modify; AFTER for logging.
- NEW / OLD — refer to new/old row inside trigger.
Remember. Modern apps prefer ORMs (Prisma, SQLAlchemy) over stored procs — testable, in git, portable. The classic trigger example: auto-set updated_at on UPDATE.
Schema Design & Normalization
ER Diagrams
What it is. Visual blueprint: entities (tables), attributes (columns), relationships (FKs).
Key terms.
- One-to-One (1:1) —
UNIQUEon the FK side. - One-to-Many (1:N) — FK on the “many” side. Most common.
- Many-to-Many (M:N) — needs a junction table.
- Crow’s foot notation — fork = many, line = one, circle = optional.
Remember. Always start interviews with an ER diagram. Identify entities (nouns) → relationships (verbs) → cardinality → attributes → keys.
Normalization (1NF–BCNF)
What it is. Reduce redundancy by splitting tables. Prevents insert/update/delete anomalies.
Key terms.
- 1NF — atomic values, every cell single-valued, table has primary key.
- 2NF — 1NF + no partial dependencies on composite key.
- 3NF — 2NF + no transitive dependencies (A→B→C).
- BCNF — stricter 3NF: every determinant must be a superkey.
Remember. 1NF = atomic. 2NF = no partial deps. 3NF = no transitive deps. Most apps target 3NF. BCNF only for rare overlapping composite candidate keys.
Cheatsheet — Normal Forms
| Form | Rule (in plain English) |
|---|---|
| 1NF | Cells are atomic, no repeating groups, has PK |
| 2NF | 1NF + no non-key attribute depends on only part of a composite PK |
| 3NF | 2NF + no non-key attribute depends on another non-key attribute |
| BCNF | Every functional dependency X→Y has X as a superkey |
Denormalization
What it is. Intentionally adding redundancy for read performance.
Key terms.
- Duplicated fields — store frequently joined data inline.
- Cached counters —
post_countinstead of COUNT(*). - Summary tables — pre-aggregated dashboards.
- Embedded relationships — JSONB items inline.
Remember. Normalize first, denormalize strategically. Read-heavy apps benefit. Cost: harder writes, sync complexity, data inconsistency risk.
Relationships & Keys
What it is. Keys uniquely identify rows and link tables.
Key terms.
- Primary Key — unique + NOT NULL, one per table.
- Natural vs Surrogate — real-world value vs auto-generated.
- Auto-increment vs UUID — sequential (small, indexable, predictable) vs random (large, distributed-safe, unpredictable).
- Foreign Key — references another table’s PK; enforces referential integrity.
- ON DELETE —
RESTRICT(block),CASCADE(delete children),SET NULL,SET DEFAULT. - Composite key — multi-column PK.
- Junction table — bridge for M:N.
- CHECK constraint — validation rule.
Remember. Use auto-increment for internal IDs. Use UUID when exposed in URLs or distributed systems. CASCADE for child-data, RESTRICT for blocked deletes, SET NULL for optional relationships.
Schema Design Patterns
What it is. Reusable patterns for analytics, flexibility, deletion, polymorphism.
Key terms.
- Star schema — central fact table + dimension tables. Data warehouse default.
- Snowflake schema — normalized dimensions (more JOINs, less redundancy).
- EAV (Entity-Attribute-Value) — three columns handle anything. Loses type safety. Prefer JSONB.
- Polymorphic associations —
likeable_type + likeable_idfor many entity types. Loses FK constraint. - Soft delete —
deleted_attimestamp, never actually delete. Use partial unique index for email uniqueness among non-deleted.
Remember. Star schema for analytics. Soft delete for audit/compliance — but every query needs WHERE deleted_at IS NULL. Mention pattern names by name in interviews.
Database Migrations
What it is. Versioned, trackable schema changes — git for the DB schema.
Key terms.
- Up/Down — apply / reverse migration.
schema_migrationstable — tracks applied migrations.- Tools — Flyway, Liquibase, Alembic, Prisma Migrate, Knex, golang-migrate.
- Expand-Contract — safe rename: add new column, backfill, switch reads, drop old.
- CREATE INDEX CONCURRENTLY — non-blocking index creation.
Remember. Zero-downtime is the interview hook. Never just RENAME COLUMN on prod. Always test migrations on production-sized data. Make them idempotent (IF NOT EXISTS).
Indexing & Query Optimization
How Indexes Work
What it is. Separate data structure (usually B+ Tree) mapping column values → row locations. Like the index at the back of a book.
Key terms.
- Full table scan / Seq Scan — reads every row. Slow at scale.
- Index Scan — uses the index.
- Clustered index — physical row order (one per table). InnoDB PK is clustered.
- Non-clustered index — separate structure pointing to rows.
Remember. Indexes speed up reads, slow down writes (every INSERT/UPDATE/DELETE updates indexes too). Index foreign keys, WHERE columns, JOIN columns. Don’t index low-cardinality (boolean) columns.
B-Tree and B+ Tree
What it is. Self-balancing tree behind almost every DB index.
Key terms.
- B-Tree — multi-child tree, data in all nodes.
- B+ Tree — data only in leaves; leaves linked (linked list).
- Why B+ wins — fast range queries via leaf pointers, smaller internal nodes (more children per node), shorter tree.
Remember. When someone says “database index,” they almost always mean a B+ Tree. Range queries are O(log n) to find start, then linear walk along leaves.
Types of Indexes
What it is. Different index types for different access patterns.
Key terms.
- B-Tree — default, range + equality.
- Hash — equality only, no ranges.
- Composite — multi-column. Leftmost prefix rule — index
(a, b, c)works fora,(a,b),(a,b,c), but notbalone. - Covering / INCLUDE — index has all columns needed → Index Only Scan.
- Partial — only indexes a subset (
WHERE is_active = true). - Unique — enforces no duplicates.
- GIN — full-text, JSONB, arrays. GiST — geometric, fuzzy text.
Remember. Leftmost prefix rule. Put most selective column first in composite index. Use partial indexes for “active” subsets. GIN for JSONB/text search.
EXPLAIN and Query Plans
What it is. Shows the database’s execution plan.
Key terms.
- EXPLAIN — estimated plan only.
- EXPLAIN ANALYZE — actually runs the query, shows real timing.
- Seq Scan — reads everything (red flag on large tables).
- Index Scan — uses an index.
- Index Only Scan — fastest, all data in index.
- Bitmap Scan — medium-selectivity.
- Joins — Nested Loop, Hash Join, Merge Join.
- ANALYZE — refresh stats when row estimates are way off.
Remember. Wrap destructive EXPLAIN ANALYZE in BEGIN; ... ROLLBACK;. Seq Scan on large table = needs an index. Big estimate mismatch = stale stats.
Query Optimization
What it is. Practical techniques for fast SQL.
Key terms.
- **Avoid SELECT *** — wastes I/O, blocks covering indexes.
- N+1 problem — 1 query for list + N for each item. Fix with JOIN or batched IN.
- Cursor pagination —
WHERE id > last_id LIMIT 20instead of OFFSET (which scans skipped rows). - EXISTS over IN for large subqueries.
- Filter early, join late — reduce rows before joining.
- No functions on indexed columns —
WHERE YEAR(col) = 2024blocks the index. Use range. - Batch operations — single multi-VALUES INSERT.
Remember. Always check EXPLAIN ANALYZE before/after. The N+1 problem is the most common ORM killer. Cursor pagination > OFFSET for deep pages.
Full-Text Search
What it is. Search inside text efficiently — beyond LIKE.
Key terms.
- LIKE ‘%word%’ — can’t use index, full scan.
- tsvector — processed text (lexemes, stemmed, stop words removed).
- tsquery — search expression with
&,|,!. - GIN index — what makes FTS fast.
- MySQL FULLTEXT — natural language vs boolean modes.
- Elasticsearch / Meilisearch — when we need typo tolerance, faceted search, ranking.
Remember. Postgres FTS handles stemming + ranking — enough for blogs/docs. Reach for Elasticsearch when typo tolerance, facets, or sub-10ms autocomplete is needed.
Transactions & Concurrency
Transactions Deep Dive
What it is. Group of operations as one unit — all succeed or none.
Key terms.
- BEGIN, COMMIT, ROLLBACK — basic flow.
- Auto-commit — default mode; each statement is its own transaction.
- SAVEPOINT — checkpoint we can rollback to.
- Long-running transaction — holds locks, prevents VACUUM. Bad.
Remember. Keep transactions short. Process outside the transaction, lock briefly. Never hold locks while doing slow application work. Always handle rollback in error paths.
Isolation Levels
What it is. Controls what one transaction can see of another’s work.
Key terms (anomalies).
- Dirty read — read uncommitted data.
- Non-repeatable read — same row reread, different value.
- Phantom read — same query reread, new rows appear.
Levels.
- Read Uncommitted — allows dirty reads. Postgres ignores it.
- Read Committed — Postgres default. Prevents dirty reads.
- Repeatable Read — MySQL default. Prevents dirty + non-repeatable. Postgres also prevents phantoms here (snapshot).
- Serializable — prevents everything. May fail with serialization errors → retry.
Remember. Read Committed is fine for most web apps. Use Serializable for financial/inventory. Higher isolation = safer but slower.
Cheatsheet — Isolation Levels
| Level | Dirty Read | Non-Repeatable | Phantom |
|---|---|---|---|
| Read Uncommitted | possible | possible | possible |
| Read Committed | prevented | possible | possible |
| Repeatable Read | prevented | prevented | possible* |
| Serializable | prevented | prevented | prevented |
*Postgres prevents phantoms at Repeatable Read via snapshot isolation.
Locking Mechanisms
What it is. Coordinate concurrent access.
Key terms.
- Shared (S) vs Exclusive (X) — multiple S compatible; X blocks all.
- Row-level vs Table-level.
SELECT ... FOR UPDATE— exclusive row lock for read-then-write.SELECT ... FOR SHARE— shared row lock.- NOWAIT — fail immediately if locked.
- SKIP LOCKED — skip locked rows. Perfect for job queues.
- Intent locks (IS, IX) — table-level signals before row locking (MySQL).
- Advisory locks — app-level named locks (
pg_advisory_lock).
Remember. SKIP LOCKED is the gold standard for worker queues. Postgres doesn’t do lock escalation; MySQL/SQL Server do.
Deadlocks
What it is. Two transactions waiting for each other’s locks → circular wait.
Key terms.
- Wait-for graph — DB detects cycles.
- Victim — DB rolls back one txn (Postgres: detector; MySQL: least work).
- Prevention — always lock rows in the same order (e.g., ascending ID).
lock_timeout/innodb_lock_wait_timeout— fail fast.
Remember. Deadlocks are inevitable with concurrency. Lock in consistent order, keep transactions short, always retry on deadlock errors (Postgres SQLSTATE 40P01, MySQL 1213).
MVCC (Multi-Version Concurrency Control)
What it is. Multiple versions of each row → readers don’t block writers, writers don’t block readers.
Key terms.
- Snapshot — frozen view per transaction.
- Postgres —
xmin(creator txn) /xmax(deleter txn) on each tuple. UPDATE = mark dead + insert new. - MySQL/InnoDB — undo log stores old versions.
- VACUUM (Postgres) — cleans up dead tuples. Long-running txns block VACUUM → table bloat.
- Autovacuum — runs in background. Don’t disable.
- Snapshot isolation — natural result of MVCC.
Remember. MVCC is why modern DBs handle thousands of connections. Long-running transactions prevent VACUUM — keep transactions short.
Optimistic vs Pessimistic Locking
What it is. Two strategies for concurrent updates.
Key terms.
- Pessimistic —
SELECT ... FOR UPDATE, lock first. High contention. - Optimistic — version column, check at write:
UPDATE ... SET version = v+1 WHERE id = ? AND version = v. If 0 rows affected → conflict, retry.
Remember. Pessimistic for booking seats / inventory (can’t afford conflicts). Optimistic for profile edits (rare conflicts). Optimistic = no deadlocks but needs retry logic.
NoSQL Databases
Document Stores
What it is. Store data as JSON-like documents (MongoDB, CouchDB, Firestore).
Key terms.
- Collection / Document / Field — like Table / Row / Column.
- BSON — Binary JSON.
- Embedding (denormalized) — child data inside parent doc.
- Referencing — store IDs, look up separately.
$lookup— MongoDB’s join.- Aggregation pipeline —
$match → $group → $sort → $limit.
Remember. Embed for one-to-few + accessed together. Reference for unbounded growth, shared data, frequently-changing data. 16MB doc size limit.
Key-Value Stores
What it is. Simplest NoSQL. Key → value.
Key terms.
- Redis — in-memory, sub-ms, rich data types.
- DynamoDB — managed, serverless, auto-scaling.
- etcd — Kubernetes uses it.
- Memcached — pure cache, multi-threaded.
Remember. Use for cache, session, rate limit, leaderboard, feature flags. Usually used alongside a primary DB, not as replacement.
Wide-Column Stores
What it is. Cassandra, HBase, Bigtable. Rows can have different columns.
Key terms.
- Column family — group of columns stored together.
- Partition key — determines which node holds the data.
- Clustering key — sort order within partition.
- Write path — memtable → SSTable (immutable). Always append.
- Eventual consistency with tunable levels: ONE / QUORUM / ALL.
Remember. Designed for massive write throughput + horizontal scale. Design tables around access patterns. Used by Discord, Netflix, Apple. Bad for ad-hoc queries.
Graph Databases
What it is. Relationships are first-class. Neo4j, ArangoDB, Neptune.
Key terms.
- Nodes — entities. Edges — relationships (typed, directional).
- Properties — key-value on both nodes and edges.
- Cypher — Neo4j’s query language. ASCII-art patterns.
MATCH (m:Person {name: "Manish"})-[:FRIENDS_WITH*2]->(fof)
RETURN DISTINCT fof.name
Remember. Multi-hop traversals are O(1) per hop because edges are physical pointers. Beats SQL JOINs at “friends of friends.” Use for social, recommendations, fraud detection, knowledge graphs.
NoSQL Data Modeling
What it is. Backwards from SQL — start from access patterns, design schema to match.
Key terms.
- Access patterns first — list every query before designing.
- Denormalize intentionally; duplicate is the price for read perf.
- Embed vs reference — embed for one-to-few + read together; reference for shared/unbounded.
- Single table design (DynamoDB) — all entities in one table with composite PK + SK.
- GSI — Global Secondary Index for alternate access patterns.
Remember. “If I need a JOIN, I probably need to restructure the data.” Don’t model NoSQL like SQL — that gives the worst of both.
Choosing the Right Database
What it is. Decision framework: data shape + access patterns + consistency + scale.
Remember. Just Use PostgreSQL for most things. It does JSON (Mongo), full-text (basic Elastic), pub/sub. Add specialized DBs when proven need:
- Sub-ms cache → Redis.
- Typo-tolerant search → Elasticsearch / Meilisearch.
- Massive event ingest → Cassandra / Kafka.
- Heavy graph traversals → Neo4j.
Redis Essentials
Redis Data Types and Commands
What it is. In-memory data structure store, not just a cache.
Key terms.
- String —
SET/GET/INCR/SETNX. Cache, counters, locks. - Hash —
HSET/HGET/HGETALL. Mini objects. - List —
LPUSH/RPOP/BLPOP. Queues, recent items. - Set —
SADD/SINTER/SUNION. Tags, unique members. - Sorted Set —
ZADD/ZRANGE/ZREVRANK. Leaderboards, priority queues. - Stream —
XADD/XREAD/XREADGROUP. Persistent message log with consumer groups. - HyperLogLog —
PFADD/PFCOUNT. Approximate cardinality, ~12KB regardless of size. - Bitmap —
SETBIT/BITCOUNT. DAU tracking, feature flags.
Remember. Internal encodings: small hashes/lists use listpack; large sorted sets use skiplist.
Redis Persistence
What it is. How in-memory Redis survives a crash.
Key terms.
- RDB — periodic binary snapshots. Compact, fast restart, can lose minutes.
- AOF — append every write.
appendfsync=always/everysec(default) /no. AOF rewrite compacts the log. - Hybrid (4.0+) — AOF rewrite uses RDB as base + AOF tail. Best of both. Recommended.
Remember. everysec loses at most 1s of writes. With aof-use-rdb-preamble yes, restart is fast AND data loss is minimal.
Caching Patterns
What it is. Strategies for putting Redis in front of a DB.
Key terms.
- Cache-Aside (Lazy) — most common. App checks cache → on miss reads DB and populates cache. Stale risk.
- Write-Through — write to both. Always consistent. Slower writes.
- Write-Behind — cache first, async DB. Fastest. Data loss risk.
- Read-Through — cache fetches DB on miss (cache library does it).
- Cache invalidation — TTL, event-driven, version-based.
- Cache stampede / Thundering Herd — popular key expires, thousands hit DB simultaneously. Fix: SETNX lock, probabilistic early refresh, background refresh.
- Cache penetration — queries for nonexistent keys. Fix: cache nulls + bloom filter.
- Cache avalanche — many keys expire at once. Fix: jitter the TTL.
Remember. “There are only two hard things: cache invalidation and naming things.” Use TTL jitter to avoid avalanche.
Redis Eviction and Expiry
What it is. What happens when memory fills up + TTL expiration.
Key terms.
- maxmemory + maxmemory-policy.
- Eviction policies —
noeviction,allkeys-lru,volatile-lru,allkeys-lfu,volatile-lfu,allkeys-random,volatile-random,volatile-ttl. - Sampled LRU — Redis doesn’t do true LRU, samples 5 (default) keys.
- Lazy expiration — checks on access.
- Active expiration — background sweep ~10 Hz, samples 20 keys.
lazyfree-lazy-eviction yes— non-blocking deletion of large keys.
Remember. Eviction (memory full) ≠ Expiration (TTL elapsed). Pure cache → allkeys-lru. Mix → volatile-lru (set TTL only on cache keys).
Redis Pub/Sub, Distributed Locks, and Cluster
What it is. Real-time messaging, coordination, horizontal scaling.
Key terms.
- Pub/Sub —
SUBSCRIBE/PUBLISH. Fire-and-forget, no persistence. Use Streams for guaranteed delivery. - Distributed lock —
SET key value NX EX 30. Release with Lua script that checks value before delete (avoid deleting someone else’s lock). - Redlock — lock across N independent Redis instances; need majority.
- Kleppmann’s critique — GC pauses + clock drift can break Redlock. Use fencing tokens for true safety.
- Cluster — 16,384 hash slots split across nodes.
MOVEDredirect. Hash tags{user:42}:profilekeep keys on same slot. - Sentinel — HA for single Redis (failover). Cluster — sharding + failover.
Remember. Pub/Sub messages are gone if no listeners. Use Streams for reliability. Sentinel for HA, Cluster for scale-out.
Scaling & Real-World Patterns
Replication
What it is. Multiple copies of the same data on different servers.
Key terms.
- Primary-Replica (Master-Slave) — one writer, many readers.
- Replication lag — replicas behind primary. Read-your-own-writes problem.
- Multi-Primary — both accept writes; conflicts hard (LWW, CRDTs).
- Sync vs Async — sync waits for replica ACK (safe, slow); async fast but data-loss risk.
- Semi-sync — one sync replica, rest async.
- Failover — manual or automatic. Avoid split-brain with consensus (Raft).
- Tools — Patroni, pg_auto_failover, MySQL Group Replication, Orchestrator.
Remember. Replication scales reads, not writes. Async = fast + risky. Use semi-sync for sweet spot. Always plan for replication lag.
Sharding
What it is. Splitting data across multiple servers (shards).
Key terms.
- Range-based — easy, but hot spots.
- Hash-based — even, but no range queries; resharding moves most data.
- Directory-based — flexible lookup table; SPOF on the directory.
- Consistent hashing — hash ring; adding/removing a shard moves only nearby keys. Use virtual nodes for balance.
- Cross-shard queries / JOINs — slow / impossible. Denormalize to avoid.
- Shard key — high cardinality, even distribution, matches access patterns. Hard to change later.
Remember. Sharding is a last resort. Try vertical scaling, replicas, caching, query optimization first. Discord shards by guild_id, Instagram by user_id.
Connection Pooling
What it is. Reuse open connections; avoid TCP+TLS+auth+process-creation overhead per request.
Key terms.
max_connections— pool size. Surprisingly small.- HikariCP rule —
(cores × 2) + spindles. ~10-20 is enough for most. min_idle, idle_timeout, max_lifetime— recycle stale connections.- PgBouncer modes — transaction (most common), session, statement.
- Forgetting
release()— leaks connections, pool runs out.
Remember. Pool too large is worse than too small — context-switching, memory pressure. PgBouncer in transaction mode is the standard external pooler.
CQRS and Event Sourcing
What it is. Separate read/write models; optionally store events instead of state.
Key terms.
- CQRS — Command Query Responsibility Segregation. Write model (normalized, validated) vs Read model (denormalized, fast).
- Event Sourcing — store every event; current state is replay of events.
- Event store — append-only log keyed by aggregate.
- Projection — materialized read view built from events.
- Snapshotting — periodic state dump to avoid full replay.
Remember. CQRS without ES: separate read/write models, both store state. ES without CQRS: events but shared model. Use both for complex domains, audit needs, temporal queries. Trade-off: eventual consistency + complexity. Most CRUD doesn’t need this.
Database per Service, Outbox, CDC
What it is. Microservice data patterns.
Key terms.
- Database per service — each service owns its data; no shared DB.
- Cross-service queries — API call, replication, or events.
- Dual write problem — saving to DB + publishing to Kafka without atomicity.
- Outbox pattern — write events to local
outboxtable in same transaction; relay publishes. At-least-once → consumers must be idempotent. - CDC (Change Data Capture) — read DB log directly. Postgres WAL, MySQL binlog, Mongo oplog. Debezium is the standard tool.
- Saga — distributed transaction as sequence of local txns + compensating actions.
- Choreography (event-driven, distributed) vs Orchestration (central coordinator).
Remember. Every microservice pattern solves one problem and creates another. Outbox for reliable events. CDC for free events from DB log. Sagas only give eventual consistency — UI shows “processing” until done.