← Back to DBMS

DBMS — Quick Summary

Quick revision: every topic, key terms, and mnemonics for DBMS.


This is a quick revision doc covering all 47 topics in dbms. Open the linked notes if you want depth.

Database Fundamentals

What Is a DBMS?

What it is. Software between our app and disk that handles store/retrieve/update/delete with concurrency, integrity, security, and crash recovery.

Key terms.

Remember. Files fail at concurrency, integrity, relationships, security, crash recovery. DBMS solves all of these.

Types of Databases

What it is. Seven main categories matched to data shape and access pattern.

Key terms.

Remember. Polyglot persistence — most production stacks use multiple types. Pick by access pattern, not hype.

SQL vs NoSQL

What it is. SQL = fixed schema, vertical scaling, ACID. NoSQL = flexible schema, horizontal scaling, BASE.

Key terms.

Remember. For 90% of apps, PostgreSQL is more than enough. Never say “NoSQL is better because it scales” — talk specific trade-offs for the specific case.

ACID Properties

What it is. Four guarantees that make transactions reliable.

Key terms.

Remember. ACID = Atomicity, Consistency, Isolation, Durability. Always use the bank transfer example. Full ACID: Postgres, MySQL InnoDB, SQLite. Document-level: MongoDB. Not ACID: Cassandra, DynamoDB, Redis.

BASE Properties

What it is. NoSQL alternative to ACID. Trade strict consistency for availability.

Key terms.

Remember. “Likes count off by 1 for a second” = BASE in action. Use BASE for social feeds, counters, browsing. Use ACID for money, bookings, auth.

CAP Theorem

What it is. In a distributed system, during a network partition, pick C or A. P isn’t optional.

Key terms.

Remember. CAP = Consistency / Availability / Partition tolerance. Trade-off only kicks in during a partition. Mention PACELC for bonus points.

SQL & Querying

DDL, DML & DCL

What it is. Four SQL command categories.

Key terms.

Remember. DROP vs TRUNCATE vs DELETE. DROP = remove table+data. TRUNCATE = empty rows (DDL, fast, no rollback, resets auto-increment). DELETE = remove specific rows (DML, slow, rollback-able, fires triggers).

Joins

What it is. Combine data from two+ tables on a related column.

Key terms.

Remember. “Find missing data” = LEFT JOIN + WHERE right_col IS NULL. Always join on indexed columns. Filter early, join late.

Subqueries & CTEs

What it is. Subquery = query inside a query. CTE = named subquery via WITH.

Key terms.

Remember. Use CTEs for readability. Use EXISTS over IN for large subqueries. Watch out: NOT IN with NULLs returns nothing.

Aggregations & GROUP BY

What it is. Squash rows into summaries. GROUP BY splits into buckets first.

Key terms.

Remember. SQL execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. WHERE can’t use aggregates because it runs before GROUP BY.

Window Functions

What it is. Compute across rows without collapsing them. Uses OVER().

Key terms.

Remember. Top N per group = ROW_NUMBER() OVER (PARTITION BY group ORDER BY ...) AS rn then WHERE rn <= N. Classic interview pattern.

Views & Materialized Views

What it is. View = saved query (recipe). Materialized view = stored result (meal-prepped).

Key terms.

Remember. View for abstraction/security. Materialized view for expensive analytics queries. Use CONCURRENTLY to avoid blocking reads during refresh.

Stored Procedures & Triggers

What it is. Server-side logic in the database.

Key terms.

Remember. Modern apps prefer ORMs (Prisma, SQLAlchemy) over stored procs — testable, in git, portable. The classic trigger example: auto-set updated_at on UPDATE.

Schema Design & Normalization

ER Diagrams

What it is. Visual blueprint: entities (tables), attributes (columns), relationships (FKs).

Key terms.

Remember. Always start interviews with an ER diagram. Identify entities (nouns) → relationships (verbs) → cardinality → attributes → keys.

Normalization (1NF–BCNF)

What it is. Reduce redundancy by splitting tables. Prevents insert/update/delete anomalies.

Key terms.

Remember. 1NF = atomic. 2NF = no partial deps. 3NF = no transitive deps. Most apps target 3NF. BCNF only for rare overlapping composite candidate keys.

Cheatsheet — Normal Forms

FormRule (in plain English)
1NFCells are atomic, no repeating groups, has PK
2NF1NF + no non-key attribute depends on only part of a composite PK
3NF2NF + no non-key attribute depends on another non-key attribute
BCNFEvery functional dependency X→Y has X as a superkey

Denormalization

What it is. Intentionally adding redundancy for read performance.

Key terms.

Remember. Normalize first, denormalize strategically. Read-heavy apps benefit. Cost: harder writes, sync complexity, data inconsistency risk.

Relationships & Keys

What it is. Keys uniquely identify rows and link tables.

Key terms.

Remember. Use auto-increment for internal IDs. Use UUID when exposed in URLs or distributed systems. CASCADE for child-data, RESTRICT for blocked deletes, SET NULL for optional relationships.

Schema Design Patterns

What it is. Reusable patterns for analytics, flexibility, deletion, polymorphism.

Key terms.

Remember. Star schema for analytics. Soft delete for audit/compliance — but every query needs WHERE deleted_at IS NULL. Mention pattern names by name in interviews.

Database Migrations

What it is. Versioned, trackable schema changes — git for the DB schema.

Key terms.

Remember. Zero-downtime is the interview hook. Never just RENAME COLUMN on prod. Always test migrations on production-sized data. Make them idempotent (IF NOT EXISTS).

Indexing & Query Optimization

How Indexes Work

What it is. Separate data structure (usually B+ Tree) mapping column values → row locations. Like the index at the back of a book.

Key terms.

Remember. Indexes speed up reads, slow down writes (every INSERT/UPDATE/DELETE updates indexes too). Index foreign keys, WHERE columns, JOIN columns. Don’t index low-cardinality (boolean) columns.

B-Tree and B+ Tree

What it is. Self-balancing tree behind almost every DB index.

Key terms.

Remember. When someone says “database index,” they almost always mean a B+ Tree. Range queries are O(log n) to find start, then linear walk along leaves.

Types of Indexes

What it is. Different index types for different access patterns.

Key terms.

Remember. Leftmost prefix rule. Put most selective column first in composite index. Use partial indexes for “active” subsets. GIN for JSONB/text search.

EXPLAIN and Query Plans

What it is. Shows the database’s execution plan.

Key terms.

Remember. Wrap destructive EXPLAIN ANALYZE in BEGIN; ... ROLLBACK;. Seq Scan on large table = needs an index. Big estimate mismatch = stale stats.

Query Optimization

What it is. Practical techniques for fast SQL.

Key terms.

Remember. Always check EXPLAIN ANALYZE before/after. The N+1 problem is the most common ORM killer. Cursor pagination > OFFSET for deep pages.

What it is. Search inside text efficiently — beyond LIKE.

Key terms.

Remember. Postgres FTS handles stemming + ranking — enough for blogs/docs. Reach for Elasticsearch when typo tolerance, facets, or sub-10ms autocomplete is needed.

Transactions & Concurrency

Transactions Deep Dive

What it is. Group of operations as one unit — all succeed or none.

Key terms.

Remember. Keep transactions short. Process outside the transaction, lock briefly. Never hold locks while doing slow application work. Always handle rollback in error paths.

Isolation Levels

What it is. Controls what one transaction can see of another’s work.

Key terms (anomalies).

Levels.

Remember. Read Committed is fine for most web apps. Use Serializable for financial/inventory. Higher isolation = safer but slower.

Cheatsheet — Isolation Levels

LevelDirty ReadNon-RepeatablePhantom
Read Uncommittedpossiblepossiblepossible
Read Committedpreventedpossiblepossible
Repeatable Readpreventedpreventedpossible*
Serializablepreventedpreventedprevented

*Postgres prevents phantoms at Repeatable Read via snapshot isolation.

Locking Mechanisms

What it is. Coordinate concurrent access.

Key terms.

Remember. SKIP LOCKED is the gold standard for worker queues. Postgres doesn’t do lock escalation; MySQL/SQL Server do.

Deadlocks

What it is. Two transactions waiting for each other’s locks → circular wait.

Key terms.

Remember. Deadlocks are inevitable with concurrency. Lock in consistent order, keep transactions short, always retry on deadlock errors (Postgres SQLSTATE 40P01, MySQL 1213).

MVCC (Multi-Version Concurrency Control)

What it is. Multiple versions of each row → readers don’t block writers, writers don’t block readers.

Key terms.

Remember. MVCC is why modern DBs handle thousands of connections. Long-running transactions prevent VACUUM — keep transactions short.

Optimistic vs Pessimistic Locking

What it is. Two strategies for concurrent updates.

Key terms.

Remember. Pessimistic for booking seats / inventory (can’t afford conflicts). Optimistic for profile edits (rare conflicts). Optimistic = no deadlocks but needs retry logic.

NoSQL Databases

Document Stores

What it is. Store data as JSON-like documents (MongoDB, CouchDB, Firestore).

Key terms.

Remember. Embed for one-to-few + accessed together. Reference for unbounded growth, shared data, frequently-changing data. 16MB doc size limit.

Key-Value Stores

What it is. Simplest NoSQL. Key → value.

Key terms.

Remember. Use for cache, session, rate limit, leaderboard, feature flags. Usually used alongside a primary DB, not as replacement.

Wide-Column Stores

What it is. Cassandra, HBase, Bigtable. Rows can have different columns.

Key terms.

Remember. Designed for massive write throughput + horizontal scale. Design tables around access patterns. Used by Discord, Netflix, Apple. Bad for ad-hoc queries.

Graph Databases

What it is. Relationships are first-class. Neo4j, ArangoDB, Neptune.

Key terms.

MATCH (m:Person {name: "Manish"})-[:FRIENDS_WITH*2]->(fof)
RETURN DISTINCT fof.name

Remember. Multi-hop traversals are O(1) per hop because edges are physical pointers. Beats SQL JOINs at “friends of friends.” Use for social, recommendations, fraud detection, knowledge graphs.

NoSQL Data Modeling

What it is. Backwards from SQL — start from access patterns, design schema to match.

Key terms.

Remember. “If I need a JOIN, I probably need to restructure the data.” Don’t model NoSQL like SQL — that gives the worst of both.

Choosing the Right Database

What it is. Decision framework: data shape + access patterns + consistency + scale.

Remember. Just Use PostgreSQL for most things. It does JSON (Mongo), full-text (basic Elastic), pub/sub. Add specialized DBs when proven need:

Redis Essentials

Redis Data Types and Commands

What it is. In-memory data structure store, not just a cache.

Key terms.

Remember. Internal encodings: small hashes/lists use listpack; large sorted sets use skiplist.

Redis Persistence

What it is. How in-memory Redis survives a crash.

Key terms.

Remember. everysec loses at most 1s of writes. With aof-use-rdb-preamble yes, restart is fast AND data loss is minimal.

Caching Patterns

What it is. Strategies for putting Redis in front of a DB.

Key terms.

Remember. “There are only two hard things: cache invalidation and naming things.” Use TTL jitter to avoid avalanche.

Redis Eviction and Expiry

What it is. What happens when memory fills up + TTL expiration.

Key terms.

Remember. Eviction (memory full) ≠ Expiration (TTL elapsed). Pure cache → allkeys-lru. Mix → volatile-lru (set TTL only on cache keys).

Redis Pub/Sub, Distributed Locks, and Cluster

What it is. Real-time messaging, coordination, horizontal scaling.

Key terms.

Remember. Pub/Sub messages are gone if no listeners. Use Streams for reliability. Sentinel for HA, Cluster for scale-out.

Scaling & Real-World Patterns

Replication

What it is. Multiple copies of the same data on different servers.

Key terms.

Remember. Replication scales reads, not writes. Async = fast + risky. Use semi-sync for sweet spot. Always plan for replication lag.

Sharding

What it is. Splitting data across multiple servers (shards).

Key terms.

Remember. Sharding is a last resort. Try vertical scaling, replicas, caching, query optimization first. Discord shards by guild_id, Instagram by user_id.

Connection Pooling

What it is. Reuse open connections; avoid TCP+TLS+auth+process-creation overhead per request.

Key terms.

Remember. Pool too large is worse than too small — context-switching, memory pressure. PgBouncer in transaction mode is the standard external pooler.

CQRS and Event Sourcing

What it is. Separate read/write models; optionally store events instead of state.

Key terms.

Remember. CQRS without ES: separate read/write models, both store state. ES without CQRS: events but shared model. Use both for complex domains, audit needs, temporal queries. Trade-off: eventual consistency + complexity. Most CRUD doesn’t need this.

Database per Service, Outbox, CDC

What it is. Microservice data patterns.

Key terms.

Remember. Every microservice pattern solves one problem and creates another. Outbox for reliable events. CDC for free events from DB log. Sagas only give eventual consistency — UI shows “processing” until done.