NoSQL Data Modeling - DBMS

NoSQL data modeling is backwards compared to relational modeling. In SQL, we design the schema first and figure out the queries later. In NoSQL, we start with the queries (access patterns) and design the data to serve them.

This is the single most important mindset shift. If we design a NoSQL schema the same way we’d design a SQL schema, we’ll end up with reads that need several round trips and joins done by hand in application code.

Rule #1: Think Access Patterns First

Before we create any collection or table, we write down every question our application will ask:

1. Get a user by ID
2. Get all orders for a user, sorted by date
3. Get the top 10 products by sales this month
4. Get all comments on a post
5. Check if a user has already liked a post

Then we design the data to answer these questions as efficiently as possible — ideally with a single read operation each.
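One way to make this concrete is to write the access patterns down as functions before choosing any storage layout. The sketch below uses a plain Python dict as a stand-in for a key-value store. The key formats (`user:<id>`, `like:<user_id>:<post_id>`) and the function names are assumptions made for illustration, not the API of any particular database.

```python
# A plain dict stands in for a key-value store (illustrative only).
store = {}

def put_user(user_id, name):
    store[f"user:{user_id}"] = {"id": user_id, "name": name}

def get_user(user_id):
    # Access pattern 1: get a user by ID -> one key lookup.
    return store.get(f"user:{user_id}")

def like_post(user_id, post_id):
    # The key itself encodes the question we will ask later.
    store[f"like:{user_id}:{post_id}"] = True

def has_liked(user_id, post_id):
    # Access pattern 5: one key lookup, no scan over all likes.
    return f"like:{user_id}:{post_id}" in store

put_user(1, "Manish")
like_post(1, "post:1")
print(get_user(1))
print(has_liked(1, "post:1"), has_liked(1, "post:2"))
```

Each function maps to exactly one read, which is the property we are designing for. Patterns that need sorting or ranking (2 and 3 in the list) usually need a key or index built specifically for that ordering.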

Rule #2: Denormalization Is the Norm

In SQL, we normalize data to avoid duplication. In NoSQL, we intentionally duplicate data to avoid joins, which most NoSQL databases either don’t support or can’t run efficiently.

// SQL approach (normalized): 4 tables, need JOINs
// users table:       { id: 1, name: "Manish" }
// orders table:      { id: 101, user_id: 1, total: 5000 }
// order_items table: { order_id: 101, product_id: 42, qty: 2 }
// products table:    { id: 42, name: "Keyboard" }

// NoSQL approach (denormalized): everything in one document
{
  "_id": "order:101",
  "user": {
    "id": 1,
    "name": "Manish"           // duplicated from users
  },
  "items": [
    {
      "product_id": 42,
      "name": "Keyboard",      // duplicated from products
      "price": 2500,
      "qty": 2
    }
  ],
  "total": 5000,
  "created_at": "2024-03-15"
}

The trade-off: when “Manish” changes his name, we have to update every document that holds a copy of it, or accept that older orders keep showing the old name. In return, reads are cheap: one query returns everything needed to display the order.

Denormalize data that’s read together. If we always show the user’s name alongside their orders, put the name in the order document.
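As a rough sketch of what "put the name in the order document" means at write time, the function below copies the display fields into the order when it is created. The in-memory `users` and `products` dicts and the function `build_order_document` are hypothetical stand-ins, not part of any database driver.

```python
# Normalized source records (hypothetical in-memory stand-ins for tables).
users = {1: {"id": 1, "name": "Manish"}}
products = {42: {"id": 42, "name": "Keyboard", "price": 2500}}

def build_order_document(order_id, user_id, lines, created_at):
    """Copy the fields we display with an order into the order itself.

    `lines` is a list of (product_id, qty) pairs.
    """
    user = users[user_id]
    items = []
    for product_id, qty in lines:
        product = products[product_id]
        items.append({
            "product_id": product_id,
            "name": product["name"],    # duplicated from products
            "price": product["price"],  # price at the time of the order
            "qty": qty,
        })
    return {
        "_id": f"order:{order_id}",
        "user": {"id": user["id"], "name": user["name"]},  # duplicated from users
        "items": items,
        "total": sum(item["price"] * item["qty"] for item in items),
        "created_at": created_at,
    }

order = build_order_document(101, 1, [(42, 2)], "2024-03-15")
print(order["total"])
```

The duplication happens once, on write, so that every later read of the order needs no lookups in `users` or `products`.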

Embedding vs Referencing

This is the most common design decision in document databases like MongoDB.

Embed when                          | Reference when
------------------------------------|---------------------------------------
Data is accessed together           | Data is accessed independently
One-to-few relationship             | One-to-many or many-to-many
Child data doesn't change often     | Referenced data changes frequently
Child data belongs to one parent    | Data is shared across documents
Document stays under 16 MB          | Unbounded growth (thousands of items)

// Embed: blog post with comments (accessed together, bounded)
{
  "_id": "post:1",
  "title": "NoSQL Data Modeling",
  "comments": [
    { "user": "Priya", "text": "Great post!", "date": "2024-03-15" },
    { "user": "Rahul", "text": "Very helpful", "date": "2024-03-16" }
  ]
}

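// Reference: user with orders (orders grow without bound)
// The user document stays small; each order points back to its user.
{
  "_id": "user:1",
  "name": "Manish"
}
{
  "_id": "order:101",
  "user_id": "user:1",       // reference to the parent
  "total": 5000
}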
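The difference shows up in how many reads each shape needs. The sketch below models two collections as Python dicts and assumes the reference is held on the child side (each order stores a `user_id`), which is one common way to do it. The collection contents and function names are illustrative assumptions.

```python
# Two hypothetical collections, modeled as dicts keyed by _id.
posts = {
    "post:1": {
        "_id": "post:1",
        "title": "NoSQL Data Modeling",
        "comments": [  # embedded: bounded, always shown with the post
            {"user": "Priya", "text": "Great post!", "date": "2024-03-15"},
            {"user": "Rahul", "text": "Very helpful", "date": "2024-03-16"},
        ],
    }
}
orders = {
    "order:101": {"_id": "order:101", "user_id": "user:1", "total": 5000},
    "order:102": {"_id": "order:102", "user_id": "user:1", "total": 3000},
    "order:103": {"_id": "order:103", "user_id": "user:2", "total": 1500},
}

def get_post_with_comments(post_id):
    # Embedded: one read returns the post and its comments.
    return posts[post_id]

def get_orders_for_user(user_id):
    # Referenced: a separate query filters orders by the stored user_id.
    # A real database would use an index on user_id instead of a scan.
    return [o for o in orders.values() if o["user_id"] == user_id]

print(len(get_post_with_comments("post:1")["comments"]))
print([o["_id"] for o in get_orders_for_user("user:1")])
```

Keeping the reference on the order means the user document never grows, no matter how many orders accumulate.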

Single Table Design (DynamoDB)

DynamoDB takes this to the extreme with single table design. Instead of multiple tables, we store everything in one table with carefully designed partition keys and sort keys.

// All entities in ONE table
PK              | SK              | Data
----------------|-----------------|---------------------------
USER#manish     | PROFILE         | { name: "Manish", email: "..." }
USER#manish     | ORDER#2024-001  | { total: 5000, status: "shipped" }
USER#manish     | ORDER#2024-002  | { total: 3000, status: "pending" }
USER#priya      | PROFILE         | { name: "Priya", email: "..." }
USER#priya      | ORDER#2024-003  | { total: 1500, status: "shipped" }
PRODUCT#42      | INFO            | { name: "Keyboard", price: 2500 }
PRODUCT#42      | REVIEW#manish   | { rating: 5, text: "Love it" }

Now we can answer multiple access patterns with one table:

Get user profile: PK = "USER#manish", SK = "PROFILE"
Get all orders for a user: PK = "USER#manish", SK begins_with "ORDER#"
Get all reviews for a product: PK = "PRODUCT#42", SK begins_with "REVIEW#"
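To see why these lookups work, here is a toy in-memory model of a table keyed by partition key and sort key. It is not the DynamoDB API: `put_item`, `get_item`, and `query_begins_with` are hypothetical helpers that only mimic the behaviour of an exact-key read and a sort-key prefix query within one partition.

```python
# Toy model: (PK, SK) -> data. Not the DynamoDB API.
table = {}

def put_item(pk, sk, data):
    table[(pk, sk)] = data

def get_item(pk, sk):
    # Exact match on both keys: a single-item read.
    return table.get((pk, sk))

def query_begins_with(pk, sk_prefix):
    # Stay inside one partition and return matching items in sort-key order.
    matches = [
        (sk, data)
        for (item_pk, sk), data in table.items()
        if item_pk == pk and sk.startswith(sk_prefix)
    ]
    return sorted(matches, key=lambda pair: pair[0])

put_item("USER#manish", "PROFILE", {"name": "Manish"})
put_item("USER#manish", "ORDER#2024-002", {"total": 3000, "status": "pending"})
put_item("USER#manish", "ORDER#2024-001", {"total": 5000, "status": "shipped"})
put_item("USER#priya", "ORDER#2024-003", {"total": 1500, "status": "shipped"})
put_item("PRODUCT#42", "REVIEW#manish", {"rating": 5, "text": "Love it"})

print(get_item("USER#manish", "PROFILE"))
print([sk for sk, _ in query_begins_with("USER#manish", "ORDER#")])
```

Because the entity type is encoded in the sort key prefix, a user's profile and orders share a partition yet can still be fetched separately.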

Common Anti-Patterns

1. Treating NoSQL Like SQL

// BAD: Normalized MongoDB (just like SQL tables)
// users: { _id: 1, name: "Manish" }
// addresses: { _id: 1, user_id: 1, city: "Mumbai" }
// Now every read needs two queries + a "join"

// GOOD: Embed the address
// users: { _id: 1, name: "Manish", address: { city: "Mumbai" } }

2. Unbounded Arrays

// BAD: A product with hundreds of thousands of reviews embedded
{
  "_id": "product:42",
  "reviews": [
    // ... 500,000 reviews ...
    // Document exceeds 16MB limit!
  ]
}

// GOOD: Store reviews in a separate collection with a reference
// products: { _id: "product:42", name: "Keyboard" }
// reviews: { product_id: "product:42", user: "Manish", text: "..." }
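A quick back-of-the-envelope check shows how soon an embedded array runs into the limit. The sketch below uses the length of the JSON encoding as a rough proxy for the stored size, which is an assumption (the real BSON size will differ somewhat), and the 200-character review is a made-up example.

```python
import json

MAX_DOC_BYTES = 16 * 1024 * 1024  # MongoDB's 16 MB document size limit

def approx_size_bytes(doc):
    # JSON length is only a rough proxy for the stored BSON size (assumption).
    return len(json.dumps(doc).encode("utf-8"))

# Hypothetical review with a 200-character body.
review = {"user": "Manish", "rating": 5, "text": "x" * 200}
per_review = approx_size_bytes(review)
max_embedded = MAX_DOC_BYTES // per_review

def fits_in_one_document(n_reviews):
    # Ignores the rest of the product document, so this is optimistic.
    return n_reviews * per_review <= MAX_DOC_BYTES

print(per_review, max_embedded)
print(fits_in_one_document(100), fits_in_one_document(500_000))
```

Under these assumptions, 500,000 embedded reviews do not fit in one document, while a separate reviews collection has no such ceiling.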

3. Not Thinking About Updates

If we duplicate data that changes frequently, every update has to find and modify all the copies. Either accept the update cost, or reference the data instead so it lives in one place.
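The update cost is easy to see in a small sketch. Here the user's name is duplicated into every order, so a rename has to touch each copy. The collections and the `rename_user` helper are hypothetical and held in memory purely for illustration.

```python
# Hypothetical collections: the user's name is duplicated into each order.
users = {"user:1": {"name": "Manish"}, "user:2": {"name": "Priya"}}
orders = [
    {"_id": "order:101", "user": {"id": "user:1", "name": "Manish"}},
    {"_id": "order:102", "user": {"id": "user:1", "name": "Manish"}},
    {"_id": "order:103", "user": {"id": "user:2", "name": "Priya"}},
]

def rename_user(user_id, new_name):
    """Update the source record and every duplicated copy.

    Returns the number of order documents that had to be rewritten.
    """
    users[user_id]["name"] = new_name
    touched = 0
    for order in orders:
        if order["user"]["id"] == user_id:
            order["user"]["name"] = new_name
            touched += 1
    return touched

touched = rename_user("user:1", "Manish K.")
print(touched)
```

The number of writes grows with the number of copies, so this pattern suits fields that change rarely, such as a display name, far better than fields that change many times a day.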

4. Forgetting About Query Patterns

// We designed for "get orders by user"
// But now we need "get all orders by status"
// Our partition key is user_id — we can't efficiently query by status!

// Solution: create a Global Secondary Index (GSI) on status
// Or design a separate "view" of the data optimized for status queries
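Conceptually, a secondary index is a second copy of selected attributes, organized by a different key. The sketch below builds one by hand over a toy dataset to show the idea. In a real database the index is maintained by the database on every write, and `build_status_index` is a hypothetical helper, not a real API.

```python
from collections import defaultdict

# Base "table": partitioned by user_id (toy model, not a real database API).
orders_by_user = {
    "manish": [
        {"id": "2024-001", "status": "shipped"},
        {"id": "2024-002", "status": "pending"},
    ],
    "priya": [
        {"id": "2024-003", "status": "shipped"},
    ],
}

def build_status_index(base):
    """Project every order into a second structure keyed by status."""
    index = defaultdict(list)
    for user_id, user_orders in base.items():
        for order in user_orders:
            index[order["status"]].append(
                {"user_id": user_id, "order_id": order["id"]}
            )
    return index

status_index = build_status_index(orders_by_user)
shipped = status_index["shipped"]
print([entry["order_id"] for entry in shipped])
```

Querying by status is now a single lookup instead of a scan over every user. One caveat worth checking for a real workload: an attribute with only a few distinct values, such as status, concentrates many items under the same index key, so the key design for the index may need more thought than this sketch suggests.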

Design Process

1. List all access patterns: what will the application ask?
2. Identify the primary entity: what’s the main thing we’re querying?
3. Choose embed vs reference for each relationship
4. Design the keys: partition key for lookups, sort key for ordering
5. Add secondary indexes for alternate query patterns
6. Accept duplication: it’s the price we pay for read performance

In simple language, NoSQL data modeling is about optimizing for reads. We design the data structure to match exactly how the application reads it. Duplication is expected, and normalization is the exception rather than the default. If we catch ourselves thinking “I need a JOIN,” we probably need to restructure our data.