Design an E-Commerce Platform (Amazon) - High-Level Design

We’re designing an e-commerce platform like Amazon. This is arguably the most comprehensive system design question because it touches nearly every concept: product catalog, search, shopping cart, order processing, payment, inventory management, and recommendations. Interviewers love it because there are so many directions to go deep.

The core challenge: hundreds of millions of products, millions of concurrent users browsing and buying, and we can never sell something that’s out of stock or lose an order. Let’s build it.

Step 1: Requirements

Functional Requirements

Browse and search products (by keyword, category, filters)
Product detail pages (images, description, reviews, pricing)
Shopping cart (add, remove, update quantity)
Checkout and place orders
Payment processing (credit card, wallet)
Order tracking (order status, shipping updates)
Seller management (list products, manage inventory)
Product reviews and ratings

Non-Functional Requirements

High availability: downtime means lost revenue. At the assumed 100K orders/day and $50 average order, an hour offline is roughly $200K in missed sales, and far more during a sale event.
Low latency — product pages should load in < 200ms, search in < 500ms
Strong consistency for inventory — we must never oversell a product
Eventual consistency for catalog — it’s okay if a new product takes a few seconds to appear in search
Scale — 500M products, 300M users, 100K orders/day, millions of concurrent browsers

Step 2: Estimation

Assumptions:

300M total users, 50M daily active users
500M products in the catalog
100K orders per day (peak: 10x during sales events like Prime Day)
Average order: 3 items, $50 total
Each active user views ~20 product pages per session

QPS:

Product page views: 50M × 20 / 86,400 ≈ 12,000 views/sec
Search queries:     50M × 5 searches / 86,400 ≈ 3,000 searches/sec
Cart operations:    50M × 3 / 86,400 ≈ 1,700 ops/sec
Orders:             100K / 86,400 ≈ 1 order/sec (peak: ~12/sec on sale days)
Peak (sale events): all above × 10

Storage:

Product catalog:  500M products × 10 KB each = 5 TB
Product images:   500M × 5 images × 500 KB = 1.25 PB
Order data:       100K/day × 2 KB = 200 MB/day, 73 GB/year
User data:        300M × 5 KB = 1.5 TB

Product page views dominate. At 12,000 reads/sec, product pages are the hottest path, and the workload as a whole is read-heavy.
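The arithmetic above is easy to re-run when an assumption changes. Here is a small sketch that recomputes the same figures; the constant and dictionary names are ours, and it uses decimal units (1 KB = 1,000 bytes) to match the rounded numbers in the text.

```python
SECONDS_PER_DAY = 86_400
KB = 1_000  # decimal units, matching the rounded figures in the text

# Assumptions copied from the estimation section
DAILY_ACTIVE_USERS = 50_000_000
TOTAL_USERS = 300_000_000
PRODUCTS = 500_000_000
ORDERS_PER_DAY = 100_000


def per_second(events_per_day: float) -> float:
    """Average rate over a day; real traffic is burstier than this."""
    return events_per_day / SECONDS_PER_DAY


qps = {
    "product_page_views": per_second(DAILY_ACTIVE_USERS * 20),
    "searches": per_second(DAILY_ACTIVE_USERS * 5),
    "cart_ops": per_second(DAILY_ACTIVE_USERS * 3),
    "orders": per_second(ORDERS_PER_DAY),
}

storage_bytes = {
    "catalog": PRODUCTS * 10 * KB,
    "images": PRODUCTS * 5 * 500 * KB,
    "orders_per_year": ORDERS_PER_DAY * 2 * KB * 365,
    "users": TOTAL_USERS * 5 * KB,
}

if __name__ == "__main__":
    for name, rate in qps.items():
        print(f"{name}: {rate:,.1f}/sec (10x peak: {rate * 10:,.1f}/sec)")
    for name, size in storage_bytes.items():
        print(f"{name}: {size / 1e12:,.3f} TB")
```

The unrounded averages come out slightly under the headline numbers (about 11,574 page views/sec, for example), which is fine for an estimate that only needs the right order of magnitude.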

Step 3: High-Level Design

E-Commerce: High-Level Architecture

Users (Web / Mobile)
  │
CDN (images) + API Gateway
  │
Product Service: catalog + details
Search Service: Elasticsearch
Cart Service: Redis + DB
Order Service: order lifecycle
Payment Service: Stripe / internal
Inventory Service: stock management
  │
Product DB, Order DB, Elasticsearch, Redis Cache, Message Queue

Browse: User → CDN + API → Product Service → Cache/DB → Product Page

Order: Cart → Order Service → Inventory (reserve) → Payment → Fulfill

This is a microservices architecture. Each service owns its own data and logic:

Product Service — manages the product catalog. CRUD for products, categories, and pricing. The read path is heavily cached.
Search Service — powered by Elasticsearch. Handles keyword search, autocomplete, filters (price range, rating, category), and sorting.
Cart Service — manages shopping carts. Cart for logged-in users is persisted in the database. Cart for anonymous users lives in Redis with a session cookie.
Order Service — the orchestrator. Manages the order lifecycle from checkout to delivery.
Inventory Service — tracks stock levels. The most critical service for data consistency.
Payment Service — integrates with payment providers (Stripe, PayPal). Handles charges, refunds, and receipts.

Step 4: API Design

-- Product APIs
GET  /api/v1/products/{product_id}
GET  /api/v1/products?category=electronics&sort=price_asc&page=1
GET  /api/v1/search?q=wireless+headphones&min_price=50&max_price=200

-- Cart APIs
GET    /api/v1/cart
POST   /api/v1/cart/items
  Body: { "product_id": "prod_42", "quantity": 2 }
PUT    /api/v1/cart/items/{item_id}
  Body: { "quantity": 3 }
DELETE /api/v1/cart/items/{item_id}

-- Order APIs
POST /api/v1/orders/checkout
  Body: { "shipping_address_id": "addr_7",
          "payment_method_id": "pm_stripe_123" }
  Response: { "order_id": "ord_999", "status": "pending",
              "total": "$149.97", "estimated_delivery": "2026-04-02" }

GET  /api/v1/orders/{order_id}           -- order details + tracking
GET  /api/v1/orders                      -- order history

-- Seller APIs
POST /api/v1/seller/products
  Body: { "title": "Wireless Headphones", "price": 49.99,
          "category": "electronics", "stock": 500,
          "images": ["img_1", "img_2"] }
PATCH /api/v1/seller/products/{product_id}/inventory
  Body: { "stock_delta": 100 }           -- add 100 units (a relative change, so PATCH rather than an idempotent PUT)

Step 5: Data Model

-- Users table (PostgreSQL)
CREATE TABLE users (
    user_id         BIGINT PRIMARY KEY,
    email           VARCHAR(255) UNIQUE,
    name            VARCHAR(100),
    password_hash   VARCHAR(255),
    default_address_id BIGINT,
    created_at      TIMESTAMP
);

-- Products table (PostgreSQL — heavy caching)
CREATE TABLE products (
    product_id      BIGINT PRIMARY KEY,
    seller_id       BIGINT NOT NULL,
    title           VARCHAR(300),
    description     TEXT,
    category_id     BIGINT,
    price           DECIMAL(10,2),
    compare_at_price DECIMAL(10,2),          -- original price for "on sale" display
    image_urls      JSON,
    avg_rating      DECIMAL(2,1) DEFAULT 0,
    review_count    INT DEFAULT 0,
    status          VARCHAR(20),             -- 'active', 'draft', 'out_of_stock'
    created_at      TIMESTAMP,
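    updated_at      TIMESTAMP
);
-- PostgreSQL has no inline INDEX clause; indexes are created separately
CREATE INDEX idx_category ON products (category_id);
CREATE INDEX idx_seller ON products (seller_id);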

-- Inventory table (PostgreSQL — strict consistency)
CREATE TABLE inventory (
    product_id      BIGINT PRIMARY KEY,
    available_stock INT NOT NULL DEFAULT 0,  -- what's available to sell
    reserved_stock  INT NOT NULL DEFAULT 0,  -- held for pending orders
    version         INT NOT NULL DEFAULT 0,  -- for optimistic locking
    updated_at      TIMESTAMP
);

-- Cart table (PostgreSQL — for logged-in users)
CREATE TABLE cart_items (
    user_id         BIGINT,
    product_id      BIGINT,
    quantity        INT NOT NULL,
    added_at        TIMESTAMP,
    PRIMARY KEY (user_id, product_id)
);

-- Orders table (PostgreSQL)
CREATE TABLE orders (
    order_id        BIGINT PRIMARY KEY,
    user_id         BIGINT NOT NULL,
    status          VARCHAR(20),             -- 'pending', 'paid', 'shipped', 'delivered',
                                             -- 'cancelled', 'refunded', 'failed'
    total_amount    DECIMAL(10,2),
    shipping_address JSON,
    payment_id      BIGINT,
    ordered_at      TIMESTAMP,
    shipped_at      TIMESTAMP,
    delivered_at    TIMESTAMP
);
-- PostgreSQL has no inline INDEX clause; indexes are created separately
CREATE INDEX idx_user_orders ON orders (user_id, ordered_at DESC);

-- Order items table
CREATE TABLE order_items (
    order_id        BIGINT,
    product_id      BIGINT,
    quantity        INT,
    price_at_order  DECIMAL(10,2),           -- snapshot of price at time of order
    PRIMARY KEY (order_id, product_id)
);

-- Payments table (PostgreSQL)
CREATE TABLE payments (
    payment_id      BIGINT PRIMARY KEY,
    order_id        BIGINT UNIQUE NOT NULL,
    amount          DECIMAL(10,2),
    method          VARCHAR(20),             -- 'card', 'wallet', 'bank'
    provider_ref    VARCHAR(100),            -- Stripe charge ID
    status          VARCHAR(20),             -- 'pending', 'charged', 'refunded', 'failed'
    idempotency_key VARCHAR(100) UNIQUE,     -- prevent double charges
    processed_at    TIMESTAMP
);

Important: Notice the price_at_order field in order_items. We snapshot the price when the order is placed. If the seller changes the price tomorrow, it shouldn’t affect existing orders. This is a common mistake in interviews — always capture the price at order time.

Step 6: Deep Dives

Deep Dive 1: Inventory Management (The Hardest Part)

Inventory is where things get tricky. Imagine 100 people trying to buy the last unit of a popular product at the same time. We must ensure exactly one person gets it, not two, not zero.

The race condition problem:

Stock: 1 unit left

Thread A: reads stock = 1 → "okay, can sell!"
Thread B: reads stock = 1 → "okay, can sell!"
Thread A: stock = stock - 1 → sets stock to 0
Thread B: stock = stock - 1 → sets stock to 0 (but we already sold it!)

Result: We sold 2 units but only had 1. Oversold!

Solution: Optimistic locking with version numbers

Every update includes the version number. If someone else updated the row between our read and write, our update fails and we retry.

-- Read the current stock and version
SELECT available_stock, version FROM inventory WHERE product_id = 42;
-- Returns: available_stock = 1, version = 7

-- Try to decrement stock (only succeeds if version hasn't changed)
UPDATE inventory
SET available_stock = available_stock - 1,
    version = version + 1
WHERE product_id = 42
  AND version = 7
  AND available_stock >= 1;

-- If rows_affected = 0, someone else got there first. Retry or show "sold out"

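This is called optimistic locking because we’re optimistic that no one else is modifying the row. If someone did, the version check fails, we detect it, and we retry. We never hold a lock across the read and the write; the database only locks the row briefly while the UPDATE itself runs.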
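To make the read-check-write loop concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for PostgreSQL. The function names `reserve_one` and `make_demo_db` and the retry count are illustrative assumptions, not part of the design above; the SQL mirrors the statements shown earlier.

```python
import sqlite3


def reserve_one(conn: sqlite3.Connection, product_id: int, max_retries: int = 3) -> bool:
    """Try to take one unit using optimistic locking. Returns True on success."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT available_stock, version FROM inventory WHERE product_id = ?",
            (product_id,),
        ).fetchone()
        if row is None or row[0] < 1:
            return False  # unknown product, or sold out
        version = row[1]
        cur = conn.execute(
            "UPDATE inventory "
            "SET available_stock = available_stock - 1, version = version + 1 "
            "WHERE product_id = ? AND version = ? AND available_stock >= 1",
            (product_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True  # our version was still current, so the write won
        # rowcount == 0: another writer bumped the version; re-read and retry
    return False


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with one unit of product 42 at version 7."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        " product_id INTEGER PRIMARY KEY,"
        " available_stock INTEGER NOT NULL,"
        " version INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO inventory VALUES (42, 1, 7)")
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(reserve_one(db, 42))  # first buyer gets the last unit
    print(reserve_one(db, 42))  # second buyer sees sold out
```

In a real service the retry limit and what to show the user after the final failed attempt are product decisions; the sketch simply reports failure.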

The reserved stock pattern:

Placing an order does not mark the stock as sold right away. We first move it from available to reserved.

Available: 10, Reserved: 0

User places order:
  Available: 9, Reserved: 1    (reserved for this order)

Payment succeeds:
  Available: 9, Reserved: 0    (reserved → sold, stock stays at 9)

Payment fails:
  Available: 10, Reserved: 0   (release the reservation)

Why? Because payment might take seconds to process. Moving the unit to reserved takes it off the shelf so nobody else can buy it during payment processing, but it only counts as sold once payment succeeds. If payment fails or never completes, we can tell held units apart from sold ones and return them to available.

Reservation timeout:

What if the user goes through checkout but never completes payment? We set a timeout (e.g., 10 minutes). If the payment isn’t completed within that window, the reservation expires and the stock becomes available again. A background job cleans up expired reservations.
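The reserve, confirm, release, and expire transitions can be sketched as a small in-memory model. This is only an illustration: the `Inventory` class, its `holds` map, and the 600-second default are our assumptions, and a production version would keep each hold as a database row so the background job can find expired ones.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Inventory:
    available: int
    reserved: int = 0
    # order_id -> (quantity, expires_at); hypothetical per-reservation record
    holds: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def reserve(self, order_id: str, qty: int, ttl_seconds: float = 600,
                now: Optional[float] = None) -> bool:
        """Move qty units from available to reserved for this order."""
        now = time.time() if now is None else now
        if order_id in self.holds or qty <= 0 or qty > self.available:
            return False
        self.available -= qty
        self.reserved += qty
        self.holds[order_id] = (qty, now + ttl_seconds)
        return True

    def confirm(self, order_id: str) -> bool:
        """Payment succeeded: reserved units become sold."""
        hold = self.holds.pop(order_id, None)
        if hold is None:
            return False
        self.reserved -= hold[0]
        return True

    def release(self, order_id: str) -> bool:
        """Payment failed: reserved units go back to available."""
        hold = self.holds.pop(order_id, None)
        if hold is None:
            return False
        self.reserved -= hold[0]
        self.available += hold[0]
        return True

    def expire_stale(self, now: Optional[float] = None) -> List[str]:
        """What the background job would do: release holds past their deadline."""
        now = time.time() if now is None else now
        expired = [oid for oid, (_, deadline) in self.holds.items() if deadline <= now]
        for oid in expired:
            self.release(oid)
        return expired
```

Passing `now` explicitly keeps the example deterministic; a real job would use the database clock so that all workers agree on what has expired.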

Deep Dive 2: Order Processing Pipeline

An order goes through multiple steps across multiple services. If any step fails, we need to handle it gracefully. This is where the saga pattern comes in.

The happy path:

1. User clicks "Place Order"
2. Order Service creates order (status: pending)
3. Inventory Service reserves stock
4. Payment Service charges the user
5. Inventory Service confirms (reserved → sold)
6. Order Service updates status to "paid"
7. Notification: "Your order is confirmed!"
8. Fulfillment Service picks + packs + ships
9. Order Service updates status to "shipped"
10. Delivery confirmed → status: "delivered"

What if payment fails at step 4?

We need to compensate — undo everything we did:

Release the inventory reservation (step 3 rollback)
Mark the order as “failed”
Notify the user: “Payment failed, please try again”

The saga pattern:

In simple language, a saga is a sequence of steps where each step has a rollback action. If step N fails, we run the rollback for steps N-1, N-2, … down to step 1.

Step 1: Create order           → Rollback: cancel order
Step 2: Reserve inventory      → Rollback: release reservation
Step 3: Charge payment         → Rollback: refund payment
Step 4: Confirm inventory      → (no rollback needed — order is final)
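The step-plus-rollback idea can be shown in a few lines. The sketch below runs the saga from a single coordinator function, which is simpler to read than event-driven choreography; `run_saga` and the demo step functions are hypothetical names for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}: rolled back")
            return False, log
        log.append(f"{name}: ok")
        completed.append((name, compensate))
    return True, log


def demo_failed_payment() -> Tuple[bool, List[str], Dict[str, object]]:
    """Payment fails at step 3, so steps 2 and 1 are undone in that order."""
    state: Dict[str, object] = {"order": None, "reserved": 0}

    def create_order() -> None:
        state["order"] = "pending"

    def cancel_order() -> None:
        state["order"] = "failed"

    def reserve() -> None:
        state["reserved"] = 1

    def release() -> None:
        state["reserved"] = 0

    def charge() -> None:
        raise RuntimeError("card declined")

    def refund() -> None:
        pass  # never reached in this demo: the charge did not go through

    steps: List[Step] = [
        ("create_order", create_order, cancel_order),
        ("reserve_inventory", reserve, release),
        ("charge_payment", charge, refund),
    ]
    ok, log = run_saga(steps)
    return ok, log, state
```

Note that the failed step itself is not compensated, only the steps that completed before it. A real implementation would also need to persist saga progress so that a crashed coordinator can resume the rollback.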

We implement this using a message queue. Each service publishes events, and the next service in the chain listens for them.

Order Service → publishes "order_created"
  → Inventory Service hears it → reserves stock → publishes "stock_reserved"
    → Payment Service hears it → charges card → publishes "payment_charged"
      → Inventory Service hears it → confirms reservation
      → Order Service hears it → marks order as "paid"

If Payment fails → publishes "payment_failed"
  → Inventory Service hears it → releases reservation
  → Order Service hears it → marks order as "failed"

Idempotency is critical:

What if the payment message gets delivered twice? We’d charge the user twice. To prevent this, every payment request includes an idempotency key (usually the order_id). The payment provider checks: “Have I seen this key before? If yes, return the previous result instead of charging again.”

POST /charge
  { "amount": 149.97, "idempotency_key": "ord_999" }

First call:  charges $149.97, returns success
Second call: returns the SAME success response without charging again

This is a non-negotiable requirement for any payment system.
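Here is a minimal sketch of the provider-side check, using an in-memory dictionary where a real system would use a durable store with a unique constraint on the key. `FakePaymentProvider` and its fields are invented for illustration and do not represent any real provider's API.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict


@dataclass
class FakePaymentProvider:
    # idempotency_key -> stored response from the first call
    results: Dict[str, dict] = field(default_factory=dict)
    charge_count: int = 0  # how many times money actually moved

    def charge(self, amount: Decimal, idempotency_key: str) -> dict:
        """Charge once per key; repeated calls replay the first response."""
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        self.charge_count += 1
        result = {
            "status": "charged",
            "amount": str(amount),
            "charge_id": f"ch_{self.charge_count}",
        }
        self.results[idempotency_key] = result
        return result


if __name__ == "__main__":
    provider = FakePaymentProvider()
    first = provider.charge(Decimal("149.97"), "ord_999")
    second = provider.charge(Decimal("149.97"), "ord_999")  # duplicate delivery
    print(first == second, provider.charge_count)
```

The sketch is single-threaded. With concurrent duplicates, the lookup and the insert must be one atomic operation, which is what the UNIQUE constraint on idempotency_key in the payments table provides.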

Deep Dive 3: Product Search and Recommendations

Search with Elasticsearch:

The product catalog lives in PostgreSQL (source of truth), but search queries go to Elasticsearch. Why? Because Elasticsearch is built for full-text search with relevance ranking, fuzzy matching, autocomplete, and faceted filtering. PostgreSQL’s LIKE '%headphones%' has a leading wildcard, so it cannot use a standard B-tree index and falls back to scanning the table, which would be painfully slow on 500M products.

How we keep Elasticsearch in sync:

Seller updates product in PostgreSQL
  → Change Data Capture (CDC) detects the change
  → Publishes event to Kafka
  → Elasticsearch consumer reads the event
  → Updates the search index

Eventual consistency: there's a 1-5 second delay. That's fine for search.

Search features:

GET /search?q=wireless+headphones&category=electronics&min_price=50&sort=relevance

Elasticsearch query:
{
  "query": {
    "bool": {
      "must": { "multi_match": { "query": "wireless headphones", "fields": ["title^3", "description"] }},
      "filter": [
        { "term": { "category": "electronics" }},
        { "range": { "price": { "gte": 50 }}}
      ]
    }
  },
  "sort": ["_score"]
}

The title^3 means title matches are weighted 3x more than description matches. This is how we control relevance.
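The Search Service has to translate request parameters into that query body. A minimal sketch of the translation follows; `build_search_query` is a hypothetical helper, and it only builds the dictionary rather than calling any Elasticsearch client.

```python
import json
from typing import Optional


def build_search_query(q: str, category: Optional[str] = None,
                       min_price: Optional[float] = None,
                       max_price: Optional[float] = None) -> dict:
    """Build the bool query shown above from optional request parameters."""
    filters = []
    if category is not None:
        filters.append({"term": {"category": category}})
    price_range = {}
    if min_price is not None:
        price_range["gte"] = min_price
    if max_price is not None:
        price_range["lte"] = max_price
    if price_range:
        filters.append({"range": {"price": price_range}})
    return {
        "query": {
            "bool": {
                # scored clause: title matches are boosted 3x over description
                "must": {"multi_match": {"query": q,
                                         "fields": ["title^3", "description"]}},
                # filter clauses narrow results without affecting the score
                "filter": filters,
            }
        },
        "sort": ["_score"],
    }


if __name__ == "__main__":
    body = build_search_query("wireless headphones", category="electronics", min_price=50)
    print(json.dumps(body, indent=2))
```

Keeping category and price in the filter clause rather than in must is deliberate: filters do not contribute to relevance scoring, so only the text match decides the ranking.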

Autocomplete: Elasticsearch’s completion suggester gives us type-ahead suggestions as the user types. “wire…” → [“wireless headphones”, “wireless mouse”, “wireless charger”].

Product recommendations:

Two common approaches for an e-commerce platform:

“Customers who bought X also bought Y”: collaborative filtering. We look at purchase patterns across all users. If 70% of people who bought a phone case also bought a screen protector, we recommend screen protectors on the phone case page. Simple, effective, and in this co-occurrence form it needs no trained model, only counting.
“Based on your browsing history” — personalized recommendations. We track what the user browsed, what they added to cart, and what they bought. Then we find similar products using content-based similarity or a trained model.

For the interview, mentioning collaborative filtering and explaining how the “frequently bought together” feature works is usually enough.
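A toy version of “frequently bought together” is just co-occurrence counting over past orders. The sketch below assumes each order is a list of product IDs; the function names and sample data are made up for illustration, and a real pipeline would compute this offline over far more data.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, List


def build_co_purchase_counts(orders: Iterable[List[str]]) -> Dict[str, Counter]:
    """For each product, count how often every other product shared an order."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for items in orders:
        # set() so a product bought twice in one order is only counted once
        for a, b in permutations(set(items), 2):
            counts[a][b] += 1
    return counts


def also_bought(counts: Dict[str, Counter], product_id: str, top_n: int = 3) -> List[str]:
    """Products most often bought together with product_id, best first."""
    return [pid for pid, _ in counts.get(product_id, Counter()).most_common(top_n)]


if __name__ == "__main__":
    past_orders = [
        ["phone_case", "screen_protector"],
        ["phone_case", "screen_protector", "charger"],
        ["phone_case", "charger"],
        ["phone_case", "screen_protector"],
    ]
    counts = build_co_purchase_counts(past_orders)
    print(also_bought(counts, "phone_case", top_n=2))
```

Raw counts favor products that are popular everywhere, so a fuller version would usually normalize by each product's overall purchase frequency.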

Step 7: Scaling

Product Service (heaviest read load):

Multi-layer caching: CDN (for product images) → Redis (for product data) → Read replicas (for DB)
Cache product pages aggressively. Product data rarely changes.
Cache invalidation: when a seller updates a product, invalidate the Redis cache and purge the CDN
Shard the products table by product_id
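The Redis layer in this list typically follows a cache-aside pattern: read from the cache, fall back to the database on a miss, and delete the entry when the seller updates the product. Here is a sketch with an in-memory dictionary standing in for Redis; the `ProductCache` class and the 300-second TTL are illustrative assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ProductCache:
    """Cache-aside with a TTL; a dict stands in for Redis in this sketch."""

    def __init__(self, load_from_db: Callable[[int], dict], ttl_seconds: float = 300.0):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._entries: Dict[int, Tuple[float, dict]] = {}  # id -> (expires_at, product)

    def get(self, product_id: int, now: Optional[float] = None) -> dict:
        now = time.monotonic() if now is None else now
        entry = self._entries.get(product_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        product = self._load(product_id)  # miss or expired: go to the database
        self._entries[product_id] = (now + self._ttl, product)
        return product

    def invalidate(self, product_id: int) -> None:
        """Call after a seller update so the next read reloads fresh data."""
        self._entries.pop(product_id, None)
```

The TTL acts as a safety net: if an invalidation message is ever lost, stale data still ages out after a bounded time.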

Search Service:

Elasticsearch cluster with multiple shards per index
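Add replica shards to spread search load, and consider dedicated coordinating nodes for queries. Replicas still apply every write, so this reduces the impact of heavy search load on indexing rather than fully isolating the two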
Warm standby cluster for failover

Order Service:

Shard orders by user_id — a user’s orders live on the same shard
Event-driven architecture: use Kafka for inter-service communication
The order table grows forever, but we only query recent orders. Partition by date.

Inventory Service (most critical for consistency):

Single source of truth for stock levels — no caching layer for writes
Read replicas for display purposes (“In Stock” badge on product pages)
The actual stock check at checkout time must hit the primary database
For flash sales: pre-warm inventory data in Redis, use Redis atomic operations (DECR) for ultra-fast stock reservation, then sync to DB

Handling sale events (Black Friday / Prime Day):

Pre-scale all services 3-5 days before the event (provision capacity ahead of time rather than relying on reactive auto-scaling alone)
Pre-generate and cache popular product pages
Use a queue-based checkout: if the system is overwhelmed, put users in a virtual queue instead of crashing
Rate limit cart operations per user (prevent bots from hoarding inventory)
Feature flags: disable non-essential features (reviews, recommendations) to save capacity for core shopping flow

Payment processing:

Never process payments synchronously in the order flow — use async processing via message queue
Retry with exponential backoff for transient payment failures (timeouts, provider 5xx errors); do not retry hard declines such as insufficient funds
Idempotency keys on every single payment call
PCI compliance: never store raw credit card numbers. Use tokenized payment methods (Stripe tokens)
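Retries and idempotency keys work together: the retry loop may send the same charge several times, and the key makes that safe. A minimal sketch of the backoff loop follows. `TransientPaymentError`, the base delay, and the cap are hypothetical choices, and the `charge` callable is assumed to reuse the same idempotency key on every attempt.

```python
import random
import time
from typing import Callable, List


class TransientPaymentError(Exception):
    """Hypothetical error for retryable failures such as timeouts or 5xx responses."""


def backoff_delays(max_attempts: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Delay before each retry: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** n) for n in range(max_attempts - 1)]


def charge_with_retry(charge: Callable[[], dict], max_attempts: int = 4,
                      base: float = 0.5, cap: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep,
                      jitter: bool = True) -> dict:
    """Call charge() until it succeeds or attempts run out.

    Only TransientPaymentError is retried; any other exception (for example
    a hard decline) propagates immediately.
    """
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        try:
            return charge()
        except TransientPaymentError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the saga handle the failure
            delay = delays[attempt]
            if jitter:
                # randomize so many clients do not retry in lockstep
                delay = random.uniform(0, delay)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

Injecting `sleep` keeps the function testable without real waiting; in the async order flow described above, the same schedule could instead be expressed as delayed redelivery on the message queue.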

Database strategy:

PostgreSQL for transactional data (orders, inventory, payments) — we need ACID guarantees
Elasticsearch for search — eventually consistent with the source of truth
Redis for caching (product data, sessions, carts) and real-time counters
S3 for product images and static assets, fronted by a CDN

In simple language, an e-commerce platform is a collection of independent services (products, search, cart, orders, inventory, payments), each owning its own data. The hardest problems are inventory management (preventing overselling with optimistic locking and reserved stock), order processing (coordinating multiple services with the saga pattern), and search (keeping Elasticsearch in sync with the catalog). The browse path is heavily cached because reads dominate: in the estimates above, about 15,000 page views and searches per second compete with fewer than 2,000 cart and order writes. And the payment system needs bulletproof idempotency because charging someone twice is the quickest way to lose trust. The beauty of this architecture is that each service scales independently: we can throw 10x more resources at the product service during a sale without touching the payment service.