We’re designing an e-commerce platform like Amazon. This is arguably the most comprehensive system design question because it touches nearly every concept: product catalog, search, shopping cart, order processing, payment, inventory management, and recommendations. Interviewers love it because there are so many directions to go deep.
The core challenge: hundreds of millions of products, millions of concurrent users browsing and buying, and we can never sell something that’s out of stock or lose an order. Let’s build it.
Step 1: Requirements
Functional Requirements
- Browse and search products (by keyword, category, filters)
- Product detail pages (images, description, reviews, pricing)
- Shopping cart (add, remove, update quantity)
- Checkout and place orders
- Payment processing (credit card, wallet)
- Order tracking (order status, shipping updates)
- Seller management (list products, manage inventory)
- Product reviews and ratings
Non-Functional Requirements
- High availability: downtime means lost revenue, and at this scale every minute offline is costly.
- Low latency — product pages should load in < 200ms, search in < 500ms
- Strong consistency for inventory — we must never oversell a product
- Eventual consistency for catalog — it’s okay if a new product takes a few seconds to appear in search
- Scale — 500M products, 300M users, 100K orders/day, millions of concurrent browsers
Step 2: Estimation
Assumptions:
- 300M total users, 50M daily active users
- 500M products in the catalog
- 100K orders per day (peak: 10x during sales events like Prime Day)
- Average order: 3 items, $50 total
- Each active user views ~20 product pages per session
QPS:
Product page views: 50M × 20 / 86,400 ≈ 12,000 views/sec
Search queries: 50M × 5 searches / 86,400 ≈ 3,000 searches/sec
Cart operations: 50M × 3 / 86,400 ≈ 1,700 ops/sec
Orders: 100K / 86,400 ≈ 1 order/sec (peak: ~12/sec on sale days)
Peak (sale events): all above × 10
Storage:
Product catalog: 500M products × 10 KB each = 5 TB
Product images: 500M × 5 images × 500 KB = 1.25 PB
Order data: 100K/day × 2 KB = 200 MB/day, 73 GB/year
User data: 300M × 5 KB = 1.5 TB
Product page views dominate: at ~12,000 reads/sec, the product page is the hottest path, which makes this a read-heavy workload.
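The back-of-envelope numbers above can be reproduced in a few lines. This is only a sketch built from the assumptions listed in this step; the variable names are illustrative, and the rates are daily averages, so real traffic will be burstier.

```python
SECONDS_PER_DAY = 86_400
DAU = 50_000_000            # daily active users
PRODUCTS = 500_000_000
USERS = 300_000_000
ORDERS_PER_DAY = 100_000


def per_second(events_per_day: float) -> float:
    """Average rate over a full day."""
    return events_per_day / SECONDS_PER_DAY


page_views_qps = per_second(DAU * 20)   # 20 product pages per active user
search_qps = per_second(DAU * 5)        # 5 searches per active user
cart_qps = per_second(DAU * 3)          # 3 cart operations per active user
order_qps = per_second(ORDERS_PER_DAY)

KB, GB, TB, PB = 10**3, 10**9, 10**12, 10**15   # decimal units
catalog_tb = PRODUCTS * 10 * KB / TB            # 10 KB per product
images_pb = PRODUCTS * 5 * 500 * KB / PB        # 5 images of 500 KB each
orders_gb_per_year = ORDERS_PER_DAY * 2 * KB * 365 / GB
users_tb = USERS * 5 * KB / TB

print(f"page views/sec: {page_views_qps:,.0f}")
print(f"searches/sec:   {search_qps:,.0f}")
print(f"cart ops/sec:   {cart_qps:,.0f}")
print(f"orders/sec:     {order_qps:.2f} (peak x10: {order_qps * 10:.1f})")
print(f"catalog: {catalog_tb} TB, images: {images_pb} PB, "
      f"orders: {orders_gb_per_year} GB/year, users: {users_tb} TB")
```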
Step 3: High-Level Design
This is a microservices architecture. Each service owns its own data and logic:
- Product Service — manages the product catalog. CRUD for products, categories, and pricing. The read path is heavily cached.
- Search Service — powered by Elasticsearch. Handles keyword search, autocomplete, filters (price range, rating, category), and sorting.
- Cart Service — manages shopping carts. Cart for logged-in users is persisted in the database. Cart for anonymous users lives in Redis with a session cookie.
- Order Service — the orchestrator. Manages the order lifecycle from checkout to delivery.
- Inventory Service — tracks stock levels. The most critical service for data consistency.
- Payment Service — integrates with payment providers (Stripe, PayPal). Handles charges, refunds, and receipts.
Step 4: API Design
-- Product APIs
GET /api/v1/products/{product_id}
GET /api/v1/products?category=electronics&sort=price_asc&page=1
GET /api/v1/search?q=wireless+headphones&min_price=50&max_price=200
-- Cart APIs
GET /api/v1/cart
POST /api/v1/cart/items
Body: { "product_id": "prod_42", "quantity": 2 }
PUT /api/v1/cart/items/{item_id}
Body: { "quantity": 3 }
DELETE /api/v1/cart/items/{item_id}
-- Order APIs
POST /api/v1/orders/checkout
Body: { "shipping_address_id": "addr_7",
"payment_method_id": "pm_stripe_123" }
Response: { "order_id": "ord_999", "status": "pending",
"total": "$149.97", "estimated_delivery": "2026-04-02" }
GET /api/v1/orders/{order_id} -- order details + tracking
GET /api/v1/orders -- order history
-- Seller APIs
POST /api/v1/seller/products
Body: { "title": "Wireless Headphones", "price": 49.99,
"category": "electronics", "stock": 500,
"images": ["img_1", "img_2"] }
PATCH /api/v1/seller/products/{product_id}/inventory
Body: { "stock_delta": 100 } -- add 100 units
Step 5: Data Model
-- Users table (PostgreSQL)
CREATE TABLE users (
user_id BIGINT PRIMARY KEY,
email VARCHAR(255) UNIQUE,
name VARCHAR(100),
password_hash VARCHAR(255),
default_address_id BIGINT,
created_at TIMESTAMP
);
-- Products table (PostgreSQL — heavy caching)
CREATE TABLE products (
product_id BIGINT PRIMARY KEY,
seller_id BIGINT NOT NULL,
title VARCHAR(300),
description TEXT,
category_id BIGINT,
price DECIMAL(10,2),
compare_at_price DECIMAL(10,2), -- original price for "on sale" display
image_urls JSON,
avg_rating DECIMAL(2,1) DEFAULT 0,
review_count INT DEFAULT 0,
status VARCHAR(20), -- 'active', 'draft', 'out_of_stock'
created_at TIMESTAMP,
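updated_at TIMESTAMP
);
-- PostgreSQL has no inline INDEX clause; indexes are created separately
CREATE INDEX idx_category ON products (category_id);
CREATE INDEX idx_seller ON products (seller_id);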
-- Inventory table (PostgreSQL — strict consistency)
CREATE TABLE inventory (
product_id BIGINT PRIMARY KEY,
available_stock INT NOT NULL DEFAULT 0, -- what's available to sell
reserved_stock INT NOT NULL DEFAULT 0, -- held for pending orders
version INT NOT NULL DEFAULT 0, -- for optimistic locking
updated_at TIMESTAMP
);
-- Cart table (PostgreSQL — for logged-in users)
CREATE TABLE cart_items (
user_id BIGINT,
product_id BIGINT,
quantity INT NOT NULL,
added_at TIMESTAMP,
PRIMARY KEY (user_id, product_id)
);
-- Orders table (PostgreSQL)
CREATE TABLE orders (
order_id BIGINT PRIMARY KEY,
user_id BIGINT NOT NULL,
status VARCHAR(20), -- 'pending', 'paid', 'shipped', 'delivered',
-- 'failed', 'cancelled', 'refunded'
total_amount DECIMAL(10,2),
shipping_address JSON,
payment_id BIGINT,
ordered_at TIMESTAMP,
shipped_at TIMESTAMP,
delivered_at TIMESTAMP
);
CREATE INDEX idx_user_orders ON orders (user_id, ordered_at DESC);
-- Order items table
CREATE TABLE order_items (
order_id BIGINT,
product_id BIGINT,
quantity INT,
price_at_order DECIMAL(10,2), -- snapshot of price at time of order
PRIMARY KEY (order_id, product_id)
);
-- Payments table (PostgreSQL)
CREATE TABLE payments (
payment_id BIGINT PRIMARY KEY,
order_id BIGINT UNIQUE NOT NULL,
amount DECIMAL(10,2),
method VARCHAR(20), -- 'card', 'wallet', 'bank'
provider_ref VARCHAR(100), -- Stripe charge ID
status VARCHAR(20), -- 'pending', 'charged', 'refunded', 'failed'
idempotency_key VARCHAR(100) UNIQUE, -- prevent double charges
processed_at TIMESTAMP
);
Important: Notice the price_at_order field in order_items. We snapshot the price when the order is placed. If the seller changes the price tomorrow, it shouldn’t affect existing orders. This is a common mistake in interviews — always capture the price at order time.
Step 6: Deep Dives
Deep Dive 1: Inventory Management (The Hardest Part)
Inventory is where things get tricky. Imagine 100 people trying to buy the last unit of a popular product at the same time. We must ensure exactly one person gets it, not two, not zero.
The race condition problem:
Stock: 1 unit left
Thread A: reads stock = 1 → "okay, can sell!"
Thread B: reads stock = 1 → "okay, can sell!"
Thread A: stock = stock - 1 → sets stock to 0
Thread B: stock = stock - 1 → sets stock to 0 (but we already sold it!)
Result: We sold 2 units but only had 1. Oversold!
Solution: Optimistic locking with version numbers
Every update includes the version number. If someone else updated the row between our read and write, our update fails and we retry.
-- Read the current stock and version
SELECT available_stock, version FROM inventory WHERE product_id = 42;
-- Returns: available_stock = 1, version = 7
-- Try to decrement stock (only succeeds if version hasn't changed)
UPDATE inventory
SET available_stock = available_stock - 1,
version = version + 1
WHERE product_id = 42
AND version = 7
AND available_stock >= 1;
-- If rows_affected = 0, someone else got there first. Retry or show "sold out"
This is called optimistic locking because we’re optimistic that no one else is modifying the row. If they did, we detect it and retry. No heavy database locks needed.
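The read-then-conditional-update loop can be sketched in Python. This is a minimal illustration, not a production implementation: SQLite (from the standard library) stands in for PostgreSQL so the example runs anywhere, and `try_decrement` and its retry limit are hypothetical names chosen for this sketch.

```python
import sqlite3


def try_decrement(conn: sqlite3.Connection, product_id: int,
                  qty: int = 1, max_retries: int = 3) -> bool:
    """Versioned decrement; False means sold out or persistent contention."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT available_stock, version FROM inventory WHERE product_id = ?",
            (product_id,),
        ).fetchone()
        if row is None or row[0] < qty:
            return False  # unknown product or not enough stock
        _, version = row
        cur = conn.execute(
            "UPDATE inventory "
            "SET available_stock = available_stock - ?, version = version + 1 "
            "WHERE product_id = ? AND version = ? AND available_stock >= ?",
            (qty, product_id, version, qty),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True  # the version we read was still current
        # rowcount == 0: another writer bumped the version, so re-read and retry
    return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, "
    "available_stock INTEGER NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO inventory VALUES (42, 1, 7)")
conn.commit()

first = try_decrement(conn, 42)    # takes the last unit
second = try_decrement(conn, 42)   # finds no stock left
print(first, second)
```

The second caller is turned away by the stock check, and a caller that loses a race on the version simply re-reads and tries again.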
The reserved stock pattern:
We don’t mark stock as sold the moment the user places an order. We move it from available to reserved first.
Available: 10, Reserved: 0
User places order:
Available: 9, Reserved: 1 (reserved for this order)
Payment succeeds:
Available: 9, Reserved: 0 (reserved → sold, stock stays at 9)
Payment fails:
Available: 10, Reserved: 0 (release the reservation)
Why? Because payment might take seconds to process. If we simply decremented stock and the payment then failed, nothing would record that a unit is owed back to the shelf. With an explicit reservation, we hold the item during payment processing, release it if payment fails, and only count it as sold when payment succeeds.
Reservation timeout:
What if the user goes through checkout but never completes payment? We set a timeout (e.g., 10 minutes). If the payment isn’t completed within that window, the reservation expires and the stock becomes available again. A background job cleans up expired reservations.
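The reserve, confirm, release, and expire transitions described above can be modelled in memory. This is a sketch under stated assumptions: the `holds` bookkeeping (one entry per order with a deadline) is hypothetical and is not part of the inventory table shown earlier, and the clock is passed in so the timeout can be exercised without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Inventory:
    available: int
    reserved: int = 0
    # order_id -> (quantity, deadline); hypothetical per-order bookkeeping
    holds: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def reserve(self, order_id: str, qty: int, ttl: float = 600.0,
                now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if qty > self.available or order_id in self.holds:
            return False
        self.available -= qty
        self.reserved += qty
        self.holds[order_id] = (qty, now + ttl)
        return True

    def confirm(self, order_id: str) -> bool:
        """Payment succeeded: the hold becomes a sale."""
        hold = self.holds.pop(order_id, None)
        if hold is None:
            return False  # hold expired or never existed
        self.reserved -= hold[0]
        return True

    def release(self, order_id: str) -> bool:
        """Payment failed: return the units to available stock."""
        hold = self.holds.pop(order_id, None)
        if hold is None:
            return False
        self.reserved -= hold[0]
        self.available += hold[0]
        return True

    def expire(self, now: Optional[float] = None) -> int:
        """Background sweep: release every hold past its deadline."""
        now = time.time() if now is None else now
        stale = [oid for oid, (_, deadline) in self.holds.items() if deadline <= now]
        for oid in stale:
            self.release(oid)
        return len(stale)


inv = Inventory(available=10)
inv.reserve("ord_1", 1, now=0)   # available 9, reserved 1
inv.confirm("ord_1")             # available 9, reserved 0 (sold)
inv.reserve("ord_2", 1, now=0)   # available 8, reserved 1
inv.expire(now=601)              # 10-minute hold lapsed: available 9, reserved 0
print(inv.available, inv.reserved)
```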
Deep Dive 2: Order Processing Pipeline
An order goes through multiple steps across multiple services. If any step fails, we need to handle it gracefully. This is where the saga pattern comes in.
The happy path:
1. User clicks "Place Order"
2. Order Service creates order (status: pending)
3. Inventory Service reserves stock
4. Payment Service charges the user
5. Inventory Service confirms (reserved → sold)
6. Order Service updates status to "paid"
7. Notification: "Your order is confirmed!"
8. Fulfillment Service picks + packs + ships
9. Order Service updates status to "shipped"
10. Delivery confirmed → status: "delivered"
What if payment fails at step 4?
We need to compensate — undo everything we did:
- Release the inventory reservation (step 3 rollback)
- Mark the order as “failed”
- Notify the user: “Payment failed, please try again”
The saga pattern:
In simple language, a saga is a sequence of steps where each step has a rollback action. If step N fails, we run the rollback for steps N-1, N-2, … down to step 1.
Step 1: Create order → Rollback: cancel order
Step 2: Reserve inventory → Rollback: release reservation
Step 3: Charge payment → Rollback: refund payment
Step 4: Confirm inventory → (no rollback needed — order is final)
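The step-plus-rollback idea can be sketched as a small in-process runner. This is illustrative only: `run_saga` is a hypothetical helper, the actions are stubs, and a real saga would persist its progress so it can resume after a crash.

```python
from typing import Callable, List, Tuple

# (name, action, rollback)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, undo the completed ones in reverse."""
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for prev_name, _, rollback in reversed(done):
                rollback()
                log.append(f"rolled_back:{prev_name}")
            return False
        log.append(f"done:{name}")
        done.append(step)
    return True


def decline() -> None:
    raise RuntimeError("card declined")  # simulated payment failure


def noop() -> None:
    pass


log: List[str] = []
ok = run_saga(
    [
        ("create_order", noop, noop),
        ("reserve_inventory", noop, noop),
        ("charge_payment", decline, noop),
        ("confirm_inventory", noop, noop),
    ],
    log,
)
print(ok, log)
```

Because the charge itself failed, only the two earlier steps are compensated, and in reverse order.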
We implement this with a message queue, in the choreography style: each service publishes events, and the next service in the chain reacts to them.
Order Service → publishes "order_created"
→ Inventory Service hears it → reserves stock → publishes "stock_reserved"
→ Payment Service hears it → charges card → publishes "payment_charged"
→ Inventory Service hears it → confirms reservation
→ Order Service hears it → marks order as "paid"
If Payment fails → publishes "payment_failed"
→ Inventory Service hears it → releases reservation
→ Order Service hears it → marks order as "failed"
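The event chain above can be simulated with an in-memory bus. This is a toy sketch, assuming a single process and one unit of stock; `Bus`, `wire`, and `place_order` are hypothetical names, and a real deployment would use a durable broker such as Kafka with separate services.

```python
from collections import defaultdict, deque


class Bus:
    """Minimal in-memory publish/subscribe queue."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.history = []

    def subscribe(self, event, handler):
        self.handlers[event].append(handler)

    def publish(self, event, order_id):
        self.queue.append((event, order_id))

    def run(self):
        while self.queue:
            event, order_id = self.queue.popleft()
            self.history.append(event)
            for handler in self.handlers[event]:
                handler(order_id)


def wire(bus, orders, stock, card_ok):
    def on_order_created(order_id):      # Inventory Service
        stock["available"] -= 1
        stock["reserved"] += 1
        bus.publish("stock_reserved", order_id)

    def on_stock_reserved(order_id):     # Payment Service
        bus.publish("payment_charged" if card_ok else "payment_failed", order_id)

    def on_payment_charged(order_id):    # Inventory confirms, Order marks paid
        stock["reserved"] -= 1
        orders[order_id] = "paid"

    def on_payment_failed(order_id):     # Inventory releases, Order marks failed
        stock["reserved"] -= 1
        stock["available"] += 1
        orders[order_id] = "failed"

    bus.subscribe("order_created", on_order_created)
    bus.subscribe("stock_reserved", on_stock_reserved)
    bus.subscribe("payment_charged", on_payment_charged)
    bus.subscribe("payment_failed", on_payment_failed)


def place_order(card_ok):
    bus = Bus()
    orders = {"ord_999": "pending"}
    stock = {"available": 1, "reserved": 0}
    wire(bus, orders, stock, card_ok)
    bus.publish("order_created", "ord_999")
    bus.run()
    return orders["ord_999"], stock, bus.history


print(place_order(card_ok=True))
print(place_order(card_ok=False))
```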
Idempotency is critical:
What if the payment message gets delivered twice? We’d charge the user twice. To prevent this, every payment request includes an idempotency key (usually the order_id). The payment provider checks: “Have I seen this key before? If yes, return the previous result instead of charging again.”
POST /charge
{ "amount": 149.97, "idempotency_key": "ord_999" }
First call: charges $149.97, returns success
Second call: returns the SAME success response without charging again
This is a non-negotiable requirement for any payment system.
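The "have I seen this key before?" check can be sketched as follows. `PaymentProvider` is a toy stand-in and not any real provider's API; amounts are kept in cents as an assumption to avoid floating-point rounding. In a real system the lookup and the insert must be atomic, for example by relying on a unique constraint like the one on idempotency_key in the payments table.

```python
class PaymentProvider:
    """Toy provider that replays the stored result for a repeated key."""

    def __init__(self):
        self._results = {}      # idempotency_key -> stored response
        self.charges_made = 0

    def charge(self, amount_cents: int, idempotency_key: str) -> dict:
        if idempotency_key in self._results:
            # Duplicate delivery: return the earlier response, charge nothing
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {
            "status": "success",
            "amount_cents": amount_cents,
            "charge_id": f"ch_{self.charges_made}",
        }
        self._results[idempotency_key] = result
        return result


provider = PaymentProvider()
first = provider.charge(14997, idempotency_key="ord_999")
second = provider.charge(14997, idempotency_key="ord_999")  # redelivered message
print(first == second, provider.charges_made)
```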
Deep Dive 3: Product Search and Recommendations
Search with Elasticsearch:
The product catalog lives in PostgreSQL (source of truth), but search queries go to Elasticsearch. Why? Because Elasticsearch is built for full-text search with relevance ranking, fuzzy matching, autocomplete, and faceted filtering. PostgreSQL’s LIKE '%headphones%' would be painfully slow on 500M products.
How we keep Elasticsearch in sync:
Seller updates product in PostgreSQL
→ Change Data Capture (CDC) detects the change
→ Publishes event to Kafka
→ Elasticsearch consumer reads the event
→ Updates the search index
Eventual consistency: there's a 1-5 second delay. That's fine for search.
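The consumer at the end of this pipeline should tolerate duplicate and out-of-order events. The sketch below assumes a hypothetical event shape (a dict carrying `product_id` and a monotonically increasing `updated_at`) and uses a plain dict in place of the search index; it is not the format of any particular CDC tool.

```python
def apply_change(index: dict, event: dict) -> bool:
    """Upsert one change event; returns False if it was stale or a duplicate."""
    doc_id = event["product_id"]
    current = index.get(doc_id)
    if current is not None and current["updated_at"] >= event["updated_at"]:
        return False  # already have this version or a newer one
    index[doc_id] = {k: v for k, v in event.items() if k != "product_id"}
    return True


index = {}
events = [
    {"product_id": 42, "updated_at": 1, "title": "Wireless Headphones", "price": 59.99},
    {"product_id": 42, "updated_at": 2, "title": "Wireless Headphones", "price": 49.99},
    # The first event is redelivered after the newer one
    {"product_id": 42, "updated_at": 1, "title": "Wireless Headphones", "price": 59.99},
]
applied = [apply_change(index, e) for e in events]
print(applied, index[42]["price"])
```

Comparing versions before writing means a replayed event cannot overwrite a newer price in the index.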
Search features:
GET /search?q=wireless+headphones&category=electronics&min_price=50&sort=relevance
Elasticsearch query:
{
"query": {
"bool": {
"must": { "multi_match": { "query": "wireless headphones", "fields": ["title^3", "description"] }},
"filter": [
{ "term": { "category": "electronics" }},
{ "range": { "price": { "gte": 50 }}}
]
}
},
"sort": ["_score"]
}
The title^3 means title matches are weighted 3x more than description matches. This is how we control relevance.
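Translating the request parameters into that query body can be sketched as a small builder. `build_search_query` is a hypothetical helper that only assembles the JSON shown above as a Python dict; it does not call Elasticsearch.

```python
from typing import Optional


def build_search_query(q: str, category: Optional[str] = None,
                       min_price: Optional[float] = None,
                       max_price: Optional[float] = None) -> dict:
    """Build a bool query: scored text match plus non-scoring filters."""
    filters = []
    if category is not None:
        filters.append({"term": {"category": category}})
    price = {}
    if min_price is not None:
        price["gte"] = min_price
    if max_price is not None:
        price["lte"] = max_price
    if price:
        filters.append({"range": {"price": price}})
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": q,
                        "fields": ["title^3", "description"],  # boost title 3x
                    }
                },
                "filter": filters,
            }
        },
        "sort": ["_score"],
    }


body = build_search_query("wireless headphones", category="electronics", min_price=50)
print(body["query"]["bool"]["filter"])
```

Filters sit in the `filter` clause rather than `must` because they narrow the result set without affecting the relevance score.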
Autocomplete: Elasticsearch’s completion suggester gives us type-ahead suggestions as the user types. “wire…” → [“wireless headphones”, “wireless mouse”, “wireless charger”].
Product recommendations:
Two common approaches for an e-commerce platform:
- “Customers who bought X also bought Y”: collaborative filtering. We look at purchase patterns across all users. If 70% of people who bought a phone case also bought a screen protector, we recommend screen protectors on the phone case page. Simple, effective, and doesn’t need ML.
- “Based on your browsing history”: personalized recommendations. We track what the user browsed, what they added to cart, and what they bought. Then we find similar products using content-based similarity or a trained model.
For the interview, mentioning collaborative filtering and explaining how the “frequently bought together” feature works is usually enough.
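A "frequently bought together" list can be sketched by counting how often two products appear in the same order. The order data and function names below are made up for illustration; at catalog scale this counting would run as an offline batch job rather than in memory.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(orders):
    """Count, for each product, how often every other product shared an order."""
    counts = defaultdict(Counter)
    for items in orders:
        for a, b in permutations(set(items), 2):
            counts[a][b] += 1
    return counts


def also_bought(counts, product, k=3):
    """Top-k products most often purchased alongside `product`."""
    return [other for other, _ in counts[product].most_common(k)]


orders = [
    ["phone_case", "screen_protector"],
    ["phone_case", "screen_protector", "charger"],
    ["phone_case", "charger"],
    ["phone_case", "screen_protector"],
    ["charger", "cable"],
]
counts = build_cooccurrence(orders)
print(also_bought(counts, "phone_case", k=2))
```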
Step 7: Scaling
Product Service (heaviest read load):
- Multi-layer caching: CDN (for product images) → Redis (for product data) → Read replicas (for DB)
- Cache product pages aggressively. Product data rarely changes.
- Cache invalidation: when a seller updates a product, invalidate the Redis cache and purge the CDN
- Shard the products table by product_id
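The read-through cache with invalidation on update can be sketched with two dicts. This is a minimal cache-aside illustration: `ProductCache` is a hypothetical class, one dict stands in for PostgreSQL and the other for Redis, and there is no TTL or CDN purge.

```python
class ProductCache:
    def __init__(self, db: dict):
        self.db = db          # stand-in for the products table
        self.cache = {}       # stand-in for Redis
        self.db_reads = 0

    def get(self, product_id):
        if product_id in self.cache:
            return self.cache[product_id]       # cache hit
        self.db_reads += 1                      # cache miss: go to the database
        product = dict(self.db[product_id])
        self.cache[product_id] = product
        return product

    def update(self, product_id, **changes):
        self.db[product_id].update(changes)     # write the source of truth first
        self.cache.pop(product_id, None)        # then invalidate the cached copy


store = ProductCache({42: {"title": "Wireless Headphones", "price": 59.99}})
store.get(42)
store.get(42)                       # served from cache
reads_before_update = store.db_reads
store.update(42, price=49.99)       # seller changes the price
new_price = store.get(42)["price"]  # miss, reloads the fresh row
print(reads_before_update, store.db_reads, new_price)
```

Invalidating rather than rewriting the cache entry keeps the update path simple: the next reader repopulates it from the database.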
Search Service:
- Elasticsearch cluster with multiple shards per index
- Keep heavy search load from starving indexing: add replica shards to spread queries and, if needed, dedicated coordinating nodes
- Warm standby cluster for failover
Order Service:
- Shard orders by user_id, so a user’s orders live on the same shard
- Event-driven architecture: use Kafka for inter-service communication
- The order table grows forever, but we only query recent orders. Partition by date.
Inventory Service (most critical for consistency):
- Single source of truth for stock levels — no caching layer for writes
- Read replicas for display purposes (“In Stock” badge on product pages)
- The actual stock check at checkout time must hit the primary database
- For flash sales: pre-warm inventory data in Redis, use Redis atomic operations (DECR, undoing the decrement and rejecting the request if the result drops below zero) for ultra-fast stock reservation, then sync to DB
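The flash-sale counter can be simulated without a Redis server. In this sketch `AtomicCounter` is a hypothetical stand-in for a Redis key, with a lock modelling the fact that Redis executes each command atomically; a decrement that would take the count below zero is undone and the buyer is turned away.

```python
import threading


class AtomicCounter:
    """Stand-in for a Redis integer key with atomic DECR/INCR."""

    def __init__(self, value: int):
        self._value = value
        self._lock = threading.Lock()

    def decr(self) -> int:
        with self._lock:
            self._value -= 1
            return self._value

    def incr(self) -> int:
        with self._lock:
            self._value += 1
            return self._value

    def get(self) -> int:
        with self._lock:
            return self._value


def try_reserve(stock: AtomicCounter) -> bool:
    if stock.decr() >= 0:
        return True
    stock.incr()      # count went negative: undo and report sold out
    return False


stock = AtomicCounter(10)
results = []


def buyer() -> None:
    results.append(try_reserve(stock))


threads = [threading.Thread(target=buyer) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(results), stock.get())
```

However the 100 buyers interleave, at most 10 decrements can return a non-negative value, so the stock is never oversold.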
Handling sale events (Black Friday / Prime Day):
- Pre-scale all services 3-5 days before the event rather than relying on reactive auto-scaling to catch the first spike
- Pre-generate and cache popular product pages
- Use a queue-based checkout: if the system is overwhelmed, put users in a virtual queue instead of crashing
- Rate limit cart operations per user (prevent bots from hoarding inventory)
- Feature flags: disable non-essential features (reviews, recommendations) to save capacity for core shopping flow
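Per-user rate limiting of cart operations is commonly done with a token bucket, sketched below. The technique is a choice made for this example (the text does not name one), the capacity and refill rate are arbitrary, and the clock is passed in explicitly so the behaviour is deterministic; a real service would keep one bucket per user in a shared store.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_sec` tokens/second."""

    def __init__(self, capacity: int, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now: float) -> bool:
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=5, refill_per_sec=1.0)
burst = [bucket.allow(now=0.0) for _ in range(7)]   # 7 cart calls at once
later = bucket.allow(now=2.0)                       # 2 seconds later
print(burst, later)
```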
Payment processing:
- Never process payments synchronously in the order flow — use async processing via message queue
- Retry with exponential backoff for transient payment failures (timeouts, provider errors), not for declined cards
- Idempotency keys on every single payment call
- PCI compliance: never store raw credit card numbers. Use tokenized payment methods (Stripe tokens)
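Retrying with exponential backoff can be sketched as a small wrapper. `TransientError` and `call_with_retry` are hypothetical names, the delays are illustrative, and random jitter is added as a common refinement that the text does not mention. Retries are only safe here because each payment call carries the same idempotency key.

```python
import random
import time


class TransientError(Exception):
    """Failure worth retrying, such as a gateway timeout."""


def call_with_retry(fn, max_attempts=4, base=0.5, cap=30.0, sleep=time.sleep):
    """Call fn, doubling the maximum wait after each transient failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise                              # out of attempts
            delay = min(cap, base * 2 ** attempt)  # 0.5s, 1s, 2s, ...
            sleep(random.uniform(0, delay))        # jitter spreads out retries


attempts = {"n": 0}


def flaky_charge():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("gateway timeout")
    return "charged"


slept = []                                          # record waits, do not sleep
result = call_with_retry(flaky_charge, sleep=slept.append)
print(result, attempts["n"], len(slept))
```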
Database strategy:
- PostgreSQL for transactional data (orders, inventory, payments) — we need ACID guarantees
- Elasticsearch for search — eventually consistent with the source of truth
- Redis for caching (product data, sessions, carts) and real-time counters
- S3 for product images and static assets, fronted by a CDN
In simple language, an e-commerce platform is a collection of independent services (products, search, cart, orders, inventory, payments), each owning its own data. The hardest problems are inventory management (preventing overselling with optimistic locking and reserved stock), order processing (coordinating multiple services with the saga pattern), and search (keeping Elasticsearch in sync with the catalog). Everything is heavily cached because reads dominate writes: our estimate has roughly 12,000 product page views/sec against about 1 order/sec. And the payment system needs bulletproof idempotency because charging someone twice is the quickest way to lose trust. The beauty of this architecture is that each service scales independently: we can throw 10x more resources at the product service during a sale without touching the payment service.