Design a Chat System (WhatsApp)

advanced 4-7 YOE chat messaging websocket system-design real-time

We’re designing a real-time chat system like WhatsApp or Messenger. This is an advanced question because it touches on real-time communication, message ordering, delivery guarantees, group chats, and online presence — all at massive scale.

The core challenge: how do we deliver a message from one person to another in real time, reliably, and at the scale of billions of messages per day?

Step 1: Requirements

Functional Requirements

  • One-on-one messaging between users
  • Group chats (up to 500 members)
  • Message delivery status: sent, delivered, read
  • Online/offline presence indicators
  • Push notifications for offline users
  • Media sharing (images, videos, documents)
  • Message history and persistence

Non-Functional Requirements

  • Real-time — messages should arrive in under 500ms
  • Reliable delivery — no messages should be lost, even if the recipient is offline
  • Message ordering — messages in a conversation should appear in the correct order
  • High availability — chat is a core feature, downtime is unacceptable
  • Low latency — the system should feel instant
  • Scale — support 2B users, 60B messages/day

Step 2: Estimation

Assumptions:

  • 2B total users, 500M daily active users
  • Each active user sends 40 messages/day on average
  • Average message size: 100 bytes (text), 500 KB (media — 10% of messages include media)

QPS:

Messages/day = 500M × 40 = 20B messages/day
Write QPS    = 20B / 86,400 ≈ ~230,000 messages/sec
Peak QPS     = ~500,000 messages/sec

Storage:

Text messages:  20B × 100 bytes = 2 TB/day
Media messages: 20B × 0.10 × 500 KB = 1 PB/day (media stored in object storage)
Text per year:  ~730 TB

Connections:

500M DAU = 500M concurrent WebSocket connections (at peak)
Each connection ≈ 10 KB memory on the server
500M × 10 KB = 5 TB of memory across all chat servers

That’s a lot of connections. We’ll need thousands of chat servers.

Step 3: High-Level Design

Chat System — High-Level Architecture
User A
User B
│ WebSocket
│ WebSocket
Chat Server 1 Chat Server 2 Chat Server N
Message Queue (Kafka)
Message DB
(Cassandra)
User Service
(profiles, auth)
Push Service
(APNs, FCM)
Media Service
(S3 + CDN)
1:1 flow: A → Chat Server 1 → Kafka → Chat Server 2 → B (if online) or Push Service (if offline)
Persistence: Every message gets written to the Message DB via Kafka consumers

Why WebSocket?

HTTP is request-response. The client has to ask the server “got any new messages?” over and over. That’s called polling, and it’s wasteful.

WebSocket gives us a persistent, two-way connection. The server can push a message to the client the instant it arrives. No polling, no wasted requests. This is how every modern chat app works.

The flow for a 1:1 message:

  1. User A sends a message through their WebSocket connection to Chat Server 1
  2. Chat Server 1 looks up which chat server User B is connected to (using a connection registry in Redis)
  3. If User B is online → route the message to that chat server → push to B’s WebSocket
  4. If User B is offline → send a push notification via APNs/FCM
  5. Either way, the message gets persisted to the database

Step 4: API Design

For WebSocket messages, we use a JSON-based protocol:

// Send a message
{
  "type": "send_message",
  "conversation_id": "conv_abc123",
  "content": "Hey, how are you?",
  "content_type": "text",
  "client_msg_id": "uuid-from-client"
}

// Server acknowledgment
{
  "type": "message_ack",
  "client_msg_id": "uuid-from-client",
  "server_msg_id": "msg_789",
  "status": "sent",
  "timestamp": 1735689600
}

// Receive a message (pushed by server)
{
  "type": "new_message",
  "conversation_id": "conv_abc123",
  "server_msg_id": "msg_789",
  "sender_id": "user_42",
  "content": "Hey, how are you?",
  "content_type": "text",
  "timestamp": 1735689600
}

// Delivery receipt
{
  "type": "message_status",
  "server_msg_id": "msg_789",
  "status": "delivered"   // or "read"
}

For REST endpoints (non-real-time operations):

GET  /api/v1/conversations                    — list user's conversations
GET  /api/v1/conversations/{id}/messages      — message history (paginated)
POST /api/v1/conversations                    — create a group chat
POST /api/v1/media/upload                     — get pre-signed URL for media upload

Step 5: Data Model

We need a database that handles massive write throughput and scales horizontally. Cassandra is a great fit for the messages table — it’s designed for write-heavy workloads and scales linearly.

-- Users table (PostgreSQL — relational, small)
CREATE TABLE users (
    user_id     BIGINT PRIMARY KEY,
    username    VARCHAR(50) UNIQUE,
    display_name VARCHAR(100),
    avatar_url  TEXT,
    last_seen   TIMESTAMP,
    created_at  TIMESTAMP
);

-- Conversations table (PostgreSQL)
CREATE TABLE conversations (
    conversation_id  BIGINT PRIMARY KEY,
    type            VARCHAR(10),          -- 'direct' or 'group'
    name            VARCHAR(100),         -- group name (null for direct)
    created_at      TIMESTAMP
);

-- Conversation participants (PostgreSQL)
CREATE TABLE participants (
    conversation_id  BIGINT,
    user_id         BIGINT,
    role            VARCHAR(10) DEFAULT 'member',  -- 'admin', 'member'
    joined_at       TIMESTAMP,
    PRIMARY KEY (conversation_id, user_id)
);

-- Messages table (Cassandra — write-heavy, time-series)
-- Partition key: conversation_id
-- Clustering key: message_id (TimeUUID for ordering)
CREATE TABLE messages (
    conversation_id  BIGINT,
    message_id      TIMEUUID,
    sender_id       BIGINT,
    content         TEXT,
    content_type    VARCHAR(10),    -- 'text', 'image', 'video'
    media_url       TEXT,
    created_at      TIMESTAMP,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id ASC);

Why Cassandra for messages? Each conversation is a partition. Reading a conversation’s messages is a single partition read — super fast. Writes are append-only. And Cassandra scales horizontally by adding more nodes.

Connection registry (Redis):

# Which chat server is each user connected to?
user_connection:{user_id} → "chat-server-42:ws-connection-id"
TTL: 300 seconds (refreshed by heartbeat)

Step 6: Deep Dives

Deep Dive 1: Message Delivery and Ordering

Delivering messages reliably is harder than it sounds. What if User B’s phone is offline? What if they switch networks mid-conversation? What if two messages arrive out of order?

Message delivery flow:

Sender                    Server                    Receiver
  │                         │                          │
  │── Send message ────────>│                          │
  │<── ACK (sent) ──────────│                          │
  │                         │── Push message ─────────>│
  │                         │<── ACK (delivered) ──────│
  │<── Status: delivered ───│                          │
  │                         │       (user opens chat)  │
  │<── Status: read ────────│<── Read receipt ─────────│

Three statuses: sent (server received it), delivered (recipient’s device got it), read (recipient opened the conversation).

Ordering: Messages within a single conversation must appear in order. We use a per-conversation sequence number for this. Every message in a conversation gets an incrementing sequence number assigned by the server.

Conversation conv_abc123:
  msg_1: seq=1, "Hey"           (from A, 1:00:00)
  msg_2: seq=2, "What's up?"    (from A, 1:00:01)
  msg_3: seq=3, "Not much, you?" (from B, 1:00:05)

The client uses these sequence numbers to display messages in order. If a message arrives with seq=5 but the client only has up to seq=3, it knows seq=4 is missing and requests it from the server.

Handling offline users: When User B is offline, the server stores the message in the database and sends a push notification. When User B comes back online and reconnects via WebSocket, the server sends all undelivered messages (everything since B’s last received sequence number).

Deep Dive 2: Group Chat Fan-Out

In a 1:1 chat, we deliver the message to one person. In a group chat with 200 people, we have to deliver it to 199 people. How?

Option A: Fan-out on write (push model)

When someone sends a message to a group, the server immediately pushes it to every online member’s WebSocket connection and queues push notifications for offline members.

Alice sends "Hello everyone" to group (200 members)
→ Server looks up all 200 participants
→ For each online member: push via WebSocket
→ For each offline member: send push notification
→ Write message once to the messages table

Pros: Receivers get the message instantly. Reading is simple — just fetch from the connection. Cons: Sending a message is expensive for large groups. If 200 people are online, that’s 200 WebSocket pushes per message.

Option B: Fan-out on read (pull model)

The server just stores the message. When each member opens the group chat, they pull new messages.

Pros: Sending is fast — just one write. Cons: Opening a group chat requires a DB read. For active groups, this adds latency.

Our approach: Push for small/medium groups (up to ~500 members). WhatsApp caps groups at 1024 members for exactly this reason. For very large groups (like Slack channels with 10K+ people), we’d switch to a pull model with smart caching.

The message queue (Kafka) helps here. When a message is sent to a group, it’s published to a topic partitioned by conversation. Consumer workers read from the topic and handle the fan-out — looking up members, pushing to WebSockets, and sending push notifications.

Deep Dive 3: Online Presence and Typing Indicators

How does the app show that someone is “online” or “typing…”?

Online/offline status:

Every connected client sends a heartbeat to the server every 30 seconds. The server updates the user’s status in Redis:

presence:{user_id} → { status: "online", last_seen: 1735689600 }
TTL: 60 seconds (auto-expires if no heartbeat)

When the TTL expires without a heartbeat, the user is considered offline. We update last_seen and publish an “offline” event to their contacts.

But we don’t want to notify every contact every time someone goes online/offline. That’s too expensive at scale. Instead:

  • We only send presence updates to people who have the chat open (actively viewing the conversation list or the specific chat)
  • Presence is best-effort — it’s okay if it’s a few seconds stale

Typing indicators:

When a user starts typing, the client sends a typing_start event. The server forwards it to the other participants in the conversation. After 3 seconds of no keystrokes, the client sends typing_stop.

{ "type": "typing", "conversation_id": "conv_abc", "user_id": "user_42", "status": "typing" }

Typing indicators are fire-and-forget. If one gets lost, nothing bad happens — the “typing…” label just disappears sooner. No persistence needed.

Step 7: Scaling

Chat server scaling:

  • Each chat server handles ~100K concurrent WebSocket connections
  • 500M concurrent users → ~5,000 chat servers
  • Use consistent hashing to assign users to chat servers
  • The connection registry in Redis lets any chat server route a message to the right destination

Database scaling:

  • Messages in Cassandra — partition by conversation_id for locality
  • User data in PostgreSQL — read replicas for lookups
  • Hot conversations (active group chats) can be cached in Redis

Message queue (Kafka) scaling:

  • Partition by conversation_id — all messages for a conversation go to the same partition (preserves ordering)
  • Add more partitions and consumers as throughput grows

Media handling:

  • Media files go to object storage (S3) via pre-signed URLs
  • Thumbnails generated asynchronously by a media processing service
  • CDN in front of object storage for fast delivery

Multi-region deployment:

  • Chat servers in multiple regions for low latency
  • Messages replicated across regions asynchronously
  • User connects to the nearest region (GeoDNS)
  • Cross-region messages: Server in Region A publishes to Kafka → replicated to Region B → delivered locally

Push notifications at scale:

  • Batch push notifications to APNs/FCM (they support batching)
  • Separate push notification workers so they don’t block message delivery
  • Rate limit notifications per user (nobody wants 100 notifications from an active group)

In simple language, a chat system is a network of WebSocket connections with a message queue in the middle. The client connects to a chat server. The server knows where every other user is connected (Redis registry). Messages flow through Kafka for reliability and ordering. Offline messages get queued and delivered on reconnect. The hardest parts are keeping messages in order, handling group fan-out efficiently, and managing millions of persistent connections. But the core pattern is surprisingly straightforward: accept message, find recipient, deliver.