A message queue is a component that sits between two services and holds messages until the receiver is ready to process them. Think of it like a mailbox — the sender drops a letter in, and the receiver picks it up when they’re available. The sender doesn’t have to wait for the receiver to be home.
This is the foundation of asynchronous processing — doing work later instead of right now.
Why We Need Message Queues
Without a queue, services talk to each other synchronously. Service A calls Service B and waits. If B is slow or down, A is stuck.
With a queue:
- A drops a message in the queue and moves on immediately
- B picks it up whenever it’s ready
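- If B crashes, the message stays in the queue and is delivered once B recovers, so nothing is lost as long as the queue persists messages and B acknowledges them only after processing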
This gives us decoupling, resilience, and scalability.
The Core Pattern
- Producer — The service that creates and sends messages
- Queue — The buffer that holds messages
- Consumer — The service that reads and processes messages
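The three roles can be sketched in a single process with Python's standard `queue` and `threading` modules. This is only an in-memory illustration, not a real broker; the `None` sentinel and the `run_demo` helper are assumptions made for the example.

```python
import queue
import threading


def producer(q, items):
    """Producer: puts messages on the queue without waiting for them to be processed."""
    for item in items:
        q.put(item)
    q.put(None)  # sentinel (an assumption of this sketch) meaning "no more messages"


def consumer(q, processed):
    """Consumer: takes messages one at a time and processes them at its own pace."""
    while True:
        message = q.get()
        if message is None:
            break
        processed.append(message.upper())  # stand-in for real work


def run_demo(items):
    q = queue.Queue()  # the buffer between the two sides
    processed = []
    worker = threading.Thread(target=consumer, args=(q, processed))
    worker.start()
    producer(q, items)
    worker.join()
    return processed


result = run_demo(["order-1", "order-2", "order-3"])
print(result)
```

With a single consumer, messages come out in the order they were sent. The producer only knows about the queue, never about the consumer.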
Point-to-Point vs Pub/Sub
Point-to-Point (Queue) — Each message is consumed by exactly one consumer. Once processed, the message is removed. Like a task queue where each task is done once.
Pub/Sub (Topic) — Each message can be consumed by multiple subscribers. The message stays available for all subscribers. Like a broadcast — everyone who’s listening gets the message.
| Pattern | Delivery | Use Case |
|---|---|---|
| Point-to-Point | One consumer per message | Task queues, job processing |
| Pub/Sub | All subscribers get every message | Notifications, event streaming, analytics |
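The difference between the two delivery models can be shown with a small in-memory sketch. The class names `PointToPointQueue` and `Topic` are hypothetical and do not correspond to any specific broker's API.

```python
from collections import deque


class PointToPointQueue:
    """Each message is handed to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message means no other consumer can receive it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Each subscriber gets its own copy of every published message."""

    def __init__(self):
        self._inboxes = {}

    def subscribe(self, name):
        self._inboxes[name] = deque()

    def publish(self, message):
        for inbox in self._inboxes.values():
            inbox.append(message)

    def receive(self, name):
        inbox = self._inboxes[name]
        return inbox.popleft() if inbox else None


# Point-to-point: two workers compete, so each task is done once.
tasks = PointToPointQueue()
tasks.send("resize-image-1")
tasks.send("resize-image-2")
worker_a_got = tasks.receive()
worker_b_got = tasks.receive()
nothing_left = tasks.receive()

# Pub/sub: both subscribers see the same event.
events = Topic()
events.subscribe("email")
events.subscribe("analytics")
events.publish("order-placed")
email_got = events.receive("email")
analytics_got = events.receive("analytics")
```

In this sketch a subscriber only receives messages published after it subscribed. Real systems differ on that point.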
When to Use Message Queues
Decoupling services — The order service doesn’t need to know about the email service. It just publishes “order placed” and moves on. The email service subscribes and sends the confirmation.
Handling traffic spikes — During a flash sale, we get 100x the normal orders. The queue absorbs the spike. Workers process orders at a steady rate.
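To make the spike argument concrete, here is a rough simulation of queue depth per second. All numbers are made up for illustration: a normal load of 10 orders/s, a three-second spike at 100x, and workers that handle a steady 300 orders/s.

```python
def simulate_backlog(arrivals_per_second, capacity_per_second):
    """Return the queue depth at the end of each second."""
    backlog = 0
    depths = []
    for arrivals in arrivals_per_second:
        backlog += arrivals                            # new messages land in the queue
        backlog -= min(backlog, capacity_per_second)   # workers drain at a fixed rate
        depths.append(backlog)
    return depths


# Hypothetical traffic: 2 s of normal load, a 3 s spike, then normal load again.
arrivals = [10, 10, 1000, 1000, 1000] + [10] * 8
depths = simulate_backlog(arrivals, capacity_per_second=300)
print(depths)
```

With these assumed numbers the backlog peaks at 2100 messages and is back to zero eight seconds after the spike ends. The workers never see more than 300 orders/s. The cost is latency: orders placed during the spike wait longer before they are processed.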
Retry and error handling — If processing fails, the message goes back to the queue. It’ll be retried instead of lost. We can even have a dead letter queue (DLQ) for messages that fail repeatedly.
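A minimal sketch of retry-then-dead-letter logic follows, assuming a limit of three attempts per message. The `drain` function, the `MAX_ATTEMPTS` value, and the `flaky_handler` are all hypothetical; real brokers usually implement this with acknowledgements and a configurable redelivery policy.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed limit for this sketch


def drain(messages, handler, max_attempts=MAX_ATTEMPTS):
    """Process every message, requeueing failures. Returns (processed, dead_letters)."""
    main_queue = deque((m, 1) for m in messages)  # (message, attempt number)
    processed, dead_letters = [], []
    while main_queue:
        message, attempt = main_queue.popleft()
        try:
            handler(message)
            processed.append(message)
        except Exception:
            if attempt < max_attempts:
                main_queue.append((message, attempt + 1))  # back to the queue for a retry
            else:
                dead_letters.append(message)  # give up and park it in the DLQ
    return processed, dead_letters


attempts_seen = {}


def flaky_handler(message):
    attempts_seen[message] = attempts_seen.get(message, 0) + 1
    if message == "poison":
        raise ValueError("cannot be processed")
    if message == "flaky" and attempts_seen[message] < 2:
        raise RuntimeError("temporary failure")


processed, dead_letters = drain(["ok", "flaky", "poison"], flaky_handler)
print(processed, dead_letters)
```

The temporarily failing message succeeds on its second attempt. The message that can never be processed ends up in the dead letter list after three attempts, so it stops blocking the rest.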
Heavy async work — Sending emails, generating reports, processing images, encoding videos — none of these need to happen during the user’s request. Drop a message in the queue and respond to the user immediately.
Popular Tools
Kafka
- Distributed event streaming platform
- Very high throughput (well-provisioned clusters can handle millions of messages/sec)
- Messages are persisted to disk and retained for days/weeks
- Consumers track their own position (offset) and can replay messages from any offset still within the retention window
- Great for: event sourcing, log aggregation, real-time analytics
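The retain-and-replay idea can be illustrated with an in-memory append-only log. This is not the Kafka client API; `EventLog`, `LogConsumer`, `poll`, and `seek` are hypothetical names chosen for the sketch, and there is no retention limit here.

```python
class EventLog:
    """Append-only log: reading never removes a record."""

    def __init__(self):
        self._records = []

    def append(self, message):
        self._records.append(message)
        return len(self._records) - 1  # offset of the record just written

    def read_from(self, offset):
        return self._records[offset:]


class LogConsumer:
    """Each consumer tracks its own offset into the shared log."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        batch = self._log.read_from(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        self.offset = offset  # rewind (or skip ahead) to replay from a chosen point


log = EventLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

reader = LogConsumer(log)
first_pass = reader.poll()    # reads everything written so far
second_pass = reader.poll()   # nothing new yet
reader.seek(1)
replayed = reader.poll()      # re-reads from offset 1 onward
```

Because consuming only moves an offset, a second consumer with its own offset could read the same records independently.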
RabbitMQ
- Traditional message broker
- Supports complex routing (exchanges, bindings)
- Messages are removed from the queue once a consumer acknowledges them
- Lower throughput than Kafka but more flexible routing
- Great for: task queues, RPC patterns, complex routing
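As a rough picture of exchanges and bindings, the sketch below mimics a direct exchange: a message goes to every queue whose binding key exactly matches the routing key. It is an in-memory model only, not the RabbitMQ client API, and the `DirectExchange` class and routing keys are assumptions for illustration.

```python
from collections import defaultdict, deque


class DirectExchange:
    """Routes each message to the queues bound with an exactly matching key."""

    def __init__(self):
        self._bindings = defaultdict(list)  # routing key -> bound queues

    def bind(self, target_queue, routing_key):
        self._bindings[routing_key].append(target_queue)

    def publish(self, routing_key, message):
        """Copy the message into every matching queue and return how many received it."""
        targets = self._bindings.get(routing_key, [])
        for target_queue in targets:
            target_queue.append(message)
        return len(targets)


email_queue, audit_queue = deque(), deque()

exchange = DirectExchange()
exchange.bind(email_queue, "order.placed")
exchange.bind(audit_queue, "order.placed")
exchange.bind(audit_queue, "order.cancelled")

placed_count = exchange.publish("order.placed", "order 42 placed")
cancelled_count = exchange.publish("order.cancelled", "order 42 cancelled")
unrouted_count = exchange.publish("order.refunded", "order 42 refunded")
```

The producer only names a routing key. Which queues receive the message is decided by the bindings, so new consumers can be added without touching the producer.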
Amazon SQS
- Fully managed queue service from AWS
- No infrastructure to manage
- Two flavors: Standard (at-least-once delivery, best-effort ordering) and FIFO (ordered within a message group, exactly-once processing via deduplication)
- Great for: AWS-native apps, simple queuing needs
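At-least-once delivery means a consumer may occasionally see the same message twice, so handlers are often written to be idempotent. The sketch below shows one simple way to do that, assuming every message carries a unique `id` field. The `make_idempotent` helper is hypothetical, and a real system would keep the seen IDs in durable storage rather than in memory.

```python
def make_idempotent(handler):
    """Wrap a handler so a redelivered message with a known ID is skipped."""
    seen_ids = set()  # in-memory only; an assumption of this sketch

    def wrapped(message):
        if message["id"] in seen_ids:
            return False  # duplicate delivery, already handled
        handler(message)
        seen_ids.add(message["id"])  # record only after the handler succeeded
        return True

    return wrapped


charges = []
charge_once = make_idempotent(lambda m: charges.append(m["amount"]))

# The broker delivers message m1 twice.
deliveries = [
    {"id": "m1", "amount": 20},
    {"id": "m2", "amount": 5},
    {"id": "m1", "amount": 20},
]
results = [charge_once(m) for m in deliveries]
print(results, charges)
```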
Quick Comparison
| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
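| Throughput | Very high | Medium | High (Standard), lower (FIFO) |
| Message retention | Days/weeks (configurable) | Until acknowledged | Up to 14 days |
| Ordering | Per partition | Per queue | FIFO queues only (per message group) |
| Replay | Yes (within retention) | Not with classic queues | No |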
| Managed option | Confluent Cloud | CloudAMQP | AWS native |
Message Queues in System Design
In interviews, bring up message queues whenever the design involves:
- Work that doesn’t need to happen immediately
- Services that should be independent of each other
- Traffic that’s bursty or unpredictable
- Operations that might fail and need retries
A common pattern in system design interviews:
User uploads video → API Server → Queue → Video Processing Workers → Store in S3
API Server → Return "processing..." to user immediately (without waiting for the workers)
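The upload flow can be sketched as two functions, one for the API server and one for a worker. Everything here is illustrative: `handle_upload`, `process_next_job`, the `job_status` dictionary, and the file name are hypothetical, the queue is in-process, and the transcoding and S3 upload steps are left out.

```python
import queue
import uuid

video_jobs = queue.Queue()  # stands in for the real message queue
job_status = {}             # stands in for a status store the client can poll


def handle_upload(filename):
    """API server side: record the job, enqueue it, and return right away."""
    job_id = str(uuid.uuid4())
    job_status[job_id] = "processing"
    video_jobs.put({"job_id": job_id, "filename": filename})
    return {"job_id": job_id, "status": "processing..."}


def process_next_job():
    """Worker side: take one job, do the slow work, then mark it done."""
    job = video_jobs.get()
    # Transcoding and the upload to storage would happen here (omitted).
    job_status[job["job_id"]] = "done"
    video_jobs.task_done()
    return job["job_id"]


response = handle_upload("holiday.mp4")          # returns before any processing happens
status_before = job_status[response["job_id"]]
finished_id = process_next_job()                 # normally runs in a separate worker process
status_after = job_status[response["job_id"]]
```

Returning a job ID gives the client something to poll or subscribe to, which is one common way to report completion later.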
In simple language, a message queue lets us say “I’ll deal with this later” instead of doing everything right now. It makes our systems more resilient, more scalable, and better at handling the unpredictable real world.