Node.js runs JavaScript on a single thread. So if our server has 8 CPU cores, a plain Node process uses… 1. The other 7 sit idle. That’s wasteful for an HTTP server.
The cluster module fixes this by forking N copies of our process (one per core). All workers share the same port — the OS or the master process load-balances incoming connections across them.
In simple language: cluster is “run my server 8 times in parallel, and let them split traffic.”
Why not just spawn 8 servers manually?
We could run 8 Node processes on ports 3001-3008 and put nginx in front. That works. But cluster is simpler — one entry file, one port, automatic distribution. And workers can talk to the master via IPC if needed.
listens on :3000, forks workers
PID 1001
CPU 0
PID 1002
CPU 1
PID 1003
CPU 2
PID 1004
CPU 3
How
The classic pattern: master forks, workers serve.
import cluster from 'node:cluster';
import os from 'node:os';
import http from 'node:http';
const numCPUs = os.cpus().length;
if (cluster.isPrimary) {
console.log(`Master ${process.pid} forking ${numCPUs} workers`);
for (let i = 0; i < numCPUs; i++) cluster.fork();
cluster.on('exit', (worker, code) => {
console.log(`Worker ${worker.process.pid} died (${code}), respawning`);
cluster.fork();
});
} else {
http.createServer((req, res) => {
res.end(`Handled by worker ${process.pid}\n`);
}).listen(3000);
}
Hit :3000 repeatedly and you’ll see different PIDs in the response. That’s the OS round-robining.
Cluster vs Worker Threads — totally different things
People confuse these constantly. They’re not the same.
| Aspect | Cluster | Worker Threads |
|---|---|---|
| Unit | Separate OS process | Thread inside one process |
| Memory | Each worker has own V8 heap | Can share memory via SharedArrayBuffer |
| Use case | Scale HTTP servers across cores | Offload CPU-heavy work (image resize, hashing) |
| Startup cost | Heavy (full process) | Lighter |
| Comms | IPC messages | postMessage + shared buffers |
Rule of thumb: cluster = horizontal scaling for I/O-bound web servers. Worker threads = offload one CPU-heavy task without blocking the event loop.
Gotchas
- State doesn’t replicate. Each worker has its own memory. In-memory caches, rate limiters, WebSocket connections — none are shared. Use Redis.
- Sticky sessions. If we use WebSockets or session affinity, round-robin breaks. Need a layer-7 LB like nginx with
ip_hash. - PM2 does this for us. In production, most people use PM2’s cluster mode instead of writing the fork code by hand. Same idea, less boilerplate.
- Don’t fork more than
os.cpus().length. More workers = more context switching, not more throughput.
When NOT to use cluster
If we’re behind Kubernetes or run multiple Docker containers anyway — skip cluster. One Node process per container, scale by adding containers. Simpler ops story.