Cluster Module - Node.js

Node.js runs JavaScript on a single thread. So if our server has 8 CPU cores, a plain Node process uses… 1. The other 7 sit idle. That’s wasteful for an HTTP server.

The cluster module fixes this by forking N copies of our process (one per core). All workers share the same port — the OS or the master process load-balances incoming connections across them.

In simple language: cluster is “run my server 8 times in parallel, and let them split traffic.”

Why not just spawn 8 servers manually?

We could run 8 Node processes on ports 3001-3008 and put nginx in front. That works. But cluster is simpler — one entry file, one port, automatic distribution. And workers can talk to the master via IPC if needed.

MASTER (PID 1000)
listens on :3000, forks workers

Worker
PID 1001
CPU 0

Worker
PID 1002
CPU 1

Worker
PID 1003
CPU 2

Worker
PID 1004
CPU 3

All 4 workers accept() on the SAME port :3000

How

The classic pattern: master forks, workers serve.

import cluster from 'node:cluster';
import os from 'node:os';
import http from 'node:http';

const numCPUs = os.cpus().length;

if (cluster.isPrimary) {
  console.log(`Master ${process.pid} forking ${numCPUs} workers`);
  for (let i = 0; i < numCPUs; i++) cluster.fork();

  cluster.on('exit', (worker, code) => {
    console.log(`Worker ${worker.process.pid} died (${code}), respawning`);
    cluster.fork();
  });
} else {
  http.createServer((req, res) => {
    res.end(`Handled by worker ${process.pid}\n`);
  }).listen(3000);
}

Hit :3000 repeatedly and you’ll see different PIDs in the response. That’s the OS round-robining.

Cluster vs Worker Threads — totally different things

People confuse these constantly. They’re not the same.

Aspect	Cluster	Worker Threads
Unit	Separate OS process	Thread inside one process
Memory	Each worker has own V8 heap	Can share memory via `SharedArrayBuffer`
Use case	Scale HTTP servers across cores	Offload CPU-heavy work (image resize, hashing)
Startup cost	Heavy (full process)	Lighter
Comms	IPC messages	`postMessage` + shared buffers

Rule of thumb: cluster = horizontal scaling for I/O-bound web servers. Worker threads = offload one CPU-heavy task without blocking the event loop.

Gotchas

State doesn’t replicate. Each worker has its own memory. In-memory caches, rate limiters, WebSocket connections — none are shared. Use Redis.
Sticky sessions. If we use WebSockets or session affinity, round-robin breaks. Need a layer-7 LB like nginx with ip_hash.
PM2 does this for us. In production, most people use PM2’s cluster mode instead of writing the fork code by hand. Same idea, less boilerplate.
Don’t fork more than os.cpus().length. More workers = more context switching, not more throughput.

When NOT to use cluster

If we’re behind Kubernetes or run multiple Docker containers anyway — skip cluster. One Node process per container, scale by adding containers. Simpler ops story.

Why not just spawn 8 servers manually?

How

Cluster vs Worker Threads — totally different things

Gotchas

When NOT to use cluster

References