Cluster Module

advanced nodejs cluster scaling performance

Node.js runs JavaScript on a single thread. So if our server has 8 CPU cores, a plain Node process uses… 1. The other 7 sit idle. That’s wasteful for an HTTP server.

The cluster module fixes this by forking N copies of our process (one per core). All workers share the same port — the OS or the master process load-balances incoming connections across them.

In simple language: cluster is “run my server 8 times in parallel, and let them split traffic.”

Why not just spawn 8 servers manually?

We could run 8 Node processes on ports 3001-3008 and put nginx in front. That works. But cluster is simpler — one entry file, one port, automatic distribution. And workers can talk to the master via IPC if needed.

MASTER (PID 1000)
listens on :3000, forks workers
Worker
PID 1001
CPU 0
Worker
PID 1002
CPU 1
Worker
PID 1003
CPU 2
Worker
PID 1004
CPU 3
All 4 workers accept() on the SAME port :3000

How

The classic pattern: master forks, workers serve.

import cluster from 'node:cluster';
import os from 'node:os';
import http from 'node:http';

const numCPUs = os.cpus().length;

if (cluster.isPrimary) {
  console.log(`Master ${process.pid} forking ${numCPUs} workers`);
  for (let i = 0; i < numCPUs; i++) cluster.fork();

  cluster.on('exit', (worker, code) => {
    console.log(`Worker ${worker.process.pid} died (${code}), respawning`);
    cluster.fork();
  });
} else {
  http.createServer((req, res) => {
    res.end(`Handled by worker ${process.pid}\n`);
  }).listen(3000);
}

Hit :3000 repeatedly and you’ll see different PIDs in the response. That’s the OS round-robining.

Cluster vs Worker Threads — totally different things

People confuse these constantly. They’re not the same.

Aspect Cluster Worker Threads
UnitSeparate OS processThread inside one process
MemoryEach worker has own V8 heapCan share memory via SharedArrayBuffer
Use caseScale HTTP servers across coresOffload CPU-heavy work (image resize, hashing)
Startup costHeavy (full process)Lighter
CommsIPC messagespostMessage + shared buffers

Rule of thumb: cluster = horizontal scaling for I/O-bound web servers. Worker threads = offload one CPU-heavy task without blocking the event loop.

Gotchas

  • State doesn’t replicate. Each worker has its own memory. In-memory caches, rate limiters, WebSocket connections — none are shared. Use Redis.
  • Sticky sessions. If we use WebSockets or session affinity, round-robin breaks. Need a layer-7 LB like nginx with ip_hash.
  • PM2 does this for us. In production, most people use PM2’s cluster mode instead of writing the fork code by hand. Same idea, less boilerplate.
  • Don’t fork more than os.cpus().length. More workers = more context switching, not more throughput.

When NOT to use cluster

If we’re behind Kubernetes or run multiple Docker containers anyway — skip cluster. One Node process per container, scale by adding containers. Simpler ops story.