Worker Threads

Node is single-threaded for JavaScript execution. The event loop, our handlers, every line of our code — all on one thread. That’s fine for I/O-bound work (the kernel does the waiting). It’s a disaster for CPU-bound work: a sync 2-second computation blocks every other in-flight request for 2 seconds. Worker Threads are Node’s answer.

In simple language: Worker Threads let us spawn a separate JS thread that runs alongside the main one. Real parallel execution, not just async I/O. We communicate via message passing, like a tiny isolated worker microservice that lives in our process.

What “CPU-bound” actually means

A request is CPU-bound when our process is doing math, not waiting on the network/disk. Examples:

  • Parsing a 50MB JSON or CSV
  • Resizing an image
  • Computing a SHA-256 hash over a big buffer
  • Running a regex against millions of strings
  • Running ML inference in pure JS

For I/O work (DB query, HTTP fetch, file read), Workers won’t help — Node’s event loop is already great at that.
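To see what "blocking the event loop" looks like in practice, here is a minimal sketch that measures how late a zero-delay timer fires when synchronous work runs first. The helper names busyWait and measureTimerDelay are made up for this illustration, and the spin loop stands in for any real CPU-bound task.

```javascript
// Hypothetical helper: burn CPU synchronously for roughly `ms` milliseconds.
function busyWait(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // spin: nothing else can run on this thread meanwhile
  }
}

// Hypothetical helper: resolves with how long a 0 ms timer actually took to fire.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduled = performance.now();
    setTimeout(() => resolve(performance.now() - scheduled), 0);
    busyWait(blockMs); // runs to completion before the timer callback can run
  });
}

const idleDelay = await measureTimerDelay(0);
const blockedDelay = await measureTimerDelay(200);

console.log({
  idleDelay: Math.round(idleDelay),
  blockedDelay: Math.round(blockedDelay), // at least ~200 ms
});
```

The timer stands in for every other request waiting its turn: none of them gets to run until the synchronous work returns.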

A minimal worker

Workers live in their own file (or in a source string passed with eval: true). We message back and forth.

// main.js
import { Worker } from 'node:worker_threads';

function runHeavy(input) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./worker.js', { workerData: input });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited ${code}`));
    });
  });
}

console.log(await runHeavy({ size: 10_000_000 }));

// worker.js
import { workerData, parentPort } from 'node:worker_threads';

// some CPU-heavy task — does NOT block main.js
let sum = 0;
for (let i = 0; i < workerData.size; i++) sum += Math.sqrt(i);

parentPort.postMessage({ sum });

While worker.js is grinding, the main thread keeps serving HTTP requests. That’s the point.
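For quick experiments, the same round trip fits in a single file by handing the Worker constructor a source string with eval: true. This is only a sketch: runInline and inlineWorkerSource are names invented here, and the loop is a simplified stand-in for the heavy task.

```javascript
import { Worker } from 'node:worker_threads';

// Worker source as a string. With `eval: true` it runs as a CommonJS script,
// so it uses require() rather than import.
const inlineWorkerSource = `
  const { workerData, parentPort } = require('node:worker_threads');
  let sum = 0;
  for (let i = 0; i < workerData.size; i++) sum += i;
  parentPort.postMessage({ sum });
`;

// Hypothetical helper mirroring runHeavy(), but with an inline worker.
function runInline(input) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(inlineWorkerSource, { eval: true, workerData: input });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

const inlineResult = await runInline({ size: 1000 });
console.log(inlineResult.sum); // 499500 (sum of 0..999)
```

A separate file is usually easier to maintain; the string form is mostly handy for demos and tests.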

The architecture

Main thread: event loop, HTTP, fast logic
   │  postMessage →
   │  ← on('message')
Worker thread: own V8 isolate, own event loop, own memory; runs the CPU-heavy work and reports back with parentPort.postMessage

Separate memory. Communication via structured-clone message passing.

Each worker is essentially a fresh Node instance running inside the same process. Separate V8 heap, separate event loop, separate require cache.

postMessage — the message channel

postMessage uses the structured clone algorithm to serialize the data, the same one browsers use for postMessage between windows. It can copy plain objects, Maps, Sets, typed arrays (a Buffer arrives as a plain Uint8Array), even circular references. It cannot copy functions (postMessage throws a DataCloneError), and class instances arrive as plain objects stripped of their prototype and methods.

parentPort.postMessage({
  result: bigBuffer,
  meta: { ts: Date.now() },
});
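We can poke at structured-clone behavior without spawning a worker at all: the global structuredClone() (Node 17+) applies the same algorithm. The sketch below uses a made-up Point class and object shape purely for illustration.

```javascript
// Hypothetical class used only to show what happens to methods.
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  norm() {
    return Math.hypot(this.x, this.y);
  }
}

const original = {
  tags: new Set(['a', 'b']),
  lookup: new Map([['k', 1]]),
  point: new Point(3, 4),
};
original.self = original; // circular reference

const copy = structuredClone(original);

console.log(copy.lookup.get('k'));        // 1: Maps survive
console.log(copy.self === copy);          // true: the cycle is preserved
console.log(copy.point instanceof Point); // false: the prototype is dropped
console.log(typeof copy.point.norm);      // 'undefined': methods are gone

// Functions are not cloneable at all.
let functionErrorName;
try {
  structuredClone({ callback: () => {} });
} catch (err) {
  functionErrorName = err.name;
}
console.log(functionErrorName); // 'DataCloneError'
```

If a worker needs behavior, not just data, the usual approach is to send plain data and rebuild the class instance on the other side.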

Bigger payload = more cloning cost. If we’re sending megabytes, consider the transferList argument: Node moves the listed ArrayBuffers to the receiver without copying (the sender loses access to them).

// Ownership transfer. Assumes buf owns its whole ArrayBuffer (not a slice of Node's shared Buffer pool).
parentPort.postMessage({ buf }, [buf.buffer]);
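The "sender loses access" part is easy to observe in one thread, because structuredClone() accepts a transfer list too. A small sketch, with variable names chosen for this example only:

```javascript
// A typed array created this way owns its entire ArrayBuffer.
const bytes = new Uint8Array(1024 * 1024).fill(7);

// Move (not copy) the underlying ArrayBuffer into the clone.
const moved = structuredClone({ bytes }, { transfer: [bytes.buffer] });

console.log(moved.bytes.byteLength); // 1048576: the receiver has the data
console.log(moved.bytes[0]);         // 7
console.log(bytes.byteLength);       // 0: the sender's view is detached
```

The same rule applies across threads: after a transfer, treat the original buffer as gone.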

SharedArrayBuffer — shared memory between threads

For the rare cases where workers need to read/write the same memory (image processing pipelines, multi-worker numerical compute), SharedArrayBuffer is the escape hatch.

// main.js
const sab = new SharedArrayBuffer(1024);
const view = new Int32Array(sab);
worker.postMessage(sab); // worker is an existing Worker; both threads now see the same bytes

Multiple threads writing the same memory is exactly the classic concurrency hazard — race conditions, torn reads, the works. Atomics (built-in) gives us atomic read/write/compare-and-swap. Use sparingly and only when message passing is genuinely too slow.
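As a concrete sketch of Atomics in action, here four workers bump the same counter slot in a SharedArrayBuffer. The worker source, the incrementInWorker helper, and the numbers are all invented for this example; the workers are inlined with eval: true to keep it in one file.

```javascript
import { Worker } from 'node:worker_threads';

// Inline worker (CommonJS, because of `eval: true`).
const counterWorkerSource = `
  const { workerData, parentPort } = require('node:worker_threads');
  const counter = new Int32Array(workerData.sab);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1); // atomic read-modify-write on slot 0
  }
  parentPort.postMessage('done');
`;

// Hypothetical helper: run one worker that increments the shared counter.
function incrementInWorker(sab, increments) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(counterWorkerSource, {
      eval: true,
      workerData: { sab, increments }, // the SharedArrayBuffer is shared, not copied
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

const sharedCounter = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
const counterView = new Int32Array(sharedCounter);

await Promise.all([1, 2, 3, 4].map(() => incrementInWorker(sharedCounter, 100_000)));

const total = Atomics.load(counterView, 0);
console.log(total); // 400000
```

Swapping Atomics.add for a plain counter[0]++ removes the guarantee: the increments from different threads can overwrite each other and the total may come up short.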

When to use Workers — and when not

Reach for Workers when:

  • The CPU work takes more than ~50ms — long enough to noticeably block the event loop.
  • The work is parallelizable and we want to use multiple cores.
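  • We want isolated state (a separate heap and globals for a job, for example). Keep in mind that a worker is not a security sandbox: its code can still reach the file system and the network.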

Don’t reach for Workers when:

  • The work is I/O. Async I/O already stays off the event loop, so a Worker adds nothing.
  • The work is tiny. Spawning a worker has startup cost (~10–50ms). For small jobs the overhead dwarfs the gain.
  • We just want more concurrency for HTTP requests. Use cluster (multiple Node processes sharing one port, with connections distributed among them), or run multiple containers behind a reverse proxy. That’s the idiomatic Node scaling story.

The worker pool pattern

We almost never spawn a worker per request — startup cost kills us. Instead, we keep a pool of N workers (often os.availableParallelism()), and queue jobs to them. Libraries like piscina do this for us with a pool.run(task) API.

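import Piscina from 'piscina';

// Piscina expects filename as a string (a path or file:// URL), hence .href.
// The worker file must export the task function, e.g. export default ({ image }) => ...
const pool = new Piscina({ filename: new URL('./worker.js', import.meta.url).href });

const result = await pool.run({ image: buf });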

Pool stays warm, requests share workers, throughput goes way up.
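To make the idea less magical, here is a deliberately tiny pool built only on node:worker_threads. It is a sketch, not how piscina is implemented: TinyPool, its methods, and the squaring worker are all hypothetical, and it skips things a real pool needs (replacing crashed workers, timeouts, backpressure).

```javascript
import { Worker } from 'node:worker_threads';
import os from 'node:os';

// Hypothetical long-lived worker: squares each number it receives.
const poolWorkerSource = `
  const { parentPort } = require('node:worker_threads');
  parentPort.on('message', (n) => parentPort.postMessage(n * n));
`;

class TinyPool {
  constructor(size = os.availableParallelism()) {
    this.all = Array.from({ length: size }, () => new Worker(poolWorkerSource, { eval: true }));
    this.idle = [...this.all]; // workers with nothing to do
    this.queue = [];           // jobs waiting for a free worker
  }

  // Queue a job; resolves with the worker's reply.
  run(payload) {
    return new Promise((resolve, reject) => {
      this.queue.push({ payload, resolve, reject });
      this.#dispatch();
    });
  }

  // Hand queued jobs to idle workers until one of the two runs out.
  #dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const job = this.queue.shift();
      const onMessage = (result) => {
        worker.off('error', onError);
        job.resolve(result);
        this.idle.push(worker); // worker is free again
        this.#dispatch();
      };
      const onError = (err) => {
        worker.off('message', onMessage);
        job.reject(err); // a crashed worker is not replaced in this sketch
      };
      worker.once('message', onMessage);
      worker.once('error', onError);
      worker.postMessage(job.payload);
    }
  }

  // Stop every worker so the process can exit.
  close() {
    return Promise.all(this.all.map((worker) => worker.terminate()));
  }
}

const tinyPool = new TinyPool(2);
const squares = await Promise.all([1, 2, 3, 4, 5].map((n) => tinyPool.run(n)));
console.log(squares); // [1, 4, 9, 16, 25]
await tinyPool.close();
```

Five jobs run on two workers here: the first two start immediately, the rest wait in the queue until a worker reports back. In production code a maintained library is the safer choice.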

Workers vs cluster vs child_process — quick contrast

  • Workers: same process, separate threads, message passing, shared memory possible. For CPU-bound JS work.
  • cluster: multiple Node processes sharing one listening port. By default the primary process hands out connections round-robin (on Windows the OS decides). For scaling I/O-bound HTTP servers across cores.
  • child_process: spawning external commands (ffmpeg, git) or running other Node scripts as totally separate processes. Highest isolation, highest overhead.

Pick by what we’re trying to do — they’re not interchangeable.

The mental model

Workers turn Node from single-threaded to multi-threaded for CPU work. The cost is message passing between isolated heaps; the win is unblocking the main event loop. Use a pool, not one-off spawns. And remember: most Node bottlenecks are I/O, not CPU — measure before reaching for this hammer.