Node.js

All 29 notes on one page

Fundamentals

1

What is Node.js

beginner nodejs v8 libuv runtime

Node.js is a JavaScript runtime that lets us run JS outside the browser. Before Node, JavaScript only ran inside browsers. Ryan Dahl built Node in 2009 so we could use the same language on the server too.

In simple language — Node.js takes the V8 engine out of Chrome, glues it to a C library called libuv, and gives us file system access, networking, child processes, and everything else a server needs.

The two main pieces

Node is essentially two things stitched together:

  • V8 — Google’s JavaScript engine (the same one Chrome uses). It compiles JS to native machine code. This is what actually runs our code.
  • libuv — a C library that handles non-blocking I/O. It gives us the event loop, the thread pool, and async file/network operations.

Everything else (the fs, http, crypto modules, etc.) is a thin layer on top of these two.

Node.js Architecture

  Your code            app.js, server.js, ...
  Node.js Core (JS)    fs, http, stream, crypto, path...
  V8                   runs JS, GC, JIT
  libuv                event loop, thread pool, I/O

Non-blocking I/O is the big idea

Most server work is waiting — for a database, for a file, for an HTTP response. Traditional servers spin up a thread per request. Node uses one thread and an event loop. When we call fs.readFile(), Node hands the work off to libuv, returns immediately, and runs other code. When the file is ready, our callback runs.

Our JavaScript runs on a single thread, but the I/O happens in parallel under the hood. That’s why Node feels fast for I/O-heavy workloads.

const fs = require("node:fs");

console.log("1. starting");

fs.readFile("./data.json", "utf8", (err, data) => {
  console.log("3. file read done");
});

console.log("2. continuing without waiting");
// Output order: 1, 2, 3

What we actually use Node for

  • HTTP / REST APIs — Express, Fastify, NestJS
  • Real-time apps — WebSockets, chat, multiplayer games (great fit because of the event loop)
  • CLI tools — npm itself, ESLint, Prettier, Vite are all Node CLIs
  • Build tooling — bundlers, transpilers, test runners
  • Microservices — small, fast HTTP services
  • Streaming pipelines — log processing, file transforms

Where Node is NOT a great fit

Node has one main thread for JS. CPU-heavy work (video encoding, ML inference, big math loops) blocks that thread and stalls everything. For CPU-bound work, we’d reach for Go, Rust, or Python with native libs — or offload to worker threads within Node.

Quick history

  • 2009 — Ryan Dahl releases Node
  • 2010 — npm launches
  • 2015 — io.js fork merges back, Node 4.0 released, Node Foundation forms
  • 2019 — Node Foundation + JS Foundation merge into OpenJS Foundation
  • 2020+ — ES Modules become stable, node: protocol, built-in fetch, --test runner

Today Node powers Netflix, LinkedIn, PayPal, Uber’s API gateway, and a large share of the JavaScript tooling ecosystem.


2

Event Loop Deep Dive

intermediate event-loop libuv async microtasks nexttick

The event loop is THE most asked Node.js interview question. In simple language — it’s the mechanism that lets a single-threaded runtime handle thousands of concurrent operations without blocking.

When we call setTimeout, fs.readFile, or a network request, Node hands the work to libuv and our callback gets queued. The event loop is the orchestrator that picks the right callback to run next.

The 6 phases

The event loop runs in a loop (obviously), and each tick of that loop goes through 6 phases in order. Each phase has its own callback queue.

One tick of the event loop

  1. timers — setTimeout, setInterval callbacks whose time is up
  2. pending callbacks — some system errors (e.g. TCP ECONNREFUSED)
  3. idle, prepare — internal only
  4. poll — I/O callbacks (fs, net, ...). Waits here if nothing else to do.
  5. check — setImmediate callbacks
  6. close callbacks — socket.on('close'), etc.

  ↻ loops back to phase 1

The four we actually care about in interviews are timers, poll, check, and close.

Microtasks run BETWEEN phases

Here’s the key bit most people miss. Microtasks aren’t a phase. They run between every phase (and after every individual callback). There are two microtask queues:

  • process.nextTick queue — runs first, drained completely
  • Promise (then/catch/finally) queue — runs second, drained completely

So the real flow is: run a callback → drain nextTick queue → drain Promise queue → next callback. This is why process.nextTick can starve the event loop if we recurse on it.

setTimeout(() => console.log("timeout"), 0);
setImmediate(() => console.log("immediate"));

Promise.resolve().then(() => console.log("promise"));
process.nextTick(() => console.log("nextTick"));

console.log("sync");

// Output:
// sync
// nextTick
// promise
// timeout       (or immediate first — depends on context)
// immediate

setImmediate vs setTimeout(fn, 0)

Classic interview trap. Outside of an I/O callback, the order is non-deterministic — it depends on how fast Node enters the loop. Inside an I/O callback, setImmediate ALWAYS runs first (because we’re already past the timers phase, heading to check).

const fs = require("node:fs");

fs.readFile(__filename, () => {
  setTimeout(() => console.log("timeout"), 0);
  setImmediate(() => console.log("immediate"));
});
// Always: immediate, timeout

process.nextTick — use carefully

process.nextTick fires before any I/O or timer, after the current operation. It’s how we defer something to “right after this function but before anything else”.

Think of it like — “finish this stack, then immediately do this, before going back to the event loop”.

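Recursive nextTick calls can block the loop forever, because the nextTick queue is drained completely before the loop moves on. Recursive Promise resolutions carry the same risk for the same reason. What changed in Node 11 is when the queues are drained: after every individual timer or immediate callback, not only between phases.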
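To make the starvation risk concrete, here is a minimal sketch with a bounded nextTick chain. The helper name runTickChain is made up for illustration. The setImmediate callback is registered first, yet it only runs once the whole chain has drained:

```javascript
// Records the order in which callbacks run. The chain depth is bounded,
// so the event loop is only delayed here, not starved.
function runTickChain(depth) {
  return new Promise((resolve) => {
    const order = [];

    // Registered first, but it cannot run until the nextTick queue is empty.
    setImmediate(() => {
      order.push("immediate");
      resolve(order);
    });

    function tick(remaining) {
      order.push(`tick ${remaining}`);
      if (remaining > 1) process.nextTick(tick, remaining - 1);
    }
    process.nextTick(tick, depth);
  });
}

runTickChain(3).then((order) => console.log(order.join(" -> ")));
// tick 3 -> tick 2 -> tick 1 -> immediate
```

Remove the `remaining > 1` guard and the immediate never fires: that is the starvation case.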

Why Node feels concurrent

While our JS is synchronous and single-threaded, libuv runs some I/O on a worker pool of 4 threads by default (configurable via UV_THREADPOOL_SIZE). File system ops, dns.lookup, and some crypto and zlib functions use this pool. Network I/O uses the OS’s notification APIs (epoll, kqueue, IOCP) directly, so no thread is needed.

The thread pool finishes a job → pushes the callback into the right phase queue → event loop picks it up on the next tick.

Practical takeaway

  • Don’t do heavy CPU work on the main thread — it blocks every phase.
  • process.nextTick for “must run before any I/O”.
  • setImmediate for “run after current poll phase”.
  • queueMicrotask is the standard, cross-platform way to schedule a microtask (uses the Promise queue).
  • If our loop is lagging, check for sync code, big JSON.parse, or sync fs calls.
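A rough way to see loop lag is to measure how late a 0 ms timer fires while synchronous work holds the thread. This is only a sketch: measureTimerDelay is a hypothetical helper, and the busy loop stands in for real CPU-heavy code.

```javascript
const { performance } = require("node:perf_hooks");

// Resolves with the number of milliseconds a 0 ms timer was delayed.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = performance.now();
    setTimeout(() => resolve(performance.now() - scheduledAt), 0);

    // Simulated CPU-heavy work: a busy loop that never yields to the loop.
    const until = performance.now() + blockMs;
    while (performance.now() < until) {}
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`0 ms timer fired after about ${Math.round(delay)} ms`);
});
```

The timer cannot fire before the busy loop ends, so the reported delay is at least the blocking time. Every other callback in the process waits the same way.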

3

Non-blocking I/O

intermediate libuv async thread-pool io

Non-blocking I/O is the whole reason Node exists. In simple language — when our code asks for something slow (a file, a network call), Node doesn’t sit and wait. It hands the work off and continues running other code. When the slow thing finishes, our callback gets queued up.

Blocking vs non-blocking

A blocking call freezes the thread until it returns. A non-blocking call returns immediately and notifies us later.

const fs = require("node:fs");

// Blocking — stops everything until done
const data = fs.readFileSync("./big.json", "utf8");
console.log(data);

// Non-blocking — returns instantly, callback runs later
fs.readFile("./big.json", "utf8", (err, data) => {
  console.log(data);
});
console.log("this runs FIRST");

If we use the sync version in an HTTP handler, every request reading that file waits in line. That’s bad. The async version lets Node serve thousands of requests interleaved.

How libuv pulls this off

libuv (the C library Node uses for non-blocking I/O) uses two strategies depending on the operation:

  1. OS-level async APIs — for network I/O (sockets), libuv uses epoll on Linux, kqueue on macOS/BSD, and IOCP on Windows. The OS itself notifies libuv when a socket is ready. Zero extra threads needed.
  2. Thread pool — for things the OS doesn’t expose as async (file system on most platforms, DNS lookups, crypto, zlib), libuv uses a pool of worker threads. Default 4 threads, configurable via UV_THREADPOOL_SIZE (max 1024).

Where I/O actually happens

  OS async (no thread)          TCP / UDP sockets, HTTP, pipes         epoll / kqueue / IOCP
  Thread pool (4 by default)    fs, dns.lookup, crypto.pbkdf2, zlib    UV_THREADPOOL_SIZE

The flow of an async call

Take fs.readFile:

  1. JS calls fs.readFile(path, cb).
  2. Node passes the work to libuv.
  3. libuv picks a worker thread, that thread calls read() syscalls.
  4. Meanwhile, our main thread is free — it runs other JS, handles requests, whatever.
  5. The worker thread finishes, hands the result back to libuv.
  6. libuv queues our callback in the poll phase of the event loop.
  7. Event loop reaches poll phase → runs our callback.

Why this makes Node fast (for I/O)

A traditional thread-per-request server (think old Apache) needs ~1MB of stack per thread. 10,000 connections = 10GB of RAM just for stacks. Node holds 10,000 connections on one thread with maybe a few hundred MB of memory. The bottleneck shifts from threads to actual work.

But — and this is important — Node is fast for I/O-bound work. For CPU-bound work (image processing, JSON parsing big payloads, cryptography in a tight loop), Node is no faster than anything else, and worse, the heavy code blocks all other requests.

Tuning the thread pool

If we’re doing heavy crypto or lots of fs work, the default 4 threads can become a bottleneck. Bump it:

UV_THREADPOOL_SIZE=16 node server.js

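Don’t set this absurdly high. For CPU-bound jobs like crypto, going past the CPU core count mostly adds context switching. Blocking fs calls can benefit from a somewhat larger pool, because those threads spend most of their time waiting.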
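One way to observe the pool size is to start several pbkdf2 hashes at once and log when each finishes. This is a sketch under assumptions: timeHashes is a hypothetical helper and the iteration count is arbitrary. With the default pool of 4, completion times often cluster in groups of four, though the exact numbers depend on the machine.

```javascript
const crypto = require("node:crypto");
const { performance } = require("node:perf_hooks");

// Starts `jobs` pbkdf2 hashes at once and reports when each one finishes.
function timeHashes(jobs, iterations = 100_000) {
  const start = performance.now();
  const tasks = Array.from({ length: jobs }, (_, job) =>
    new Promise((resolve, reject) => {
      crypto.pbkdf2("secret", "salt", iterations, 64, "sha512", (err) => {
        if (err) return reject(err);
        resolve({ job, ms: Math.round(performance.now() - start) });
      });
    })
  );
  return Promise.all(tasks);
}

timeHashes(8).then((results) => console.table(results));
```

Running the same script with UV_THREADPOOL_SIZE=8 set in the environment should let all eight jobs run at once, provided there are enough cores.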

Worker threads — for CPU work

For genuinely CPU-heavy code, Node has the worker_threads module. These are real OS threads with their own V8 instance. We send messages between them. Use these for things like image resizing, parsing huge files, or running ML inference.

const { Worker } = require("node:worker_threads");
const w = new Worker("./heavy-task.js");
w.on("message", (result) => console.log(result));
w.postMessage({ payload: "..." });
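The snippet above shows only the main-thread side. Below is a self-contained sketch of both sides. It keeps the worker source inline via the eval option so that everything fits in one file; a real project would normally put it in its own file such as heavy-task.js. The summation is a stand-in for real CPU-heavy work, and sumInWorker is a hypothetical helper.

```javascript
const { Worker } = require("node:worker_threads");

// Worker source kept inline so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  let sum = 0;
  for (let i = 1; i <= workerData.limit; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Runs the summation off the main thread and resolves with the result.
function sumInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { limit } });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

sumInWorker(1_000_000).then((sum) => console.log(sum)); // 500000500000
```

While the worker is looping, the main thread's event loop stays free to handle other callbacks.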

The cardinal rule

Never block the event loop. No JSON.parse on a 50MB string, no fs.readFileSync in a request handler, no while loop computing primes. If we block the loop, every connection waits.


4

REPL & Node CLI

beginner repl cli debugging node-flags

The Node CLI is more than just node app.js. It’s an interactive playground, a debugger entry point, and a quick scripting tool. Knowing the useful flags saves us a lot of time.

REPL — Read Eval Print Loop

Type node with no arguments and we get an interactive JS shell. Same engine, same APIs as Node, but live.

$ node
Welcome to Node.js v20.11.0.
> 1 + 1
2
> const fs = require("node:fs")
undefined
> fs.readdirSync(".")
[ 'package.json', 'index.js', 'README.md' ]
> .exit

Handy for trying out an API, checking date math, or testing a regex without making a file.

REPL dot commands

Inside the REPL, commands starting with . are special:

  • .help — list all commands
  • .editor — multi-line editor mode (Ctrl+D to finish)
  • .load file.js — evaluate a file’s contents into the REPL
  • .save out.js — save the session to a file
  • .break — abandon the current multi-line input (same as Ctrl+C)
  • .clear — reset the REPL context and discard any multi-line input
  • .exit (or Ctrl+D, or Ctrl+C twice) — quit

Useful REPL tricks

  • _ holds the result of the last expression. _error holds the last thrown error.
  • Tab completion works on variables and properties.
  • Top-level await works — no need to wrap in an async function.

> await fetch("https://api.github.com")
> _.status
200

Common CLI flags

  -e                 execute a string of JS and exit
  -p                 like -e but prints the result
  --watch            auto-restart on file change (Node 18.11+)
  --inspect          start the inspector so Chrome DevTools can attach (default port 9229)
  --inspect-brk      same, but pause on first line
  --env-file         load a .env file (Node 20.6+)
  --test             run the built-in test runner
  --require / -r     preload a module before script runs
  --experimental-*   opt into unstable features (loaders, vm modules, ...)

-e and -p — quick one-liners

-e evals a string. -p does the same but prints the result. Great for tiny shell utilities.

# Get a UUID without installing anything
node -p "crypto.randomUUID()"
# d8e7c2a0-...

# Quickly check Node version programmatically
node -p "process.version"

# Read JSON from stdin and pretty-print
cat data.json | node -e "let s='';process.stdin.on('data',d=>s+=d).on('end',()=>console.log(JSON.stringify(JSON.parse(s),null,2)))"

--watch — built-in nodemon

Since Node 18.11, we don’t need nodemon for most cases. --watch restarts our process when watched files change.

node --watch server.js
# Watching for file changes...

We can also pass --watch-path=./src to scope it.

--inspect — debugging

Adds a debugger that DevTools can attach to. Open chrome://inspect in Chrome and we see our Node process. Or use VS Code’s “Attach to Node” config.

node --inspect server.js
# Debugger listening on ws://127.0.0.1:9229/...

node --inspect-brk server.js
# Same, but the program pauses on line 1 waiting for us to attach

--env-file — built-in dotenv

Node 20.6+ ships with a built-in .env loader. We no longer need the dotenv package for simple cases.

node --env-file=.env server.js

Inside our code, the variables show up on process.env like normal.

process.argv — reading CLI args

When we write our own CLIs, args come in via process.argv. The first two entries are the node binary and the script path.

// node greet.js Manish
console.log(process.argv);
// [ '/usr/bin/node', '/path/to/greet.js', 'Manish' ]

const name = process.argv[2];
console.log(`Hello, ${name}`);

For anything more than one arg, reach for node:util’s parseArgs (Node 18+) or the commander / yargs packages.

const { parseArgs } = require("node:util");

const { values } = parseArgs({
  options: {
    port: { type: "string", short: "p", default: "3000" },
    dev: { type: "boolean" },
  },
});

// Run as: node server.js -p 8080 --dev
console.log(values); // port: '8080', dev: true
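parseArgs also accepts an explicit args array, which makes the parsing easy to try without touching process.argv. A minimal sketch follows. parseCli is a hypothetical wrapper, and converting the port to a number is a choice made here, not something parseArgs does.

```javascript
const { parseArgs } = require("node:util");

const options = {
  port: { type: "string", short: "p", default: "3000" },
  dev: { type: "boolean", default: false },
};

// Passing `args` explicitly (instead of reading process.argv) keeps the
// function easy to unit-test.
function parseCli(args) {
  const { values } = parseArgs({ args, options });
  return { port: Number(values.port), dev: values.dev };
}

console.log(parseCli(["-p", "8080", "--dev"])); // { port: 8080, dev: true }
console.log(parseCli([]));                      // { port: 3000, dev: false }
```

By default parseArgs is strict, so an unknown flag such as --verbose throws instead of being silently ignored.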

Modules & Package Management

5

CommonJS vs ES Modules

intermediate modules commonjs esm import require

Node has two module systems. CommonJS (CJS) is the original — require() and module.exports. ES Modules (ESM) is the standard from the JS spec — import and export. Knowing how they differ matters because mixing them up causes very real production bugs.

The two systems at a glance

CommonJS (CJS)

  • Default extension: .js (or .cjs)
  • Synchronous loading
  • require() / module.exports
  • __dirname, __filename available
  • No top-level await
  • Loaded by reading + wrapping in a function

ES Modules (ESM)

  • Default extension: .mjs (or .js with "type":"module")
  • Asynchronous loading
  • import / export
  • import.meta.url instead of __dirname
  • Top-level await works
  • Static graph: imports must be at top

How Node decides which system a file is

The rules in order:

  1. File ends in .cjs → CommonJS.
  2. File ends in .mjs → ESM.
  3. File ends in .js → look at the nearest package.json:
    • "type": "module" → ESM
    • "type": "commonjs" or no type field → CommonJS

// package.json
{
  "type": "module"
}

With that, every .js file in the package is treated as ESM. If we still need a CJS file inside, we use .cjs.
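The decision rules can be written down as a small function. This is a sketch, not Node's actual implementation: moduleFormat is a hypothetical helper that takes the "type" value of the nearest package.json as an argument instead of looking it up on disk. It also ignores the syntax detection that newer Node releases may apply to .js files when no type field is set.

```javascript
const path = require("node:path");

// Mirrors the three rules: .cjs, then .mjs, then .js plus the package "type".
function moduleFormat(filename, pkgType) {
  const ext = path.extname(filename);
  if (ext === ".cjs") return "commonjs";
  if (ext === ".mjs") return "module";
  if (ext === ".js") return pkgType === "module" ? "module" : "commonjs";
  return "unknown"; // .json, .node, etc. are outside this sketch
}

console.log(moduleFormat("server.mjs"));           // module
console.log(moduleFormat("server.js", "module"));  // module
console.log(moduleFormat("server.js"));            // commonjs
console.log(moduleFormat("legacy.cjs", "module")); // commonjs
```

Note how the extension always wins over the package-level setting.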

Syntax side by side

// CommonJS
const fs = require("node:fs");
const { readFile } = require("node:fs/promises");

function greet(name) {
  return `Hello, ${name}`;
}

module.exports = { greet };
// or: module.exports.greet = greet;

// ES Modules
import fs from "node:fs";
import { readFile } from "node:fs/promises";

export function greet(name) {
  return `Hello, ${name}`;
}

// or default export:
// export default greet;

The gotchas

1. __dirname doesn’t exist in ESM

In CJS, __dirname and __filename are free variables. In ESM, they’re gone. We use import.meta.url:

import { fileURLToPath } from "node:url";
import { dirname } from "node:path";

const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);

Node 20.11+ added import.meta.dirname and import.meta.filename so we can skip the boilerplate.

2. ESM imports MUST include the extension

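CJS lets us write require("./utils") and it tries .js, .json, .node. ESM is strict: for relative and absolute specifiers we have to write the full file name, ./utils.js, and a directory import needs ./utils/index.js spelled out. Bare package specifiers like "express" need no extension because they resolve through the package’s package.json.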

3. You can require() ESM (sometimes)

Until recently, require() of an ESM file threw ERR_REQUIRE_ESM. Node 22 added require() of synchronous ESM (no top-level await) behind the --experimental-require-module flag, and it is enabled by default from Node 22.12 and in Node 23+. Older versions force us to use dynamic import():

// In a CJS file, loading an ESM module:
async function load() {
  const mod = await import("./esm-module.mjs");
  mod.doStuff();
}

4. Named exports from CJS into ESM

Importing a CJS module from ESM gives us the whole module.exports as the default. Node tries to detect named exports too, but if it can’t (e.g., they’re set dynamically), we have to destructure manually:

// CJS package: the default import always works
import pkg from "lodash";
const { debounce } = pkg;

// Alternative (use one form or the other), only if Node detects the named exports:
// import { debounce } from "lodash";

5. JSON imports need an attribute

import data from "./data.json" with { type: "json" };

Dual packages — supporting both

Library authors often ship both. The package.json exports field is how we tell Node which file to use:

{
  "name": "my-lib",
  "type": "module",
  "main": "./dist/index.cjs",
  "exports": {
    ".": {
      "import": "./dist/index.mjs",
      "require": "./dist/index.cjs"
    }
  }
}

This is called conditional exports. The import key wins for ESM consumers, require wins for CJS.

When to use which

For new code in 2026, prefer ESM. It’s the standard, bundlers prefer it, top-level await is great, tree-shaking works better. The only reason to stay on CJS is a large existing codebase or a hot-loaded plugin system that needs sync require.


6

require Resolution Algorithm

intermediate modules require resolution node_modules

When we write require("express"), how does Node actually find that file? The resolution algorithm is well-defined and worth knowing — it explains a lot of bugs (“why is it picking up the wrong version?”, “why does my monorepo break?”).

The big picture

In simple language, Node walks through a checklist for require(X):

  1. Is X a core module (fs, http, path, …)? Use that.
  2. Does X start with ./, /, or ../? Treat it as a file or directory path.
  3. Otherwise, walk node_modules up the directory tree until found.

If none of these work, we get the famous Error: Cannot find module 'X'.

require("X") flowchart
Step 1: Is X a core module like "fs" or "node:http"? → return it.
↓ no
Step 2: Does X start with "./", "/", "../"? → resolve as file/dir path.
↓ no
Step 3: Walk up node_modules folders from current dir to root.
↓ found?
Load it. Otherwise throw MODULE_NOT_FOUND.

Core modules win first

If X matches a built-in name (fs, path, crypto, http, etc.), Node returns the built-in regardless of anything in node_modules. Since Node 14.18 and 16 we can prefix with node: to be explicit and immune to userland shadowing:

const fs = require("node:fs"); // always the built-in

Relative and absolute paths — LOAD_AS_FILE then LOAD_AS_DIRECTORY

For require("./utils"), Node tries in this order:

./utils                  exact path (if a file)
./utils.js
./utils.json
./utils.node             (compiled C++ addon)
./utils/package.json     read "main" field
./utils/index.js
./utils/index.json
./utils/index.node

This is why require("./utils") works when the file is utils.js — Node appends extensions for us.

node_modules tree walk

For a bare specifier like require("express"), Node walks up the directory tree, checking for node_modules/express at each level until it hits the filesystem root:

/Users/me/project/api/src/routes/users.js  ← calling from here

Checks:
  /Users/me/project/api/src/routes/node_modules/express
  /Users/me/project/api/src/node_modules/express
  /Users/me/project/api/node_modules/express
  /Users/me/project/node_modules/express        ← found! use this
  /Users/me/node_modules/express
  /Users/node_modules/express
  /node_modules/express

This is why monorepos work — packages at the root resolve from any subfolder. And it’s why a deeply nested duplicate of a package can shadow the root one.
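The upward walk is easy to reproduce. Here is a simplified sketch: nodeModulesPaths is a hypothetical helper, and it uses path.posix so the output matches the example above on any OS. It also skips details such as NODE_PATH and global folders.

```javascript
const path = require("node:path");

// Lists every node_modules directory considered for a bare specifier,
// nearest first, by walking up from `fromDir` to the filesystem root.
function nodeModulesPaths(fromDir) {
  const dirs = [];
  let current = fromDir;
  while (true) {
    dirs.push(path.posix.join(current, "node_modules"));
    const parent = path.posix.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return dirs;
}

console.log(nodeModulesPaths("/Users/me/project/api/src/routes"));
// First entry: /Users/me/project/api/src/routes/node_modules
// Last entry:  /node_modules
```

For the real list on our own machine, require.resolve.paths("express") returns the directories Node will actually search.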

Loading a node_modules package

Once Node finds node_modules/express/, it needs to pick an entry file. It reads package.json:

  1. If exports field exists → use that (conditional exports, see below).
  2. Else if main field exists → use that file.
  3. Else fall back to index.js.

{
  "name": "express",
  "main": "./index.js"
}

The exports field changes things

Modern packages use exports, which is strict. It blocks access to internal files and supports conditions (import vs require, node vs browser).

{
  "exports": {
    ".": {
      "import": "./dist/esm/index.mjs",
      "require": "./dist/cjs/index.js"
    },
    "./utils": "./dist/utils.js"
  }
}

With exports, require("my-pkg/internal/private") throws — even if the file exists. This is module encapsulation.
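The lookup against an exports map can be sketched in a few lines. This is heavily simplified and not Node's real resolver: resolveExport is a hypothetical helper that handles exact subpaths and one level of conditions only, with no "./*" patterns or nested conditions. The error code is the one Node reports for unlisted subpaths.

```javascript
// Resolves `subpath` against an "exports" map for the given active conditions.
function resolveExport(exportsField, subpath, conditions) {
  const target = exportsField[subpath];
  if (target === undefined) {
    const err = new Error(`Package subpath '${subpath}' is not defined by "exports"`);
    err.code = "ERR_PACKAGE_PATH_NOT_EXPORTED";
    throw err;
  }
  if (typeof target === "string") return target;

  // Conditions are matched in the order the package author wrote them.
  for (const [condition, file] of Object.entries(target)) {
    if (conditions.includes(condition) || condition === "default") return file;
  }
  throw new Error(`No matching condition for '${subpath}'`);
}

const exportsField = {
  ".": { import: "./dist/esm/index.mjs", require: "./dist/cjs/index.js" },
  "./utils": "./dist/utils.js",
};

console.log(resolveExport(exportsField, ".", ["node", "require"]));       // ./dist/cjs/index.js
console.log(resolveExport(exportsField, ".", ["node", "import"]));        // ./dist/esm/index.mjs
console.log(resolveExport(exportsField, "./utils", ["node", "require"])); // ./dist/utils.js
```

Asking for "./internal/private" throws, even if that file exists on disk, which is the encapsulation described above.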

Caching — modules load once

Node caches the resolved module by its absolute path. The second require("express") returns the same exports object as the first. The cache lives at require.cache.

console.log(require.cache);
// { '/abs/path/index.js': Module { ... } }

delete require.cache[require.resolve("./config")]; // force reload
const fresh = require("./config");

This is why a module’s top-level code runs once per process — not once per import.
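The run-once behavior can be checked with a throwaway module that counts how often its top-level code executes. A sketch, assuming a writable temp directory; the file name and the globalThis.__loads counter are made up for the demo.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Write a throwaway module whose top-level code counts how often it runs.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "cache-demo-"));
const file = path.join(dir, "counter.js");
fs.writeFileSync(
  file,
  "globalThis.__loads = (globalThis.__loads || 0) + 1;\n" +
    "module.exports = { loads: globalThis.__loads };\n"
);

const first = require(file);
const second = require(file);
console.log(first === second, first.loads); // true 1  (cached, ran once)

// Evicting the cache entry forces the top-level code to run again.
delete require.cache[require.resolve(file)];
const third = require(file);
console.log(third === first, third.loads); // false 2

fs.rmSync(dir, { recursive: true, force: true });
```

Evicting cache entries is handy in tests and dev tooling, but code that still holds the old exports object keeps using the old copy.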

Inspecting resolution

require.resolve() returns the resolved path without loading the module. Super useful when debugging “wait, which copy is it picking up?”:

console.log(require.resolve("express"));
// /Users/me/project/node_modules/express/index.js

To add extra search directories, set the NODE_PATH environment variable (rare, mostly used for global tools):

NODE_PATH=/usr/local/lib/node_modules node script.js

Common gotchas

  • Wrong version in a monorepo — a workspace’s own node_modules shadows a hoisted version. Run require.resolve to confirm.
  • Case sensitivity — a mismatched case works on macOS and Windows (case-insensitive file systems by default) but breaks on Linux. Always match casing exactly.
  • Symlinks — by default Node resolves to the real path. Use --preserve-symlinks for some monorepo setups.

7

package.json Fields

beginner package-json npm dependencies exports

package.json is the heart of any Node project. In simple language — it tells Node and npm everything about our project: name, version, what to run, what to install, and how others should import from us.

A real-world example:

{
  "name": "khoj",
  "version": "1.2.0",
  "description": "Personal job scraper",
  "type": "module",
  "main": "./dist/index.js",
  "exports": {
    ".": "./dist/index.js",
    "./utils": "./dist/utils.js"
  },
  "scripts": {
    "dev": "node --watch src/index.js",
    "test": "node --test",
    "build": "tsc"
  },
  "dependencies": {
    "axios": "^1.6.0",
    "pg": "^8.11.0"
  },
  "devDependencies": {
    "typescript": "^5.3.0"
  },
  "engines": {
    "node": ">=20"
  }
}

name and version

name is how npm and require find our package. Lowercase, no spaces, optionally scoped (@scope/name).

version follows semver: MAJOR.MINOR.PATCH. Increment major for breaking changes, minor for new features, patch for fixes.

type — CJS or ESM?

  • "type": "module" → all .js files are ESM
  • "type": "commonjs" (or absent) → all .js files are CJS

This setting controls how Node loads our files. See the CommonJS vs ESM note for details.

main, module, exports — the entry points

These three control what consumers get when they import or require our package.

  • main — the classic entry. Used by CJS require() and as the fallback.
  • module — bundler-only field (Webpack, Rollup). Points to an ESM build. Node ignores this.
  • exports — modern, strict, conditional. Beats main if present.

{
  "main": "./dist/index.cjs",
  "module": "./dist/index.mjs",
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.mjs",
      "require": "./dist/index.cjs"
    },
    "./package.json": "./package.json"
  }
}

With exports, anything not listed is blocked: require("my-pkg/internal/secret") throws. This is intentional; it gives us module encapsulation.

scripts — our project’s commands

Anything in scripts runs via npm run <name> (or yarn <name>, pnpm <name>). A few names are special:

  • start — runs with just npm start
  • test — runs with just npm test
  • pre<x> / post<x> — auto-run before/after script <x>
{
  "scripts": {
    "dev": "node --watch src/index.js",
    "build": "tsc -p tsconfig.json",
    "test": "node --test test/",
    "lint": "eslint src/",
    "prebuild": "rm -rf dist"
  }
}

npm run adds node_modules/.bin to the PATH, so we can invoke locally-installed CLIs like tsc or eslint without a global install.

The three dependency buckets

dependencies
Needed at runtime. Installed when someone installs our package. Example: express, axios.

devDependencies
Needed only during development. Skipped with npm install --omit=dev (the older --production flag is a deprecated alias). Example: typescript, eslint, vitest.

peerDependencies
"I work with this; please provide it." Common in plugins (eslint plugins, react libraries). npm v3 through v6 only warned about missing peers; npm v7+ installs them automatically by default.

{
  "dependencies": {
    "express": "^4.18.0"
  },
  "devDependencies": {
    "@types/express": "^4.17.0",
    "typescript": "^5.0.0"
  },
  "peerDependencies": {
    "react": ">=18"
  }
}

Version range syntax

  • ^1.2.3 — compatible (≥1.2.3, <2.0.0). The default with npm install.
  • ~1.2.3 — patch-level only (≥1.2.3, <1.3.0).
  • 1.2.3 — exact.
  • * or latest — anything (don’t do this).
  • >=1.2.3 <2.0.0 — explicit range.
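The caret and tilde rules fit in a few lines of code. This is a simplified sketch that handles only ^x.y.z, ~x.y.z, and exact versions; pre-release tags and compound ranges are left out, and real projects would use the semver package. It also shows a detail the list above skips: for 0.x versions, the caret is narrower.

```javascript
function parse(version) {
  return version.split(".").map(Number);
}

// Returns -1, 0, or 1 for two [major, minor, patch] triples.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

function satisfies(version, range) {
  const v = parse(version);
  const op = range[0];
  if (op !== "^" && op !== "~") return compare(v, parse(range)) === 0;

  const [major, minor, patch] = parse(range.slice(1));
  const lower = [major, minor, patch];
  let upper;
  if (op === "~") upper = [major, minor + 1, 0];
  else if (major > 0) upper = [major + 1, 0, 0];
  else if (minor > 0) upper = [0, minor + 1, 0]; // ^0.y.z only allows patch bumps
  else upper = [0, 0, patch + 1];                // ^0.0.z is effectively exact

  return compare(v, lower) >= 0 && compare(v, upper) < 0;
}

console.log(satisfies("1.9.0", "^1.2.3")); // true
console.log(satisfies("2.0.0", "^1.2.3")); // false
console.log(satisfies("1.3.0", "~1.2.3")); // false
console.log(satisfies("0.3.0", "^0.2.1")); // false
```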

engines — declare runtime requirements

Tells installers what Node version we need. Without engine-strict it’s just a warning, but it shows up in errors and documents intent:

{
  "engines": {
    "node": ">=20.0.0",
    "npm": ">=10.0.0"
  }
}

Many CI systems and platforms (Vercel, Render) read this to pick the right Node version.

Other useful fields

  • bin — declare CLI executables. npm install -g my-cli puts these on PATH.
  • files — whitelist of files to publish. Without it, npm falls back to .npmignore (or .gitignore if there is no .npmignore) and otherwise includes everything.
  • workspaces — array of paths or globs for monorepo sub-packages.
  • private: true — prevents accidental npm publish.
  • sideEffects: false — bundler hint for tree-shaking. Means “imports of this package have no side effects, drop unused exports”.

{
  "bin": {
    "my-cli": "./cli.js"
  },
  "files": ["dist/", "README.md"],
  "private": true,
  "sideEffects": false
}

Generating it

npm init -y makes a minimal one. From there, every npm install <pkg> updates dependencies automatically.


8

npm vs yarn vs pnpm

intermediate npm yarn pnpm package-manager lockfile

All three install packages from the npm registry. They differ in how they install — disk layout, speed, strictness, and the lockfile format. Picking one matters more than people think.

The big idea behind each

  • npm — the original. Ships with Node. Uses a hoisted, flat node_modules. Lockfile: package-lock.json.
  • yarn — Facebook’s reaction to slow npm. Classic v1 uses hoisted layout like npm. Modern Yarn (Berry, v2+) defaults to Plug’n’Play (PnP), which installs without a node_modules folder. Lockfile: yarn.lock.
  • pnpm — performant npm. Uses a global content-addressable store + hardlinks + a nested-but-symlinked node_modules. Lockfile: pnpm-lock.yaml.

The install layout difference

This is the key bit. Same package.json → very different folders.

npm / yarn classic — hoisted

node_modules/
├── express/
├── lodash/         ← hoisted up
├── debug/          ← hoisted up
└── pg/
    └── node_modules/
        └── pg-types/

Anything in node_modules root is requirable, even if not in package.json (phantom deps).

pnpm — content-addressable store

~/.pnpm-store/    (global, hashed files)
node_modules/
├── express → .pnpm/express@4.18/...
├── pg → .pnpm/pg@8.11/...
└── .pnpm/
    ├── express@4.18.0/node_modules/express/
    └── lodash@4.17.21/node_modules/lodash/

Only direct deps are at root. Strict: no phantom deps. Files are hardlinks to the global store.

Why pnpm is so much faster (and uses less disk)

pnpm keeps one copy of each package version in a global content-addressable store (its location depends on the OS and pnpm version). When we install, it creates hardlinks from node_modules to that store. Hardlinks share the same disk blocks, so an install costs almost no extra disk space.

So 50 projects all using react@18.2.0 share one copy on disk. With npm, each project has its own full copy. On a dev laptop, this saves tens of GB.

# Install in current project
pnpm install

# Print where the global store lives
pnpm store path
# /Users/me/Library/pnpm/store/v3

Strictness — phantom dependencies

This is where pnpm beats npm in code correctness. With npm’s flat layout, our code can require("debug") even if we never listed debug in our package.json — because some transitive dependency installed it and hoisting flattened it to the top.

// works with npm if any dep depends on lodash, even if WE don't:
const _ = require("lodash"); // phantom dep!

The day that transitive package upgrades and drops lodash, our code breaks. pnpm’s symlink-based layout makes this impossible — we can only require what we explicitly declared.

Lockfiles — three formats, same purpose

A lockfile records the exact version of every package (direct and transitive) so we get the same install on every machine, every time.

# npm
package-lock.json     # JSON, very verbose

# yarn
yarn.lock             # custom format, more compact

# pnpm
pnpm-lock.yaml        # YAML

Always commit the lockfile. Without it, CI may install different transitive versions than dev — leading to “works on my machine” bugs.

For CI we use the strict install variants which fail if lockfile and package.json disagree:

npm ci                            # strict install, deletes node_modules first
yarn install --immutable          # Yarn Berry (v2+); Yarn 1 uses --frozen-lockfile
pnpm install --frozen-lockfile

Common commands side by side

action        npm            yarn                       pnpm
install all   npm install    yarn                       pnpm install
add dep       npm i pkg      yarn add pkg               pnpm add pkg
add dev       npm i -D pkg   yarn add -D pkg            pnpm add -D pkg
remove        npm rm pkg     yarn remove pkg            pnpm rm pkg
run script    npm run x      yarn x                     pnpm x
CI strict     npm ci         yarn install --immutable   pnpm install --frozen-lockfile

Workspaces / monorepos

All three support workspaces — multiple packages in one repo.

// package.json at root
{
  "workspaces": ["packages/*", "apps/*"]
}

pnpm uses pnpm-workspace.yaml instead:

packages:
  - 'packages/*'
  - 'apps/*'

For monorepos, pnpm is a popular choice in 2026 because of strictness and speed. Yarn Berry workspaces are powerful but the PnP layout breaks some tools.

Which one should we use?

  • pnpm — best default for new projects. Fast, disk-efficient, strict. Several major frontend projects (Vue, Vite, Astro) use it.
  • npm — fine for small projects. Pre-installed everywhere. Zero setup.
  • yarn — still solid for legacy projects on Yarn 1. Yarn Berry’s PnP is interesting but the migration is real work.

Whichever we pick, stick with one per project and commit the lockfile.


Core APIs

9

Buffer

intermediate buffer binary encoding memory

A Buffer is Node’s representation of raw binary data — a fixed-length sequence of bytes. In simple language, it’s like an array of integers from 0 to 255, but stored outside the V8 JavaScript heap so it can be passed cheaply to C code (file system, sockets, crypto).

Buffers came before JavaScript had Uint8Array. Today Buffer is a subclass of Uint8Array — anywhere a typed array works, a Buffer works too.

Why we need it

JavaScript strings are UTF-16 encoded internally. When we read a file or receive a network packet, the data is just bytes — could be UTF-8, binary image data, anything. We need a type that represents raw bytes without a charset assumption. That’s Buffer.

Memory layout

  V8 heap                     JS objects, strings, numbers, arrays. Managed by GC. Slow to copy to C.
  Buffer memory (off-heap)    Raw bytes, allocated outside the V8 heap. Zero-copy hand-off to syscalls.

Creating buffers

Three main ways, each with different semantics:

// 1. Allocate N bytes, zero-filled (safe, slightly slower)
const a = Buffer.alloc(10);
console.log(a); // <Buffer 00 00 00 00 00 00 00 00 00 00>

// 2. Allocate N bytes, uninitialized (FAST but may contain old data!)
const b = Buffer.allocUnsafe(10);
// b might contain anything — use only if you immediately overwrite all of it

// 3. From existing data
const c = Buffer.from("hello", "utf8");
console.log(c); // <Buffer 68 65 6c 6c 6f>

const d = Buffer.from([0xde, 0xad, 0xbe, 0xef]);
const e = Buffer.from("SGVsbG8=", "base64"); // → "Hello"

Never use the deprecated new Buffer(n) — it was a security disaster (allocated uninitialized memory by default).

Encodings

When converting between string and bytes, we specify an encoding:

  • utf8 (default) — variable-width, the standard
  • utf16le — UTF-16 little-endian
  • ascii — 7-bit ASCII, top bit dropped
  • latin1 — 1 byte = 1 codepoint, lossy for non-Latin chars
  • base64, base64url — common for transport / URLs
  • hex — pairs of hex digits
  • binary — alias for latin1 (legacy)

const buf = Buffer.from("hello", "utf8");

buf.toString("utf8");    // "hello"
buf.toString("hex");     // "68656c6c6f"
buf.toString("base64");  // "aGVsbG8="

Common operations

const buf = Buffer.from("hello world");

buf.length;              // 11 (bytes, not characters)
buf[0];                  // 104 (the byte for 'h')
buf.slice(0, 5);         // <Buffer 68 65 6c 6c 6f> — shares memory!
buf.subarray(0, 5);      // same; preferred name
buf.includes("world");   // true
buf.indexOf("world");    // 6
buf.equals(Buffer.from("hello world")); // true

// Concat multiple buffers
const merged = Buffer.concat([buf, Buffer.from("!")]);

slice shares memory — careful

buf.subarray() (and the old buf.slice()) returns a view, NOT a copy. Writing to it mutates the original.

const a = Buffer.from("hello");
const b = a.subarray(0, 3);
b[0] = 0x48; // 'H'
console.log(a.toString()); // "Hello"   ← original changed!

If we want a real copy, use Buffer.from(buf).
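
A small sketch of the difference, side by side. The variable names are illustrative only.

```javascript
const original = Buffer.from("hello");

const view = original.subarray(0, 3); // shares memory with original
const copy = Buffer.from(original);   // independent copy of the bytes

view[0] = 0x48; // 'H': visible through original
copy[1] = 0x45; // 'E': only the copy changes

console.log(original.toString()); // "Hello"
console.log(copy.toString());     // "hEllo"
```

The copy costs an allocation plus a memcpy, so views stay the right default for read-only parsing. Copy only when the data must outlive or diverge from the source buffer.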

Reading and writing typed values

Buffers have helpers for parsing binary protocols — reading integers, floats at specific offsets in big or little endian:

const buf = Buffer.alloc(8);
buf.writeUInt32BE(0x12345678, 0);  // write 4 bytes big-endian at offset 0
buf.writeUInt32LE(0xCAFEBABE, 4);  // little-endian at offset 4

buf.readUInt32BE(0).toString(16);  // "12345678"
buf.readUInt32LE(4).toString(16);  // "cafebabe"

This matters when we’re talking to TCP protocols, parsing image headers, or implementing wire formats.
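
As a sketch of what that looks like in practice, here is an encoder and decoder for a made-up frame format. The layout (2-byte type, 4-byte length, UTF-8 payload) and the function names are assumptions for illustration, not a real protocol.

```javascript
// Hypothetical wire format (assumed for this example):
//   bytes 0-1: message type, unsigned 16-bit big-endian
//   bytes 2-5: payload length, unsigned 32-bit big-endian
//   bytes 6+ : payload (UTF-8 text)
const HEADER_SIZE = 6;

function encodeFrame(type, payloadText) {
  const payload = Buffer.from(payloadText, "utf8");
  const frame = Buffer.alloc(HEADER_SIZE + payload.length);
  frame.writeUInt16BE(type, 0);
  frame.writeUInt32BE(payload.length, 2);
  payload.copy(frame, HEADER_SIZE); // copy payload bytes after the header
  return frame;
}

function decodeFrame(frame) {
  const type = frame.readUInt16BE(0);
  const length = frame.readUInt32BE(2);
  const payload = frame.subarray(HEADER_SIZE, HEADER_SIZE + length);
  return { type, payload: payload.toString("utf8") };
}

const frame = encodeFrame(7, "ping");
console.log(frame);              // <Buffer 00 07 00 00 00 04 70 69 6e 67>
console.log(decodeFrame(frame)); // { type: 7, payload: 'ping' }
```

A real decoder would also check that the buffer is at least HEADER_SIZE + length bytes long before slicing, since TCP can deliver a frame in pieces.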

Real-world: hashing a file

const fs = require("node:fs");
const crypto = require("node:crypto");

const hash = crypto.createHash("sha256");
const stream = fs.createReadStream("./big-file.zip");

stream.on("data", (chunk) => {
  // chunk is a Buffer
  hash.update(chunk);
});

stream.on("end", () => {
  console.log(hash.digest("hex"));
});

Notice we never convert chunks to strings — that would corrupt binary data. The whole pipeline is buffer → buffer.

Buffer pool — a perf detail

For small buffers (under 4KB by default, which is half of Buffer.poolSize), Buffer.allocUnsafe and Buffer.from(string) carve memory out of a shared pre-allocated pool instead of making a fresh allocation each time. Pool memory is reused without being zeroed, which is one reason “unsafe” buffers may contain leftover data. For larger sizes, Node allocates fresh memory directly.

When to use Buffer vs Uint8Array

In new code, Uint8Array works in browsers AND Node. Buffer adds convenience methods (toString, write, indexOf for strings, encoding conversions) but is Node-only. For shared browser/Node code, prefer Uint8Array + TextEncoder/TextDecoder for string conversion.
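
A minimal sketch of the portable route, using only globals that exist in both browsers and Node. The last two lines are Node-only and show how to get Buffer conveniences back without copying.

```javascript
const encoder = new TextEncoder();        // always encodes to UTF-8
const decoder = new TextDecoder("utf-8");

const bytes = encoder.encode("hi \u2713"); // "hi ✓" as a plain Uint8Array
console.log(bytes instanceof Uint8Array);  // true
console.log(Buffer.isBuffer(bytes));       // false
console.log(decoder.decode(bytes));        // "hi ✓"

// Node-only: wrap the same memory in a Buffer (no copy) when we need its helpers.
const asBuffer = Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength);
console.log(asBuffer.toString("hex"));     // "686920e29c93"
```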


10

Streams & Backpressure

intermediate streams backpressure pipe pipeline

Streams are how Node handles data we can’t (or don’t want to) hold all in memory at once. A 50GB log file, an HTTP request body, a video upload — we process it in chunks as it flows. In simple language, a stream is an iterable that emits pieces over time.

The four stream types

Readable
We read FROM it. Examples: fs.createReadStream, an HTTP request body, process.stdin.
Writable
We write TO it. Examples: fs.createWriteStream, an HTTP response, process.stdout.
Duplex
Both readable and writable, independent. Example: a TCP socket.
Transform
Duplex where output is computed from input. Examples: zlib.createGzip(), crypto.createCipheriv().

Reading a file with streams

The classic example. Reading a 10GB file with fs.readFile would blow up our memory. With streams, we process it 64KB at a time:

const fs = require("node:fs");

const stream = fs.createReadStream("./huge.log", { encoding: "utf8" });

stream.on("data", (chunk) => {
  console.log(`got ${chunk.length} characters`); // chunk is a string because encoding is "utf8"
});

stream.on("end", () => console.log("done"));
stream.on("error", (err) => console.error(err));

Each read fills an internal buffer up to highWaterMark (64KB by default for fs.createReadStream), the stream emits 'data', and the next read refills it. Memory stays bounded no matter how big the file is.

Piping — connecting streams

Most of the time we don’t want to handle chunks manually. We chain streams with .pipe():

const fs = require("node:fs");
const zlib = require("node:zlib");

// Read → gzip → write — entire pipeline streamed
fs.createReadStream("./access.log")
  .pipe(zlib.createGzip())
  .pipe(fs.createWriteStream("./access.log.gz"));

Three streams, zero buffering of the whole file. Each chunk flows through the chain.

Backpressure — the most important concept

Backpressure is what makes streams safe. In simple language — when the downstream is slow, the upstream needs to pause until the downstream catches up. Otherwise the slow side’s internal buffer grows without bound and we run out of memory.

Backpressure in action
Readable
fast disk
→ chunks →
Transform
gzip (slow!)
Writable
network
If gzip's buffer fills → write() returns false → readable PAUSES until 'drain' event.

When we call writable.write(chunk), it returns a boolean:

  • true — buffer has room, keep writing.
  • false — buffer is full, wait for the 'drain' event before writing more.

pipe() handles all this for us automatically. If we write streams manually, we have to respect that return value.

function pumpManually(readable, writable) {
  readable.on("data", (chunk) => {
    const ok = writable.write(chunk);
    if (!ok) {
      readable.pause(); // STOP reading
      writable.once("drain", () => readable.resume()); // resume when ready
    }
  });
  readable.on("end", () => writable.end()); // no more input: finish the writable
}
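
To see the write() return value flip, here is a small sketch with a deliberately slow Writable. The 4-byte highWaterMark and the 10ms delay are arbitrary values chosen to trigger backpressure quickly.

```javascript
const { Writable } = require("node:stream");

// A slow sink with a tiny buffer threshold so it fills up immediately.
const slowSink = new Writable({
  highWaterMark: 4,
  write(chunk, _encoding, callback) {
    setTimeout(callback, 10); // pretend each chunk takes 10ms to flush
  },
});

const first = slowSink.write(Buffer.from("ab"));      // 2 bytes pending: under the limit
const second = slowSink.write(Buffer.from("cdefgh")); // 8 bytes pending: over the limit
console.log(first, second); // true false

slowSink.once("drain", () => {
  console.log("drained, safe to write again");
  slowSink.end();
});
```

Note that write() still accepts the chunk when it returns false. The return value is advice, and ignoring it is how memory grows without bound.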

pipeline() — the modern, safe way

pipe() has a famous flaw — if any stream in the middle errors out, the others don’t get destroyed and we leak. stream.pipeline() fixes that with proper error and cleanup handling:

const { pipeline } = require("node:stream/promises");
const fs = require("node:fs");
const zlib = require("node:zlib");

async function gzipFile(input, output) {
  await pipeline(
    fs.createReadStream(input),
    zlib.createGzip(),
    fs.createWriteStream(output)
  );
  console.log("done");
}

gzipFile("./access.log", "./access.log.gz").catch(console.error);

Always prefer pipeline over pipe for production code.

Async iteration

Modern Node lets us treat streams as async iterables — much cleaner than event listeners:

const fs = require("node:fs");

async function countLines(path) {
  const stream = fs.createReadStream(path, { encoding: "utf8" });
  let count = 0;

  for await (const chunk of stream) {
    count += (chunk.match(/\n/g) || []).length;
  }

  return count;
}

Object mode

By default streams move Buffers or strings. Set { objectMode: true } and we can pass arbitrary JS objects — useful for record-by-record processing pipelines (CSV rows, JSON lines, DB rows).

const { Transform } = require("node:stream");

const toUpper = new Transform({
  objectMode: true,
  transform(record, _enc, cb) {
    cb(null, { ...record, name: record.name.toUpperCase() });
  },
});
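
Here is one way such a transform might be wired into a full object-mode pipeline. The function name upperCaseNames and the sample records are made up for the example; Readable.from, Transform, Writable and pipeline are standard.

```javascript
const { Readable, Transform, Writable } = require("node:stream");
const { pipeline } = require("node:stream/promises");

async function upperCaseNames(records) {
  const out = [];
  await pipeline(
    Readable.from(records), // Readable.from produces an object-mode stream
    new Transform({
      objectMode: true,
      transform(record, _enc, cb) {
        cb(null, { ...record, name: record.name.toUpperCase() });
      },
    }),
    new Writable({
      objectMode: true,
      write(record, _enc, cb) {
        out.push(record); // collect results; a real sink might insert into a DB
        cb();
      },
    })
  );
  return out;
}

upperCaseNames([{ id: 1, name: "ada" }, { id: 2, name: "linus" }])
  .then((rows) => console.log(rows));
// [ { id: 1, name: 'ADA' }, { id: 2, name: 'LINUS' } ]
```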

Common real-world uses

  • HTTP servers — req is a Readable, res is a Writable. Streaming a big response means streaming directly from a file or DB.
  • File uploads — pipe req through a parser, straight to S3 or disk.
  • Log processing — read a multi-GB log line by line with readline.
  • Data ETL — read DB rows as a stream, transform, write to another store.

Quick rules

  • Use pipeline() for any non-trivial chain.
  • Respect backpressure if you write streams manually.
  • Don’t JSON.stringify a 1GB object then write it — stream it.
  • For line-by-line text, use readline.createInterface({ input: stream }).

11

File System

beginner nodejs fs io promises

The fs module is how we touch the disk from Node — read files, write files, list directories, watch for changes. It’s one of the first modules everyone uses, and getting the async/sync distinction right is important because Node runs on a single thread.

In simple language: when we read a file synchronously, the entire Node process stops until the file is read. That’s fine for a tiny config at startup, but disastrous inside a request handler — every other request waits.

The three flavors

Node gives us the same operations in three styles. They all do the same thing, just with different async patterns.

fs (callbacks)
Original API. Error-first callback. Old-school.
fs.readFile(path, cb)
fs/promises
Modern. Works with async/await. Use this.
await fs.readFile(path)
fs.*Sync
Blocks the event loop. Only at startup.
fs.readFileSync(path)

Reading and writing — the modern way

We almost always reach for fs/promises. Here’s the pattern we use 90% of the time.

import { readFile, writeFile } from 'node:fs/promises';

// read JSON config
const raw = await readFile('./config.json', 'utf8');
const config = JSON.parse(raw);

// write JSON back
await writeFile('./config.json', JSON.stringify(config, null, 2));

Notice the 'utf8' — without it, readFile returns a Buffer (raw bytes). Forgetting this is the #1 fs gotcha.
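
A quick sketch that shows both return types side by side. It is written as a CommonJS script and uses a temporary directory so it can run anywhere; the file name is arbitrary.

```javascript
const fs = require("node:fs/promises");
const os = require("node:os");
const path = require("node:path");

async function demo() {
  // Temp file so the example is self-contained.
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "fs-demo-"));
  const file = path.join(dir, "config.json");
  await fs.writeFile(file, JSON.stringify({ port: 3000 }));

  const asBuffer = await fs.readFile(file);         // no encoding: Buffer
  const asString = await fs.readFile(file, "utf8"); // encoding given: string

  await fs.rm(dir, { recursive: true, force: true });
  return { asBuffer, asString };
}

demo().then(({ asBuffer, asString }) => {
  console.log(Buffer.isBuffer(asBuffer)); // true
  console.log(typeof asString);           // "string"
  console.log(JSON.parse(asString).port); // 3000
});
```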

When sync is actually okay

There’s exactly one place sync APIs are fine: at startup, before the server is accepting traffic. Loading a config, checking if a directory exists — fine.

import { existsSync, mkdirSync } from 'node:fs';

if (!existsSync('./logs')) {
  mkdirSync('./logs', { recursive: true });
}

Inside a request handler? Never. We block every other in-flight request.

Appending to a log file

A super common real-world pattern. appendFile creates the file if it doesn’t exist.

import { appendFile } from 'node:fs/promises';

async function logEvent(event) {
  const line = `${new Date().toISOString()} ${JSON.stringify(event)}\n`;
  await appendFile('./logs/app.log', line);
}

For high-volume logging we’d use a write stream instead — opening/closing the file on every line is slow.
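
A minimal sketch of that write-stream approach. createLogger is a hypothetical helper, not a Node API, and the temp-directory demo exists only to make the example runnable.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: opens the log file once and reuses the stream.
function createLogger(filePath) {
  const stream = fs.createWriteStream(filePath, { flags: "a" }); // "a" = append
  return {
    log(event) {
      const line = `${new Date().toISOString()} ${JSON.stringify(event)}\n`;
      return stream.write(line); // false signals backpressure
    },
    close() {
      return new Promise((resolve, reject) => {
        stream.once("error", reject);
        stream.end(resolve); // flush pending writes, then close the file
      });
    },
  };
}

async function demo() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "log-demo-"));
  const logFile = path.join(dir, "app.log");
  const logger = createLogger(logFile);
  logger.log({ type: "signup", userId: 42 });
  logger.log({ type: "login", userId: 42 });
  await logger.close();
  const lines = fs.readFileSync(logFile, "utf8").trim().split("\n");
  fs.rmSync(dir, { recursive: true, force: true });
  return lines.length;
}

demo().then((count) => console.log(count)); // 2
```

In a long-running service we would call close() from the shutdown handler so buffered lines reach the disk before the process exits.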

Watching files

fs.watch notifies us when a file or directory changes. Great for dev tools, config hot-reload, etc. The only catch: it’s a bit unreliable across platforms (macOS uses FSEvents, Linux uses inotify, Windows is its own beast). For production-grade watching, libraries like chokidar smooth out the differences.

import { watch } from 'node:fs';

watch('./config.json', (eventType, filename) => {
  console.log(`${filename} changed (${eventType})`);
  // reload config here
});

Reading large files — use streams

readFile loads the entire file into memory. For a 10GB log file? RIP. Stream it instead.

import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

const rl = createInterface({
  input: createReadStream('./huge.log'),
  crlfDelay: Infinity,
});

for await (const line of rl) {
  if (line.includes('ERROR')) console.log(line);
}

Streams process the file chunk by chunk — constant memory, regardless of file size.

The mental model

Pick fs/promises by default. Use *Sync only for startup config. Reach for streams when files get big. Don’t read inside a hot path if you can cache the result. That covers maybe 95% of all real-world fs usage.


12

Path & URL

beginner nodejs path url esm

Concatenating paths with '/' works on macOS and Linux. It breaks on Windows. That’s why the path module exists — it gives us platform-agnostic path operations so our code runs the same everywhere.

In simple language: never write dir + '/' + file. Always use path.join(dir, file). The module figures out the right separator for the OS we’re running on.

path.join vs path.resolve

These two trip everyone up. The difference matters.

  • path.join — just glues segments together with the OS separator. Relative stays relative.
  • path.resolve — produces an absolute path, walking from right to left until it hits an absolute segment (or falling back to the current working directory).

import path from 'node:path';

path.join('foo', 'bar', 'baz.txt');
// 'foo/bar/baz.txt'  (still relative)

path.resolve('foo', 'bar', 'baz.txt');
// '/Users/manish/proj/foo/bar/baz.txt'  (absolute, from cwd)

path.resolve('/etc', 'config', '../app.conf');
// '/etc/app.conf'  (absolute segment wins, .. collapsed)

Rule of thumb: use resolve when we need an absolute path (passing to fs, comparing paths). Use join for building a relative subpath.
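
To see the platform difference without switching machines, path.posix and path.win32 expose each OS’s rules explicitly. A short sketch (the segment names are arbitrary):

```javascript
const path = require("node:path");

// These results are the same no matter which OS runs the script.
console.log(path.posix.join("foo", "bar", "baz.txt")); // foo/bar/baz.txt
console.log(path.win32.join("foo", "bar", "baz.txt")); // foo\bar\baz.txt

// join collapses ".." but keeps the result relative.
console.log(path.posix.join("foo", "..", "bar"));       // bar

// resolve stops at the right-most absolute segment.
console.log(path.posix.resolve("/etc", "/var", "log")); // /var/log

// Plain path.resolve always yields an absolute path for the current OS.
console.log(path.isAbsolute(path.resolve("foo")));      // true
```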

The useful helpers

path.dirname('/var/log/app.log');   // '/var/log'
path.basename('/var/log/app.log');  // 'app.log'
path.extname('/var/log/app.log');   // '.log'
path.parse('/var/log/app.log');
// { root: '/', dir: '/var/log', base: 'app.log', name: 'app', ext: '.log' }

path.parse is great when we need multiple pieces at once.

The ESM __dirname problem

CommonJS had __dirname and __filename baked in as globals. ESM ("type": "module" in package.json) doesn’t. When we switch to ESM, those globals disappear and code breaks.

In simple language: ESM modules don’t know their own location for free anymore — we have to compute it from import.meta.url, which is a file:// URL.

CommonJS
__dirname
__filename
Just works.
ESM
import.meta.url
+ fileURLToPath
Manual conversion.

The workaround:

import { fileURLToPath } from 'node:url';
import path from 'node:path';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// now use as before
const config = path.join(__dirname, 'config.json');

Why fileURLToPath? Because import.meta.url is a string like file:///Users/manish/app/index.js — we have to convert that URL into a regular filesystem path before passing to fs.

Newer Node (20.11+) actually gives us import.meta.dirname and import.meta.filename directly, which skips the dance. Use them when our Node version allows.
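
Why not just take the URL’s pathname? A short sketch of what goes wrong. It is written as a CommonJS script, so it builds a file:// URL with pathToFileURL; in an ES module, import.meta.url would play the role of url.href. The folder name with a space is an assumption chosen to expose the problem.

```javascript
const os = require("node:os");
const path = require("node:path");
const { fileURLToPath, pathToFileURL } = require("node:url");

// A path containing a space, to show why the URL must be converted, not sliced.
const original = path.join(os.tmpdir(), "my app", "index.js");
const url = pathToFileURL(original); // a file:// URL object

console.log(url.href.includes("my%20app"));   // true: the space is percent-encoded
console.log(url.pathname === original);       // false: pathname is still URL-encoded
console.log(fileURLToPath(url) === original); // true: decoded, OS-native path
```

fileURLToPath also handles Windows drive letters and backslashes, which is why it is preferred over manual string surgery.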

URL parsing

Node uses the WHATWG URL standard (same as browsers). Forget the old url.parse — it’s deprecated.

const u = new URL('https://api.example.com/v1/users?id=42&active=true');

u.hostname;        // 'api.example.com'
u.pathname;        // '/v1/users'
u.searchParams.get('id');     // '42'
u.searchParams.get('active'); // 'true'

searchParams is a URLSearchParams object — iterable, supports append, delete, set. Way nicer than parsing query strings by hand.

Building URLs is just as clean:

const u = new URL('https://api.example.com');
u.pathname = '/v1/users';
u.searchParams.set('id', '42');
u.toString(); // 'https://api.example.com/v1/users?id=42'
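
One more sketch, showing that searchParams handles escaping and repeated keys for us. The URL and parameter names are placeholders.

```javascript
const searchUrl = new URL("https://api.example.com/v1/search");

// set() encodes the value; we never hand-escape query strings.
searchUrl.searchParams.set("q", "node & deno");
searchUrl.searchParams.append("tag", "a");
searchUrl.searchParams.append("tag", "b"); // append keeps duplicates, set replaces

console.log(searchUrl.search);                     // ?q=node+%26+deno&tag=a&tag=b
console.log(searchUrl.searchParams.getAll("tag")); // [ 'a', 'b' ]

// Converting to a plain object keeps only the last value of a repeated key.
console.log(Object.fromEntries(searchUrl.searchParams)); // { q: 'node & deno', tag: 'b' }
```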

The mental model

Use path.join for relative paths, path.resolve for absolute. Convert import.meta.url whenever you need __dirname in ESM. Parse URLs with new URL(), never string-splitting. These three habits cover most real-world cases without ever shipping a Windows-broken bug.


13

Process & Env Vars

beginner nodejs process env dotenv

process is a global object in Node — no import needed. It’s the bridge between our JavaScript code and the operating system: command-line arguments, environment variables, exit codes, signals, the current working directory. Every real Node app uses it constantly.

In simple language: process is “how Node sees the outside world.” Anything that came from the shell that ran us — args, env vars, stdin — lives here.

process.argv — command-line arguments

When we run node script.js --port 3000 --debug, the args show up here. The catch: the first two entries are always the Node binary path and the script path.

console.log(process.argv);
// [ '/usr/bin/node', '/app/script.js', '--port', '3000', '--debug' ]

const args = process.argv.slice(2);
// [ '--port', '3000', '--debug' ]

For anything beyond trivial parsing, reach for the built-in node:util.parseArgs (Node 18.3+) or libraries like commander / yargs.

import { parseArgs } from 'node:util';

const { values } = parseArgs({
  options: {
    port: { type: 'string', default: '3000' },
    debug: { type: 'boolean', default: false },
  },
});
// values.port === '3000', values.debug === false
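
parseArgs also accepts an explicit args array, which makes CLI parsing easy to exercise in isolation. A sketch, where parseCli and the short flag -p are assumptions added for the example, and the default option needs a Node version that supports it:

```javascript
const { parseArgs } = require("node:util");

const options = {
  port: { type: "string", short: "p", default: "3000" },
  debug: { type: "boolean", default: false },
};

// Hypothetical wrapper: parse, then convert strings to the types we want.
function parseCli(args) {
  const { values } = parseArgs({ args, options });
  return { port: Number(values.port), debug: values.debug };
}

console.log(parseCli(["--port", "8080", "--debug"])); // { port: 8080, debug: true }
console.log(parseCli(["-p", "9000"]));                // { port: 9000, debug: false }
console.log(parseCli([]));                            // { port: 3000, debug: false }

// Unknown flags throw by default (strict mode), which catches typos early.
try {
  parseCli(["--prot", "8080"]);
} catch (err) {
  console.log(err.code); // ERR_PARSE_ARGS_UNKNOWN_OPTION
}
```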

process.env — environment variables

Every env var is a string. process.env.PORT is "3000", not the number 3000. Convert deliberately.

const port = parseInt(process.env.PORT ?? '3000', 10);
const debug = process.env.DEBUG === 'true';

The ?? handles the unset case — process.env.SOMETHING_UNSET is undefined.
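
Because every value is a string, a classic bug is `if (process.env.DEBUG)` being truthy for the string "false". A small sketch of explicit conversion helpers follows; envInt and envBool are hypothetical names, not part of Node, and the env parameter exists so they can be tried without touching the real environment.

```javascript
// Hypothetical helpers (illustrative names, not a Node API).
function envInt(name, fallback, env = process.env) {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number.parseInt(raw, 10);
  if (Number.isNaN(value)) {
    throw new Error(`${name} must be an integer, got "${raw}"`);
  }
  return value;
}

function envBool(name, fallback, env = process.env) {
  const raw = env[name];
  if (raw === undefined) return fallback;
  return raw === "true" || raw === "1"; // anything else, including "false", is false
}

const fakeEnv = { PORT: "8080", DEBUG: "false" };
console.log(envInt("PORT", 3000, fakeEnv));    // 8080
console.log(envInt("TIMEOUT", 30, fakeEnv));   // 30 (unset, fallback used)
console.log(envBool("DEBUG", true, fakeEnv));  // false
console.log(Boolean(fakeEnv.DEBUG));           // true: the naive check is wrong
```

Failing loudly on a malformed value at startup is usually better than silently running with NaN as a port.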

The dotenv pattern

We don’t want to type PORT=3000 DATABASE_URL=... node app.js every time. The convention: keep secrets in a .env file (gitignored) and load it at startup.

# .env
PORT=3000
DATABASE_URL=postgres://localhost/mydb
ANTHROPIC_API_KEY=sk-ant-...

Old-school way — the dotenv package:

import 'dotenv/config';
// now process.env.PORT, process.env.DATABASE_URL etc. are populated

Node 20.6+ ships this natively. No dependency needed:

node --env-file=.env app.js

process.exit and exit codes

process.exit(0) says “success,” anything non-zero is failure. Shell scripts and CI pipelines check these codes.

if (!process.env.DATABASE_URL) {
  console.error('FATAL: DATABASE_URL required');
  process.exit(1);
}

The gotcha: process.exit is abrupt. Pending writes to stdout/stderr can get cut off. For graceful shutdown, set process.exitCode = 1 and let the event loop drain naturally.

process events — graceful shutdown

When Kubernetes sends SIGTERM or we hit Ctrl+C (SIGINT), we should close DB connections, finish in-flight requests, then exit. The pattern:

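async function shutdown(signal) {
  console.log(`Received ${signal}, shutting down...`);
  // server.close() takes a callback instead of returning a promise, so wrap it
  await new Promise((resolve) => server.close(resolve));
  await db.end();
  process.exit(0);
}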

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));

The two events nobody should ignore:

  • uncaughtException — a thrown error nothing caught. State is unknown, log it and exit.
  • unhandledRejection — a promise rejected with no .catch. Same deal.

process.on('uncaughtException', (err) => {
  console.error('Uncaught:', err);
  process.exit(1);
});

process.on('unhandledRejection', (reason) => {
  console.error('Unhandled rejection:', reason);
  process.exit(1);
});

Trying to “recover” from these is almost always wrong — the app is in an undefined state.

The other useful bits

process.cwd();          // current working directory
process.chdir('/tmp');  // change it (rarely needed)
process.pid;            // process ID — useful in logs
process.platform;       // 'darwin' | 'linux' | 'win32'
process.version;        // 'v20.10.0' — Node version
process.uptime();       // seconds since process started
process.memoryUsage();  // { rss, heapTotal, heapUsed, ... }

process.memoryUsage() is gold for debugging memory leaks. Log heapUsed periodically and watch the trend.

The mental model

process is the OS-facing side of Node. Parse argv for CLI args, read env for config (always as strings), exit cleanly with proper codes, and always handle SIGTERM if you ship anything to production — otherwise rolling deploys will drop in-flight requests.


Async Patterns

14

Callbacks, Promises & async/await in Node

intermediate nodejs async promises esm

Node started before promises existed in JavaScript, so its original async style was callbacks — and not just any callbacks, a specific convention called error-first callbacks. Everything we do today (promises, async/await) is layered on top of that foundation. Understanding the progression helps when we debug legacy code or interop with older modules.

Error-first callbacks — the original

The convention: every async function takes a callback whose first argument is an error (or null on success), and subsequent arguments are the actual results.

import fs from 'node:fs';

fs.readFile('./config.json', 'utf8', (err, data) => {
  if (err) {
    console.error('Read failed:', err);
    return;
  }
  console.log('Got:', data);
});

The “first param is the error” convention sounds simple, but in a real app with five nested async calls we end up with callback hell — pyramids of indentation, error handling repeated everywhere, no way to use try/catch.

// cb is the caller's error-first callback
function transform(cb) {
  fs.readFile('config.json', 'utf8', (err, data) => {
    if (err) return cb(err);
    fs.readFile(JSON.parse(data).next, 'utf8', (err, data2) => {
      if (err) return cb(err);
      fs.writeFile('out.txt', data2, (err) => {
        if (err) return cb(err);
        // ...
      });
    });
  });
}

Promises — chainable, composable

Promises wrap a future value. We attach .then for success, .catch for failure. The chain flattens the pyramid.

import { readFile, writeFile } from 'node:fs/promises';

readFile('config.json', 'utf8')
  .then((data) => readFile(JSON.parse(data).next, 'utf8'))
  .then((data2) => writeFile('out.txt', data2))
  .catch((err) => console.error('Failed:', err));

Better, but still verbose. The real win comes next.

async/await — promises in disguise

async/await is syntactic sugar over promises. An async function always returns a promise. await pauses inside that function until the awaited promise resolves. We get to write async code that reads like sync code.

async function transform() {
  try {
    const data = await readFile('config.json', 'utf8');
    const data2 = await readFile(JSON.parse(data).next, 'utf8');
    await writeFile('out.txt', data2);
  } catch (err) {
    console.error('Failed:', err);
  }
}

In simple language: await is a “wait for this, then continue on the next line” marker. The function returns control to the event loop while waiting — Node isn’t blocked.

Callbacks
Original Node style. Error-first. Hard to compose.
Promises
Chainable. then/catch. Composable.
async/await
Reads like sync. try/catch works. Default in 2025.

Sequential vs parallel — the await trap

await runs things one at a time. If three operations don’t depend on each other, that’s wasteful.

// SLOW — 3 sequential round-trips
const user = await fetchUser(id);
const orders = await fetchOrders(id);
const cart = await fetchCart(id);

// FAST — all 3 in parallel, wait for the slowest
const [user, orders, cart] = await Promise.all([
  fetchUser(id),
  fetchOrders(id),
  fetchCart(id),
]);

This is one of the most common perf wins in Node code. Look for “await, await, await” with no data dependency and combine with Promise.all.
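
The difference is easy to observe by logging when each call starts and ends. In this sketch the fetchers are stand-ins that just sleep for 20ms; makeFetcher and the names are assumptions for the demo.

```javascript
const { setTimeout: sleep } = require("node:timers/promises");

// Hypothetical fetcher factory: each call waits 20ms and records its timeline.
function makeFetcher(name, log) {
  return async () => {
    log.push(`${name}:start`);
    await sleep(20);
    log.push(`${name}:end`);
    return name;
  };
}

async function sequential() {
  const log = [];
  await makeFetcher("user", log)();
  await makeFetcher("orders", log)();
  return log; // user:start, user:end, orders:start, orders:end
}

async function parallel() {
  const log = [];
  await Promise.all([makeFetcher("user", log)(), makeFetcher("orders", log)()]);
  return log; // both starts are logged before either end
}

sequential().then((log) => console.log("sequential:", log.join(" ")));
parallel().then((log) => console.log("parallel:  ", log.join(" ")));
```

One caveat: Promise.all rejects as soon as any input rejects, so the other results are discarded even though their work keeps running.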

Promisification — bridging old code

Some old modules still use error-first callbacks. We don’t want to write .then chains around them. Wrap them with util.promisify.

import { promisify } from 'node:util';
import { exec as execCb } from 'node:child_process';

const exec = promisify(execCb);

const { stdout } = await exec('git rev-parse HEAD');
console.log('Commit:', stdout.trim());

fs.promises is essentially fs callbacks promisified at the source — same API, promise-based.

Top-level await — only in ESM

Old Node modules (CJS) couldn’t await at the top of a file — only inside an async function. ESM modules can. This is huge for startup code.

// app.js — package.json has "type": "module"
import { readFile } from 'node:fs/promises';

const config = JSON.parse(await readFile('./config.json', 'utf8'));
const db = await connectDB(config.dbUrl);

export { db };

No more (async () => { ... })() IIFE wrappers around our entry point. Just write the code.

The catch: top-level await makes a module’s evaluation async. If something imports this module, its import statement effectively waits for us. Usually fine, occasionally surprising.

The mental model

Use async/await by default. Use Promise.all for independent parallel work. Wrap legacy callback APIs with promisify. In ESM, lean on top-level await for startup. Callbacks aren’t dead — many event APIs (EventEmitter, streams) still use them — but for one-shot async results, promises and await won.


15

util.promisify

intermediate nodejs util promises async

A huge chunk of Node’s core was designed before promises existed in JavaScript. Lots of APIs — fs, child_process, dns, plenty of npm packages — still take an error-first callback as their last argument. We don’t want to keep nesting callbacks in 2025. util.promisify is the official adapter that converts any such function into one that returns a promise.

In simple language: it takes a function that wants a callback and gives back a function that returns a promise. Zero ceremony, works on almost anything.

The convention it relies on

promisify assumes the function follows the error-first callback rule:

  • callback is the last argument
  • callback’s signature is (err, value) => ...

If both are true, promisify works automatically. Here’s the manual version of what it does, just to demystify it:

function manualPromisify(fn) {
  return function (...args) {
    return new Promise((resolve, reject) => {
      fn(...args, (err, result) => {
        if (err) reject(err);
        else resolve(result);
      });
    });
  };
}

That’s basically it. The real util.promisify is more robust (it preserves this, and special-cases core functions whose callbacks pass several values), but the spirit is identical.
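
One practical consequence of the this handling: when we promisify a method, we still have to bind it to its object. A sketch with a made-up Store class (hypothetical, purely for illustration):

```javascript
const { promisify } = require("node:util");

// Hypothetical callback-style class.
class Store {
  constructor() {
    this.data = new Map([["a", 1]]);
  }
  get(key, cb) {
    // Relies on `this`, so it breaks if called detached from the instance.
    setImmediate(() => {
      if (!this.data.has(key)) return cb(new Error(`missing key: ${key}`));
      cb(null, this.data.get(key));
    });
  }
}

const store = new Store();
const getAsync = promisify(store.get).bind(store); // bind restores `this`

getAsync("a").then((value) => console.log(value));        // 1
getAsync("zzz").catch((err) => console.log(err.message)); // missing key: zzz
```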

Using it

import { promisify } from 'node:util';
import { exec } from 'node:child_process';
import dns from 'node:dns';

const execAsync = promisify(exec);
const lookup = promisify(dns.lookup);

const { stdout } = await execAsync('git rev-parse HEAD');
console.log('HEAD:', stdout.trim());

const { address } = await lookup('nodejs.org');
console.log('IP:', address);

We now await what used to need a callback. Errors flow through normal try/catch.

What fs.promises actually is

fs/promises is what we’d get if we sat down and promisified every function in fs. The Node team did that work for us and shipped it as a separate module.

fs.readFile(path, cb) — promisify → fs.promises.readFile(path)
Same logic, same options. The callback became a returned promise.

Proof — we could literally rebuild it:

import fs from 'node:fs';
import { promisify } from 'node:util';

const readFile = promisify(fs.readFile);
const writeFile = promisify(fs.writeFile);

// these now behave just like fs.promises.readFile / writeFile
const data = await readFile('./config.json', 'utf8');

fs.promises is just nicer ergonomics with a single import.

Custom promisify behavior

Some core APIs don’t follow the strict (err, value) shape — for example dns.lookup calls back with (err, address, family), two result args. Node special-cases these via the util.promisify.custom symbol. The promisified version returns { address, family } instead of just address.

import { promisify } from 'node:util';
import dns from 'node:dns';

const lookup = promisify(dns.lookup);
const result = await lookup('nodejs.org');
// { address: '104.20.22.46', family: 4 }

We don’t normally need to set [util.promisify.custom] ourselves, but if we ship a library with non-standard callback shapes, that’s how we’d do it.
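
For completeness, a sketch of what setting it ourselves could look like. getSize is an invented legacy function whose callback passes two results; only util.promisify.custom itself is a real Node API.

```javascript
const { promisify } = require("node:util");

// Hypothetical legacy function: calls back with TWO result values.
function getSize(name, cb) {
  setImmediate(() => cb(null, 800, 600)); // (err, width, height)
}

// Without a custom implementation only the first value (800) would survive.
// With one, we decide the shape the promise resolves to.
getSize[promisify.custom] = (name) =>
  new Promise((resolve, reject) => {
    getSize(name, (err, width, height) => {
      if (err) reject(err);
      else resolve({ width, height });
    });
  });

const getSizeAsync = promisify(getSize); // returns our custom function
getSizeAsync("logo.png").then((size) => console.log(size));
// { width: 800, height: 600 }
```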

When NOT to use promisify

If the function has any of these traits, promisify is wrong:

  • Emits events repeatedly (e.g., a stream emitting data multiple times). Promises resolve once. Use for await...of, pipeline, or stay on events.
  • The callback isn’t error-first (e.g., setTimeout(cb, ms) — its callback has no err). You can still wrap it manually, just not with promisify.
  • Already returns a promise. No-op at best, weird wrapping at worst.

Here’s the right way to “promisify” setTimeout — Node ships a promise version already:

import { setTimeout as sleep } from 'node:timers/promises';

await sleep(1000); // pauses for 1s

The mental model

util.promisify is the bridge between Node’s callback past and its promise present. We use it directly when we hit an old API that hasn’t been modernized, and we use the already-promisified versions (fs/promises, timers/promises, dns/promises, stream/promises) whenever they exist — they’re idiomatic and well-tested.


16

EventEmitter

intermediate nodejs events pubsub

EventEmitter is the publish-subscribe primitive that sits underneath an enormous fraction of Node’s core: every stream is an emitter, every HTTP server and request is an emitter, the process global is one too. If we want to understand what req.on('data', ...) really does, we have to understand EventEmitter.

In simple language: it’s an object with two main methods — emit('name', data) to fire an event, and on('name', handler) to subscribe to it. That’s the whole concept. Everything else is variation.

The basic pattern

import { EventEmitter } from 'node:events';

const bus = new EventEmitter();

bus.on('user.signup', (user) => {
  console.log(`Welcome email queued for ${user.email}`);
});

bus.on('user.signup', (user) => {
  console.log(`Analytics tracked for ${user.id}`);
});

bus.emit('user.signup', { id: 42, email: 'a@b.com' });

Multiple listeners on the same event? They all run, in registration order, synchronously when emit is called. The emitter doesn’t await anything — if a listener is async, it runs but emit doesn’t wait for it.
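
A small sketch that records the order of execution makes the synchronous behavior visible. The event name and log labels are arbitrary.

```javascript
const { EventEmitter } = require("node:events");

const emitter = new EventEmitter();
const order = [];

emitter.on("tick", () => order.push("listener 1"));
emitter.on("tick", async () => {
  order.push("listener 2 (before await)");
  await null; // everything after this line runs later, as a microtask
  order.push("listener 2 (after await)");
});

order.push("before emit");
const hadListeners = emitter.emit("tick"); // true if anyone was listening
order.push("after emit");

console.log(hadListeners); // true
console.log(order);
// [ 'before emit', 'listener 1', 'listener 2 (before await)', 'after emit' ]
```

The part of the async listener after its first await has not run yet when emit returns, and if it throws, the resulting rejection is not caught by the emitter.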

Publisher
emit('x', data)
→ → →
EventEmitter
listener map
→ → →
Listener A
Listener B
Listener C

once — fire-and-forget subscriber

If we only care about the first occurrence, once auto-removes the listener after it fires.

server.once('listening', () => {
  console.log('Server started');
});

Great for one-time initialization signals.

off / removeListener — cleanup

If we add listeners dynamically, we have to remove them or we leak memory. off (alias for removeListener) needs the same function reference we passed to on.

function onData(chunk) { /* ... */ }

stream.on('data', onData);
// later
stream.off('data', onData);

Anonymous arrow functions can’t be removed cleanly because we don’t have a reference. That’s why long-lived emitters always store handler references.

The MaxListeners warning

Every emitter has a soft limit — by default 10 listeners per event. Cross that and Node prints:

(node:1234) MaxListenersExceededWarning: Possible EventEmitter memory leak detected.
11 data listeners added to [ReadStream].

In simple language: Node is saying “you keep adding listeners and never removing them — looks like a leak.” Sometimes it’s a real bug (forgot to off), sometimes it’s just a high-traffic legitimate use. We can raise the cap:

emitter.setMaxListeners(50);          // per-instance
EventEmitter.defaultMaxListeners = 20; // global default

But always check whether the listeners should actually be removed before papering over the warning.

The special ‘error’ event

EventEmitter has one cursed event name: 'error'. If we emit('error', err) and nothing is listening, Node treats it as uncaught and crashes the process.

const e = new EventEmitter();
e.emit('error', new Error('boom')); // CRASHES

The fix is always to have an error listener:

e.on('error', (err) => {
  console.error('Emitter error:', err);
});

This is why streams everywhere need .on('error', ...) — it’s the same EventEmitter behavior.

Extending it for our own classes

The natural way to build a class with built-in pub/sub:

import { EventEmitter } from 'node:events';

class JobRunner extends EventEmitter {
  async run(job) {
    this.emit('start', job);
    try {
      const result = await job.execute();
      this.emit('done', { job, result });
    } catch (err) {
      this.emit('error', err);
    }
  }
}

const runner = new JobRunner();
runner.on('done', ({ job, result }) => console.log(`${job.id} → ${result}`));
runner.on('error', (err) => console.error('Job failed:', err));

This pattern shows up everywhere — Express’s app, Mongoose connection, ws WebSocket server, the Node process itself.

events.once — promise wrapper

When we want to await a single event (e.g., wait for 'listening'), there’s a helper:

import { once } from 'node:events';

await once(server, 'listening');
console.log('Server is up');

Resolves with an array of args. Rejects if 'error' fires first. Beautiful for sequencing.
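Both outcomes can be sketched with two throwaway emitters (the 'ready' event name and its arguments are illustrative): the promise resolves to the array of emitted arguments, and an 'error' emitted first turns into a rejection.

```javascript
import { EventEmitter, once } from 'node:events';

// Success path: the promise resolves with the emitted arguments as an array.
const emitter = new EventEmitter();
setImmediate(() => emitter.emit('ready', 'db', 42));
const [name, value] = await once(emitter, 'ready');

// Failure path: an 'error' emitted before 'ready' rejects the promise.
const failing = new EventEmitter();
setImmediate(() => failing.emit('error', new Error('boom')));
let rejection = null;
try {
  await once(failing, 'ready');
} catch (err) {
  rejection = err;
}

console.log({ name, value, rejection: rejection?.message });
```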

The mental model

EventEmitter is sync pub/sub: emit is just “loop through listeners and call them in order.” Always handle 'error'. Always remove listeners on long-lived emitters. When you npm install something and it has an .on(...) API, you’re almost certainly looking at an EventEmitter underneath.


HTTP & Networking

17

http module

intermediate nodejs http server networking

Express, Fastify, Koa — they’re all wrappers around this. Node ships with everything needed to build an HTTP server and client out of the box. Understanding the raw http module is what separates “I use a framework” from “I know what my framework actually does.”

In simple language: http.createServer gives us a callback (req, res) => {...} that fires for every incoming request. req is a readable stream (the request), res is a writable stream (the response we send back). That’s the whole API.

A minimal server

import http from 'node:http';

const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ hello: 'world' }));
});

server.listen(3000, () => {
  console.log('Listening on http://localhost:3000');
});

No dependencies. No framework. Real production HTTP server.

The req / res lifecycle

Client sends request → req (IncomingMessage: method, url, headers, body stream) → handler runs (res.writeHead → res.write → res.end) → client receives the response

IncomingMessage — the request

req is a readable stream. The body doesn’t arrive in one chunk — we have to assemble it.

function readBody(req) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
    req.on('error', reject);
  });
}

const server = http.createServer(async (req, res) => {
  if (req.method === 'POST' && req.url === '/echo') {
    const body = await readBody(req);
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(body);
  } else {
    res.writeHead(404);
    res.end();
  }
});

This is essentially what Express’s body-parser does for us, just hidden behind req.body. A production version also caps the body size so that a single client can’t exhaust memory.
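Because req is a readable stream, the same idea can be written with for await and a size cap. A minimal sketch, assuming a hypothetical readBodyLimited helper and a maxBytes default chosen only for illustration; a Readable built from two Buffers stands in for a real request.

```javascript
import { Readable } from 'node:stream';

// Hypothetical helper: collect the body, but give up once it exceeds maxBytes.
async function readBodyLimited(req, maxBytes = 1_000_000) {
  const chunks = [];
  let total = 0;
  for await (const chunk of req) {
    total += chunk.length;
    if (total > maxBytes) throw new Error('Body too large');
    chunks.push(chunk);
  }
  return Buffer.concat(chunks).toString('utf8');
}

// Stand-in for an incoming request whose body arrives in two chunks.
const fakeReq = Readable.from([Buffer.from('hel'), Buffer.from('lo')]);
const body = await readBodyLimited(fakeReq);
console.log(body);
```

In a real handler the thrown error would typically be mapped to a 413 response.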

ServerResponse — sending back

Three layers of writing:

  • res.writeHead(statusCode, headers) — sends the status line + headers. Call once.
  • res.write(chunk) — sends a body chunk. Call zero or more times (streaming).
  • res.end([chunk]) — finishes the response. Required, else the client hangs forever.

We can stream a big response without buffering:

import fs from 'node:fs';

http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'video/mp4' });
  fs.createReadStream('./big.mp4').pipe(res);
}).listen(3000);

pipe connects the file stream to the response stream — chunks flow through, memory stays flat.

Raw http vs Express — what does Express add?

Almost everything in Express is sugar over what we just wrote:

What Express adds | Underlying http
Routing (app.get('/users/:id', ...)) | Manual req.url + req.method checks
req.body, req.params, req.query | Manual stream reading and URL parsing
Middleware chain | One handler function
res.json(), res.send() | writeHead + end
Error handling middleware | try/catch + sending error responses

Express isn’t magic — it’s a thoughtful set of patterns on top of http.createServer. Knowing this means we can drop down to raw http for performance-critical endpoints, or build our own framework in a weekend.
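To make the "sugar" concrete, here is a rough sketch of routing and a res.json-style helper on raw http. The routes map and sendJson are made-up names, not Express internals; the demo listens on an ephemeral port, makes two requests with the built-in fetch (Node 18+), and shuts down.

```javascript
import http from 'node:http';

// Route table keyed by "METHOD /path" (illustrative, exact-match only).
const routes = new Map();

function sendJson(res, status, payload) {
  res.writeHead(status, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(payload));
}

routes.set('GET /health', (req, res) => sendJson(res, 200, { ok: true }));

const server = http.createServer((req, res) => {
  // req.url holds only path + query, so a base URL is needed to parse it.
  const { pathname } = new URL(req.url, 'http://localhost');
  const handler = routes.get(`${req.method} ${pathname}`);
  if (handler) handler(req, res);
  else sendJson(res, 404, { error: 'Not found' });
});

// Demo: ephemeral port, two requests, then shut down.
await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
const base = `http://127.0.0.1:${server.address().port}`;

const found = await fetch(`${base}/health?verbose=1`);
const foundBody = await found.json();
const missing = await fetch(`${base}/nope`);
await missing.arrayBuffer(); // drain the body

console.log(found.status, foundBody, missing.status);

server.close();
server.closeAllConnections(); // Node 18.2+: drop idle keep-alive sockets
```

Path parameters, middleware, and error handling are the parts a real framework layers on top of this.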

http.request — the client side

Same module, opposite direction. We can make outgoing HTTP requests too.

import http from 'node:http';

const req = http.request({
  hostname: 'api.example.com',
  path: '/users/42',
  method: 'GET',
}, (res) => {
  const chunks = [];
  res.on('data', (c) => chunks.push(c));
  res.on('end', () => {
    console.log('Got:', Buffer.concat(chunks).toString('utf8'));
  });
});

req.on('error', console.error);
req.end(); // sends the request

In practice, we use the built-in fetch (Node 18+) for this — it’s promise-based and matches the browser API. But http.request is what powers libraries like axios and is still the most efficient option for streaming or fine-grained control over keep-alive and agents.

The keep-alive gotcha

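Before Node 19, the default agent opened a new TCP connection for every outgoing request. For high-volume calls (one service calling another thousands of times a minute), this is brutal. Node 19 and later enable keep-alive on the global agent by default; on older versions, or when we want control over the pool size, we create an Agent with keepAlive: true to reuse connections: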

import { Agent } from 'node:http';

const agent = new Agent({ keepAlive: true, maxSockets: 50 });
// pass agent into http.request options

Modern fetch and clients like undici do this automatically.
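A sketch of passing the agent into http.request, using a local throwaway server that counts TCP connections. The get helper is illustrative. With keep-alive, sequential requests should usually share one connection, though the exact count depends on timing.

```javascript
import http from 'node:http';

// Local server that counts how many TCP connections it receives.
let connections = 0;
const server = http.createServer((req, res) => res.end('ok'));
server.on('connection', () => { connections += 1; });
await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
const { port } = server.address();

const agent = new http.Agent({ keepAlive: true, maxSockets: 50 });

// Illustrative helper: one GET through the shared agent.
function get(path) {
  return new Promise((resolve, reject) => {
    const req = http.request({ host: '127.0.0.1', port, path, agent }, (res) => {
      const chunks = [];
      res.on('data', (c) => chunks.push(c));
      res.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
    });
    req.on('error', reject);
    req.end();
  });
}

const bodies = [await get('/a'), await get('/b'), await get('/c')];
console.log({ bodies, connections });

agent.destroy(); // close pooled sockets so the process can exit
server.close();
```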

The mental model

http.createServer takes (req, res). req is a stream we read. res is a stream we write. Everything else — routing, middleware, JSON parsing — is a pattern built on top. Once that clicks, no Node HTTP code is mysterious anymore.


18

HTTPS & TLS

intermediate nodejs https tls security

https is http plus TLS — same API, same req/res shape, but the bytes on the wire are encrypted. In real production we almost never expose Node’s HTTPS server directly; a reverse proxy (Caddy, Nginx, ALB) handles TLS and forwards plain HTTP to our app. But knowing the raw module matters when we build internal mTLS services, talk to a third-party API with a custom cert, or troubleshoot why our fetch says “self-signed certificate.”

In simple language: TLS is the encryption layer between TCP and HTTP. We give Node a private key + certificate, it does the handshake with clients, and our handler code sees a normal request.

A minimal HTTPS server

import https from 'node:https';
import { readFileSync } from 'node:fs';

const server = https.createServer({
  key: readFileSync('./certs/server.key'),
  cert: readFileSync('./certs/server.crt'),
}, (req, res) => {
  res.writeHead(200);
  res.end('Hello over TLS');
});

server.listen(8443);

Same (req, res) handler as plain HTTP. The only difference is the options object with key and cert.

Where the cert comes from

For local dev we generate a locally trusted cert with mkcert (it creates a local CA and handles the trust store dance):

mkcert -install
mkcert localhost 127.0.0.1
# produces localhost.pem and localhost-key.pem

For production we get certs from Let’s Encrypt (via certbot or Caddy), or from a cloud-managed cert service. Don’t ship self-signed certs to production — clients will refuse the connection unless explicitly told to ignore.

The TLS handshake — what’s happening

Client Hello (supported ciphers) → Server Hello + Certificate → client verifies cert against trusted CA → key exchange (both sides) → ✓ encrypted channel established → HTTP traffic flows

The client validates that our cert is signed by a CA it trusts and that the hostname matches. That’s where most TLS pain comes from.

Mutual TLS (mTLS) — the client proves who it is too

In normal HTTPS, only the server presents a cert. In mutual TLS, the client also presents a cert, and the server validates it. This is how zero-trust internal services authenticate without API keys — Kubernetes service meshes, AWS IAM Roles Anywhere, Stripe’s payment terminal API.

import https from 'node:https';
import { readFileSync } from 'node:fs';

const server = https.createServer({
  key: readFileSync('./certs/server.key'),
  cert: readFileSync('./certs/server.crt'),
  ca: readFileSync('./certs/client-ca.crt'),  // CA we trust to sign client certs
  requestCert: true,    // ask client for a cert
  rejectUnauthorized: true, // close connection if client cert is invalid
}, (req, res) => {
  const cert = req.socket.getPeerCertificate();
  res.end(`Hello, ${cert.subject.CN}`);
});

server.listen(8443);

The client must present a cert signed by client-ca.crt. The server then knows exactly who’s calling.

Calling an mTLS server as a client

Same idea, other direction:

import https from 'node:https';
import { readFileSync } from 'node:fs';

const req = https.request({
  hostname: 'internal-api.local',
  port: 8443,
  path: '/data',
  method: 'GET',
  key: readFileSync('./certs/client.key'),
  cert: readFileSync('./certs/client.crt'),
  ca: readFileSync('./certs/server-ca.crt'), // CA that signed the server's cert
}, (res) => {
  res.pipe(process.stdout);
});
req.on('error', console.error); // TLS failures (bad cert, hostname mismatch) surface here
req.end();

Common pitfalls

UNABLE_TO_VERIFY_LEAF_SIGNATURE — the server’s cert chain is incomplete. The fix is on the server side: include intermediate certs in the chain, not just the leaf. We can also point Node at extra CAs:

NODE_EXTRA_CA_CERTS=/path/to/corporate-root.pem node app.js

SELF_SIGNED_CERT_IN_CHAIN in dev — we’re calling our own self-signed server. We tell our HTTP client to trust it via ca: option. Do not set NODE_TLS_REJECT_UNAUTHORIZED=0 in production — it disables all cert checking globally and is a giant security hole.

Hostname mismatch — the cert is for api.example.com but we’re connecting to 1.2.3.4. TLS verifies the hostname. Either use the domain name, or configure the cert with a Subject Alternative Name for the IP.
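The hostname check can be tried in isolation with tls.checkServerIdentity, which is the function Node applies by default. In this sketch the cert objects are hand-written stand-ins for what socket.getPeerCertificate() returns, reduced to the two fields the check reads.

```javascript
import tls from 'node:tls';

// Hand-written stand-in for a peer certificate (only the fields the check uses).
const cert = {
  subject: { CN: 'api.example.com' },
  subjectaltname: 'DNS:api.example.com',
};

// Returns undefined on a match, or an Error describing the mismatch.
const byName = tls.checkServerIdentity('api.example.com', cert);
const byIp = tls.checkServerIdentity('1.2.3.4', cert);

// Same cert, now with the IP listed as a Subject Alternative Name.
const certWithIp = {
  ...cert,
  subjectaltname: 'DNS:api.example.com, IP Address:1.2.3.4',
};
const byIpWithSan = tls.checkServerIdentity('1.2.3.4', certWithIp);

console.log({ byName, byIp: byIp?.message, byIpWithSan });
```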

Cert expiry — Let’s Encrypt certs last 90 days. If we forget to renew, the entire service goes down. Use auto-renewal (Caddy does this for free) and monitor expiry.

When to terminate TLS in Node vs at a proxy

Honestly, most of the time we put a reverse proxy in front of Node — it handles TLS, our app speaks plain HTTP internally. The proxy does cert renewal, HTTP/2, compression, often better than Node would. We reach for Node’s HTTPS when:

  • We need mTLS at the application layer (auth tied to cert).
  • We’re building a CLI tool or background worker that talks to a TLS-protected internal service.
  • We’re writing a webhook receiver for a service that requires TLS to a specific hostname we own.

The mental model

https is http plus a { key, cert } options bag. Cert + private key go on the server. Trusted CAs go on whoever’s verifying. mTLS just means both sides present a cert. When something breaks, 90% of the time it’s a hostname mismatch, missing intermediate cert, or expired cert — not Node’s fault.


19

net & TCP

advanced nodejs net tcp sockets

HTTP is a protocol that runs on top of TCP. TCP is the actual transport layer — a stream of bytes between two machines, with delivery guarantees and ordering, but no concept of “requests” or “responses.” Node’s net module gives us direct access. We rarely need it, but when we do, nothing else will work.

In simple language: net is what http uses underneath. If we strip HTTP away, we’re just reading and writing bytes on a socket. That’s a TCP connection.

A minimal TCP server

import net from 'node:net';

const server = net.createServer((socket) => {
  console.log('Client connected:', socket.remoteAddress);

  socket.write('Welcome to the echo server\n');

  socket.on('data', (chunk) => {
    socket.write(`echo: ${chunk}`);
  });

  socket.on('end', () => {
    console.log('Client disconnected');
  });
});

server.listen(4000, () => console.log('TCP server on :4000'));

Test it from another terminal:

nc localhost 4000
# Welcome to the echo server
> hello
# echo: hello

That’s it. No paths, no methods, no headers. Just bytes in, bytes out.

A TCP client

import net from 'node:net';

const client = net.createConnection({ host: 'localhost', port: 4000 }, () => {
  console.log('Connected');
  client.write('ping\n');
});

client.on('data', (chunk) => {
  console.log('Got:', chunk.toString());
  client.end();
});

The socket is a duplex stream

A socket in Node is both readable ('data' events, for await ... of) and writable (write, end). It’s a duplex stream. Everything we know about streams applies.

Client socket ⇄ TCP connection ⇄ Server socket: whatever one side writes, the other side reads.
Two duplex streams glued together by a TCP connection. No request boundaries.

The framing problem — why HTTP exists

Here’s the catch with raw TCP: there are no message boundaries. If a client calls socket.write('hello') then socket.write('world'), the server might see 'hello' then 'world', 'hel' then 'loworld', or 'helloworld' all at once. TCP coalesces and splits at will.

In simple language: TCP is a pipe, not a stack of envelopes. We need to invent our own framing — like ending every message with \n, or prefixing each message with its length.

// length-prefixed framing
function send(socket, payload) {
  const buf = Buffer.from(payload);
  const len = Buffer.alloc(4);
  len.writeUInt32BE(buf.length, 0);
  socket.write(len);
  socket.write(buf);
}

This is exactly the problem HTTP, MQTT, Redis’s RESP, and PostgreSQL’s wire protocol all solve in their own way. Higher-level protocols like HTTP give us message boundaries for free.
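The receiving side of length-prefixed framing could look roughly like this. encodeFrame and createFrameDecoder are hypothetical names; the demo feeds the decoder deliberately awkward slices to imitate TCP splitting two messages at arbitrary points.

```javascript
// Sender side: 4-byte big-endian length, then the payload.
function encodeFrame(payload) {
  const body = Buffer.from(payload);
  const header = Buffer.alloc(4);
  header.writeUInt32BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Receiver side: buffer incoming chunks and emit only complete messages.
function createFrameDecoder(onMessage) {
  let pending = Buffer.alloc(0);
  return (chunk) => {
    pending = Buffer.concat([pending, chunk]);
    while (pending.length >= 4) {
      const length = pending.readUInt32BE(0);
      if (pending.length < 4 + length) break; // wait for more bytes
      onMessage(pending.subarray(4, 4 + length).toString('utf8'));
      pending = pending.subarray(4 + length);
    }
  };
}

// Simulate TCP splitting two messages at arbitrary points.
const wire = Buffer.concat([encodeFrame('hello'), encodeFrame('world')]);
const messages = [];
const onData = createFrameDecoder((msg) => messages.push(msg));
onData(wire.subarray(0, 3));
onData(wire.subarray(3, 11));
onData(wire.subarray(11));

console.log(messages); // [ 'hello', 'world' ]
```

On a real socket this would be wired up as socket.on('data', createFrameDecoder(handler)). A production decoder would also reject absurd length values instead of buffering without limit.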

When to use net vs http

Use net only when:

  • You’re implementing a non-HTTP protocol. Custom binary protocols, game servers, IoT devices that speak Modbus / proprietary protocols, Postgres/Redis-style protocols.
  • You’re building a proxy or load balancer and need to forward raw bytes.
  • You need lowest possible overhead. No HTTP parsing, no headers. Real-time financial systems, telemetry pipelines.
  • You’re tunneling something through SSH or a VPN socket.

Use http (or HTTP frameworks) when:

  • You’re building anything that looks like a web service.
  • You want to reuse HTTP tooling (curl, Postman, fetch).
  • You want middleware, routing, JSON parsing — basically free.

For 99% of backend work, http is the right answer. net is for the 1% that’s genuinely lower-level.

Unix domain sockets

net can also do IPC over a filesystem path, no TCP involved. This typically has lower overhead than localhost TCP when two processes on the same machine talk:

const server = net.createServer(handler).listen('/tmp/myapp.sock');
const client = net.createConnection({ path: '/tmp/myapp.sock' });

PostgreSQL, Docker daemon, and many cloud sidecars use this pattern.

Backpressure — same rules as streams

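socket.write returns false when the data could not be flushed to the kernel right away and is being queued in user memory (the stream’s internal buffer has reached its highWaterMark). If we ignore that and keep writing, memory balloons. Either wait for the 'drain' event, or use pipeline to glue streams together, which handles backpressure for us.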

import { pipeline } from 'node:stream/promises';

await pipeline(source, socket); // backpressure-safe
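The manual 'drain' route might look like this. writeAll is a hypothetical helper, and a Writable with a tiny highWaterMark stands in for a slow socket so the sketch runs without a network.

```javascript
import { once } from 'node:events';
import { Writable } from 'node:stream';

// Write chunks one by one, pausing whenever write() reports a full buffer.
async function writeAll(writable, chunks) {
  let pauses = 0;
  for (const chunk of chunks) {
    if (!writable.write(chunk)) {
      pauses += 1;
      await once(writable, 'drain'); // resume only after the buffer empties
    }
  }
  return pauses;
}

// Stand-in for a slow socket: 4-byte buffer, asynchronous writes.
const received = [];
const slowSink = new Writable({
  highWaterMark: 4,
  write(chunk, encoding, callback) {
    received.push(chunk.toString());
    setImmediate(callback);
  },
});

const pauses = await writeAll(slowSink, ['aaaa', 'bbbb', 'cccc']);
slowSink.end();

console.log({ pauses, received });
```

The same writeAll works on a net.Socket, since a socket is a Writable too.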

The mental model

net gives us a byte pipe between two endpoints. No requests, no responses, no framing. We invent the protocol on top. It’s almost always the wrong choice for web work, and the only sensible choice for custom binary protocols. Knowing it exists — and that http is just bytes-with-rules layered on top — makes the whole networking stack much less mysterious.


Concurrency & Scaling

20

Worker Threads

advanced nodejs worker-threads parallelism performance

Node is single-threaded for JavaScript execution. The event loop, our handlers, every line of our code — all on one thread. That’s fine for I/O-bound work (the kernel does the waiting). It’s a disaster for CPU-bound work: a sync 2-second computation blocks every other in-flight request for 2 seconds. Worker Threads are Node’s answer.

In simple language: Worker Threads let us spawn a separate JS thread that runs alongside the main one. Real parallel execution, not just async I/O. We communicate via message passing, like a tiny isolated worker microservice that lives in our process.

What “CPU-bound” actually means

A request is CPU-bound when our process is doing math, not waiting on the network/disk. Examples:

  • Parsing a 50MB JSON or CSV
  • Resizing an image
  • Computing a SHA-256 hash over a big buffer
  • Running a regex against millions of strings
  • Running ML inference in pure JS

For I/O work (DB query, HTTP fetch, file read), Workers won’t help — Node’s event loop is already great at that.

A minimal worker

Workers live in their own file (or string). We message back and forth.

// main.js
import { Worker } from 'node:worker_threads';

function runHeavy(input) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./worker.js', { workerData: input });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited ${code}`));
    });
  });
}

console.log(await runHeavy({ size: 10_000_000 }));

// worker.js
import { workerData, parentPort } from 'node:worker_threads';

// some CPU-heavy task — does NOT block main.js
let sum = 0;
for (let i = 0; i < workerData.size; i++) sum += Math.sqrt(i);

parentPort.postMessage({ sum });

While worker.js is grinding, the main thread keeps serving HTTP requests. That’s the point.

The architecture

Main thread (event loop, HTTP, fast logic) ⇄ Worker thread (own V8 isolate, own event loop, own memory; runs the CPU-heavy work)
Main → worker: postMessage. Worker → main: parentPort.postMessage, received via on('message').
Separate memory. Communication via structured-clone message passing.

Each worker is essentially a fresh Node instance running inside the same process. Separate V8 heap, separate event loop, separate require cache.

postMessage — the message channel

postMessage uses the structured clone algorithm to serialize the data, the same one browsers use for postMessage between windows. It can copy plain objects, Maps, Sets, Dates, typed arrays, Buffers (which arrive as Uint8Array), even circular references. It cannot clone functions (postMessage throws), and class instances arrive as plain objects without their prototype or methods.

parentPort.postMessage({
  result: bigBuffer,
  meta: { ts: Date.now() },
});
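The cloning rules can be explored without a worker, because the global structuredClone (Node 17+) applies the same algorithm. The Point class here is purely illustrative.

```javascript
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  norm() {
    return Math.hypot(this.x, this.y);
  }
}

const original = {
  when: new Date(0),
  tags: new Set(['a', 'b']),
  point: new Point(3, 4),
};
original.self = original; // circular reference

const copy = structuredClone(original);

// Functions are not cloneable: the call throws a DataCloneError.
let functionError = null;
try {
  structuredClone({ callback: () => {} });
} catch (err) {
  functionError = err.name;
}

console.log({
  dateSurvives: copy.when instanceof Date,
  setSurvives: copy.tags.has('b'),
  circularSurvives: copy.self === copy,
  pointIsPlain: !(copy.point instanceof Point),
  normAfterClone: typeof copy.point.norm,
  functionError,
});
```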

Bigger payload = more cloning cost. If we’re sending megabytes, consider transferList — Node moves the Buffer/ArrayBuffer without copying (the sender loses access to it).

parentPort.postMessage({ buf }, [buf.buffer]); // ownership transfer
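Ownership transfer is easy to observe in a single file by using a MessageChannel instead of a real worker (an assumption made only to keep the sketch self-contained). The transfer list argument behaves the same way on worker.postMessage and parentPort.postMessage.

```javascript
import { MessageChannel, receiveMessageOnPort } from 'node:worker_threads';

const { port1, port2 } = new MessageChannel();

// 1 MiB typed array backed by its own ArrayBuffer.
const data = new Uint8Array(1024 * 1024).fill(7);
const sizeBefore = data.buffer.byteLength;

// Second argument is the transfer list: the ArrayBuffer moves instead of being copied.
port1.postMessage({ data }, [data.buffer]);

// The sender's buffer is now detached.
const sizeAfter = data.buffer.byteLength;

// Synchronously pull the message off the other port.
const { message } = receiveMessageOnPort(port2);

console.log({ sizeBefore, sizeAfter, receivedBytes: message.data.byteLength });

port1.close();
port2.close();
```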

SharedArrayBuffer — shared memory between threads

For the rare cases where workers need to read/write the same memory (image processing pipelines, multi-worker numerical compute), SharedArrayBuffer is the escape hatch.

// main.js
const sab = new SharedArrayBuffer(1024);
const view = new Int32Array(sab);
worker.postMessage(sab); // both threads now see the same bytes

Multiple threads writing the same memory is exactly the classic concurrency hazard — race conditions, torn reads, the works. Atomics (built-in) gives us atomic read/write/compare-and-swap. Use sparingly and only when message passing is genuinely too slow.
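A small sketch of Atomics in action: two workers increment one shared counter. The worker source is passed inline with eval: true only so the example fits in one file; in a real project it would live in its own module.

```javascript
import { Worker } from 'node:worker_threads';
import { once } from 'node:events';

// Inline worker source (evaluated as CommonJS because of eval: true).
const workerSource = `
  const { workerData } = require('node:worker_threads');
  const counter = new Int32Array(workerData);
  for (let i = 0; i < 1000; i++) Atomics.add(counter, 0, 1);
`;

const sab = new SharedArrayBuffer(4); // room for one Int32
const counter = new Int32Array(sab);

// Both workers receive the same underlying memory, not a copy.
const workers = [1, 2].map(
  () => new Worker(workerSource, { eval: true, workerData: sab }),
);
await Promise.all(workers.map((worker) => once(worker, 'exit')));

const total = Atomics.load(counter, 0);
console.log({ total });
```

Atomics.add makes each increment indivisible. A plain counter[0]++ from two threads is a read followed by a write, so increments can be lost.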

When to use Workers — and when not

Reach for Workers when:

  • The CPU work takes more than ~50ms — long enough to noticeably block the event loop.
  • The work is parallelizable and we want to use multiple cores.
  • We need real isolation (a sandbox for user-supplied code, for example).

Don’t reach for Workers when:

  • The work is I/O. Async I/O already happens off the main thread and doesn’t block the event loop.
  • The work is tiny. Spawning a worker has startup cost (~10–50ms). For small jobs the overhead dwarfs the gain.
  • We just want more concurrency for HTTP requests. Use cluster (multiple Node processes sharing one port), or run multiple containers behind a reverse proxy. That’s the idiomatic Node scaling story.

The worker pool pattern

We almost never spawn a worker per request — startup cost kills us. Instead, we keep a pool of N workers (often os.availableParallelism()), and queue jobs to them. Libraries like piscina do this for us with a pool.run(task) API.

import Piscina from 'piscina';

const pool = new Piscina({ filename: new URL('./worker.js', import.meta.url).href });

const result = await pool.run({ image: buf });

Pool stays warm, requests share workers, throughput goes way up.

Workers vs cluster vs child_process — quick contrast

  • Workers: same process, separate threads, message passing, shared memory possible. Best for CPU-bound JS work.
  • cluster: multiple Node processes sharing one port, with the primary process (or the OS) distributing connections. Best for scaling I/O-bound HTTP servers across cores.
  • child_process: spawning external commands (ffmpeg, git) or running other Node scripts as totally separate processes. Highest isolation, highest overhead.

Pick by what we’re trying to do — they’re not interchangeable.

The mental model

Workers turn Node from single-threaded to multi-threaded for CPU work. The cost is message passing between isolated heaps; the win is unblocking the main event loop. Use a pool, not one-off spawns. And remember: most Node bottlenecks are I/O, not CPU — measure before reaching for this hammer.


21

Cluster Module

advanced nodejs cluster scaling performance

Node.js runs JavaScript on a single thread. So if our server has 8 CPU cores, a plain Node process uses… 1. The other 7 sit idle. That’s wasteful for an HTTP server.

The cluster module fixes this by forking N copies of our process (one per core). All workers share the same port — the OS or the master process load-balances incoming connections across them.

In simple language: cluster is “run my server 8 times in parallel, and let them split traffic.”

Why not just spawn 8 servers manually?

We could run 8 Node processes on ports 3001-3008 and put nginx in front. That works. But cluster is simpler — one entry file, one port, automatic distribution. And workers can talk to the master via IPC if needed.

MASTER (PID 1000): listens on :3000, forks workers
  • Worker PID 1001 (CPU 0)
  • Worker PID 1002 (CPU 1)
  • Worker PID 1003 (CPU 2)
  • Worker PID 1004 (CPU 3)
All 4 workers serve the SAME port :3000

How

The classic pattern: master forks, workers serve.

import cluster from 'node:cluster';
import os from 'node:os';
import http from 'node:http';

const numCPUs = os.cpus().length;

if (cluster.isPrimary) {
  console.log(`Master ${process.pid} forking ${numCPUs} workers`);
  for (let i = 0; i < numCPUs; i++) cluster.fork();

  cluster.on('exit', (worker, code) => {
    console.log(`Worker ${worker.process.pid} died (${code}), respawning`);
    cluster.fork();
  });
} else {
  http.createServer((req, res) => {
    res.end(`Handled by worker ${process.pid}\n`);
  }).listen(3000);
}

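Hit :3000 repeatedly and you’ll see different PIDs in the response. That’s round-robin distribution at work: by default the primary process accepts each connection and hands it to the next worker (on Windows the OS distributes connections instead).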

Cluster vs Worker Threads — totally different things

People confuse these constantly. They’re not the same.

Aspect | Cluster | Worker Threads
Unit | Separate OS process | Thread inside one process
Memory | Each worker has own V8 heap | Can share memory via SharedArrayBuffer
Use case | Scale HTTP servers across cores | Offload CPU-heavy work (image resize, hashing)
Startup cost | Heavy (full process) | Lighter
Comms | IPC messages | postMessage + shared buffers

Rule of thumb: cluster = horizontal scaling for I/O-bound web servers. Worker threads = offload one CPU-heavy task without blocking the event loop.

Gotchas

  • State doesn’t replicate. Each worker has its own memory. In-memory caches, rate limiters, WebSocket connections — none are shared. Use Redis.
  • Sticky sessions. If we use WebSockets or session affinity, round-robin breaks. Need a layer-7 LB like nginx with ip_hash.
  • PM2 does this for us. In production, most people use PM2’s cluster mode instead of writing the fork code by hand. Same idea, less boilerplate.
  • Don’t fork more than os.cpus().length. More workers = more context switching, not more throughput.

When NOT to use cluster

If we’re behind Kubernetes or run multiple Docker containers anyway — skip cluster. One Node process per container, scale by adding containers. Simpler ops story.


22

Child Process

intermediate nodejs child_process shell ipc

Sometimes we need Node to do something Node can’t do directly — run ffmpeg, call git, execute a Python script, shell out to imagemagick. That’s child_process.

It gives us four ways to start a subprocess: spawn, exec, execFile, and fork. They differ in whether a shell is involved, how output is delivered, and what kind of child they start.

spawn vs exec vs fork — pick the right one

Method | Output | Use when
spawn | Streamed (stdout/stderr are streams) | Long-running, big output, want to pipe
exec | Buffered into one string (callback) | Quick command, small output (< 1MB)
execFile | Buffered, no shell | Like exec, but safer (no shell injection)
fork | IPC channel | Spawning another Node.js script

spawn — the workhorse

spawn returns a child process with stdout/stderr as readable streams. Use this for anything that produces a lot of output or runs a while.

import { spawn } from 'node:child_process';

const ffmpeg = spawn('ffmpeg', ['-i', 'input.mp4', '-c:v', 'libx264', 'out.mp4']);

ffmpeg.stdout.on('data', (chunk) => {
  console.log('stdout:', chunk.toString());
});

ffmpeg.stderr.on('data', (chunk) => {
  // ffmpeg writes progress to stderr, weirdly
  process.stderr.write(chunk);
});

ffmpeg.on('close', (code) => {
  if (code === 0) console.log('done');
  else console.error(`ffmpeg exited with code ${code}`);
});

Because output is streamed, memory stays flat even if ffmpeg runs for an hour and prints megabytes.

exec — convenient but dangerous

exec runs the command through a shell (/bin/sh -c ...) and buffers all output into one string. Easy for one-liners.

import { exec } from 'node:child_process';

exec('git log --oneline -5', (err, stdout, stderr) => {
  if (err) return console.error(err);
  console.log(stdout);
});

The shell convenience comes with a catch: shell injection. Never do this:

// BAD — user controls filename, can inject `; rm -rf /`
exec(`cat ${userInput}`, callback);

Use execFile or spawn with an args array — no shell involved, no injection.

import { execFile } from 'node:child_process';

// Safe. userInput is an argv element, not interpreted by shell.
execFile('cat', [userInput], callback);

Also: exec has a default maxBuffer of 1MB. If the command prints more, it errors. Bump it or switch to spawn.
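To see that execFile really passes input as a literal argument, this sketch runs Node itself (process.execPath, chosen only so the example is portable) and lets the child echo its first argument back. The hostile-looking string is an illustrative stand-in for user input.

```javascript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Input that would run a second command if it ever reached a shell.
const hostileInput = 'notes.txt; echo hacked';

// No shell is involved: the whole string arrives as one argv element.
const { stdout } = await execFileAsync(process.execPath, [
  '-e',
  'process.stdout.write(process.argv[1])',
  hostileInput,
]);

console.log(JSON.stringify(stdout));
```

The child prints the input unchanged, semicolon included, because nothing ever interpreted it as shell syntax.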

fork — Node-to-Node with IPC

fork is a special case of spawn for launching another Node script. It sets up an IPC channel so parent and child can send() messages to each other.

// parent.js
import { fork } from 'node:child_process';

const worker = fork('./worker.js');
worker.send({ task: 'resize', file: 'photo.jpg' });
worker.on('message', (msg) => {
  console.log('worker said:', msg);
});

// worker.js
process.on('message', async (msg) => {
  // do heavy work (processImage stands in for our own function)
  const result = await processImage(msg.file);
  process.send({ done: true, result });
});

Use fork when we want to offload CPU work to another process without the complexity of cluster. (Worker threads are usually a better fit for pure-CPU work — fork shines when the child needs its own memory space, e.g. running untrusted code or a separate Node version.)

Production checklist

  • Always handle error AND close events. A spawn error (binary not found) fires error, not close.
  • Sanitize args. If user input gets into a child process command, use execFile/spawn with an args array, never string concatenation into a shell.
  • Set timeouts. Hung children leak. Use the timeout option or kill them manually with child.kill('SIGTERM').
  • Pipe stdio carefully. By default child stdio is pipe. For fire-and-forget background jobs, use stdio: 'ignore' and detached: true with child.unref() so the parent can exit.
  • Don’t block the event loop waiting for output. execSync exists. Don’t use it in a request handler.
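Several of these points can be folded into one small wrapper. This is a sketch with a hypothetical run helper and an arbitrary default timeout, not a drop-in utility: no shell, both error and close handled, and a timer that kills a hung child.

```javascript
import { spawn } from 'node:child_process';

// Hypothetical helper: run a command without a shell, with a timeout.
function run(command, args = [], { timeoutMs = 5_000 } = {}) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    const stdout = [];
    const stderr = [];

    // Kill the child if it runs longer than allowed.
    const timer = setTimeout(() => child.kill('SIGTERM'), timeoutMs);

    child.stdout.on('data', (chunk) => stdout.push(chunk));
    child.stderr.on('data', (chunk) => stderr.push(chunk));

    // Fires when the process could not be started (e.g. binary not found).
    child.on('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });

    // Fires once the process has ended and its stdio streams are closed.
    child.on('close', (code, signal) => {
      clearTimeout(timer);
      resolve({
        code,
        signal,
        stdout: Buffer.concat(stdout).toString('utf8'),
        stderr: Buffer.concat(stderr).toString('utf8'),
      });
    });
  });
}

const ok = await run(process.execPath, ['-e', 'console.log("hi")']);
const failed = await run('no-such-binary-for-this-demo').catch((err) => err);

console.log(ok.code, JSON.stringify(ok.stdout), failed.code);
```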

Debugging & Performance

23

Debugging with --inspect

intermediate nodejs debugging devtools vscode

Console-log debugging works. Until it doesn’t. When we’re chasing a bug in async code with five awaits and a Promise.all, dropping breakpoints is way faster.

Node has a real debugger built in — same protocol Chrome DevTools uses. We just need to start Node with the right flag.

The two flags

  • --inspect: opens the debug port (default 127.0.0.1:9229). Code runs immediately.
  • --inspect-brk: same, but pauses on the very first line, waiting for a debugger to attach.

# Run normally with debugger available
node --inspect server.js

# Pause until DevTools attaches (good for debugging startup code)
node --inspect-brk server.js

In simple language: --inspect is “start running, I’ll attach whenever.” --inspect-brk is “wait for me before doing anything.”

We’ll see this in the terminal:

Debugger listening on ws://127.0.0.1:9229/abc-123
For help, see: https://nodejs.org/en/docs/inspector

Attach with Chrome DevTools

Open Chrome and go to chrome://inspect. Click Configure and make sure localhost:9229 is in the list. Our Node process shows up under “Remote Target” — click inspect.

You get full DevTools: Sources tab for breakpoints, Console for evaluating expressions in the current scope, Memory tab for heap snapshots, Performance tab for CPU profiles.

Attach with VS Code

This is the smoother workflow most of the time. Create .vscode/launch.json:

{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "node",
      "request": "launch",
      "name": "Debug server",
      "program": "${workspaceFolder}/server.js",
      "skipFiles": ["<node_internals>/**"]
    },
    {
      "type": "node",
      "request": "attach",
      "name": "Attach to running",
      "port": 9229
    }
  ]
}

Two modes:

  • Launch — VS Code starts Node with --inspect-brk itself. Hit F5, done.
  • Attach — we start Node with --inspect ourselves (e.g. inside Docker), then VS Code connects to port 9229.

Set breakpoints by clicking in the gutter. Hit them by triggering the code path (curl a route, run a script). Use the debug console to evaluate req.body or whatever in the paused scope.

--inspect vs --inspect-brk

  • --inspect: code starts running. Attach anytime. Debug a live server.
  • --inspect-brk: pauses on line 1. Waits for attach. Debug startup/init code.

Debugging inside Docker

The inspect port binds to 127.0.0.1 by default — won’t be reachable from outside the container. Bind to 0.0.0.0 and expose the port:

node --inspect=0.0.0.0:9229 server.js

# docker-compose.yml
services:
  app:
    ports:
      - "9229:9229"

Now VS Code attach config with "port": 9229 works against the container.

Warning: never expose 9229 in production. Anyone who can reach that port has remote code execution on our server.

Useful tricks

  • Conditional breakpoints: right-click a breakpoint, set a condition like userId === 42. Stops only when it matters.
  • Logpoints: instead of pausing, log a message. Same effect as console.log but without editing code.
  • debugger statement: drop the keyword debugger; in our code. If a debugger is attached, execution pauses there. If not, no-op.
  • --inspect with nodemon: nodemon --inspect server.js gives auto-restart + debugger together.

When console.log is still fine

Honestly, for a quick “is this code path even running” question, console.log is faster. The breakpoint workflow shines for:

  • Inspecting complex object state at a point in time
  • Stepping through async/await flow
  • Catching an exception at the throw site (enable “Pause on caught exceptions”)
  • Debugging a heisenbug we can’t reliably reproduce

24

Profiling & Heap Snapshots

advanced nodejs performance profiling memory

“Our API got slow” or “memory keeps climbing until OOM” are vague. Profiling turns them into “this regex is 60% of CPU time” or “we’re retaining 800k of these objects.”

There are three tools we’ll use: --prof for CPU, heap snapshots for memory, and clinic.js when we want pretty graphs without learning V8 internals.

CPU profiling with --prof

Run Node with --prof and it dumps a V8 tick log to a file like isolate-0xNNNN-v8.log. Then we process it into something readable.

# 1. Run app under load (use autocannon, k6, ab, etc. to generate traffic)
node --prof server.js

# 2. Stop the process, find the log
ls isolate-*-v8.log

# 3. Process into a flat profile
node --prof-process isolate-0x10800000-v8.log > profile.txt

The output looks like:

 [Summary]:
   ticks  total  nonlib   name
   1234   45.2%   60.1%   JavaScript
    412   15.1%   20.0%   C++
    ...

 [JavaScript]:
   ticks  total  nonlib   name
    389   14.2%   18.9%   LazyCompile: *parseRequest /app/server.js:42
    201    7.3%    9.7%   LazyCompile: *hashPassword /app/auth.js:18

In simple language: each “tick” is a sample of “what was the CPU doing right now?” The function with the most ticks is the hot spot.

We’re looking for surprises. “Why is JSON.parse 40% of our time?” or “Why does bcrypt show up — isn’t that supposed to be async?”

CPU profiling via DevTools (nicer)

Run with --inspect, attach Chrome DevTools, go to the Performance tab, hit record, run the load, stop. We get a flame graph with function names, time spent, and we can drill in.

This is usually friendlier than reading --prof-process output. Same data, prettier.

Heap snapshots — for memory issues

A heap snapshot is “freeze the current state of memory, list every object.” We take two snapshots — one before something, one after — and diff them to find what got allocated but never freed.

How to take one:

import { writeHeapSnapshot } from 'node:v8';

// Programmatic
const file = writeHeapSnapshot();
console.log('snapshot saved to', file);

Or via DevTools: attach with --inspect, go to the Memory tab, click Take snapshot.
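
To grab a snapshot from a running process without attaching DevTools, one option is to trigger writeHeapSnapshot from a signal handler. A minimal sketch, assuming a POSIX system (SIGUSR2 doesn't exist on Windows); the snapshotPath helper and the file naming are made up for illustration:

```javascript
const path = require('node:path');
const os = require('node:os');
const { writeHeapSnapshot } = require('node:v8');

// Hypothetical helper: builds a timestamped file name so snapshots never overwrite each other.
function snapshotPath(dir = os.tmpdir(), now = new Date()) {
  const stamp = now.toISOString().replace(/[:.]/g, '-');
  return path.join(dir, `heap-${process.pid}-${stamp}.heapsnapshot`);
}

// `kill -USR2 <pid>` writes a snapshot. Writing blocks the event loop,
// so expect a pause on large heaps.
process.on('SIGUSR2', () => {
  const file = writeHeapSnapshot(snapshotPath());
  console.log('snapshot saved to', file);
});
```

Send the signal once at idle and once after the suspect workload, then load both files in the DevTools Memory tab. Node also ships a --heapsnapshot-signal flag that does roughly the same thing without code changes.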

Memory Leak Hunt
1. Take snapshot A (baseline, app idle)
↓ run suspect workload for 5 min
2. Take snapshot B
3. DevTools: "Comparison" view, sort by Delta. What grew?
↓ click a suspicious class
4. "Retainers" panel shows what's holding the reference

The retainers chain is the magic part. It tells us “this 50MB Map is retained by globalCache in cache.js:12.” Now we know exactly which line to fix.

Clinic.js — easy mode

Writing autocannon scripts and reading flame graphs is fine, but clinic.js packages this nicely.

npm i -g clinic autocannon

# CPU + event loop analysis
clinic doctor -- node server.js
# In another terminal: autocannon -c 100 http://localhost:3000

# CTRL+C the server, browser opens with a report

Three sub-commands worth knowing:

  • clinic doctor — high-level “is the bottleneck CPU, I/O, event loop, or GC?”
  • clinic flame — flame graph of CPU hot paths
  • clinic bubbleprof — async operation timing (shows where awaits stall)

doctor is the right starting point — it tells us which other tool to reach for next.

What to look for

  • CPU profile — any single function dominating? Often a regex, JSON serialization, sync crypto, or accidentally-quadratic code.
  • Heap snapshot diff — any class with thousands of instances that should be temporary? Look for Closure, (string), Array with huge retained size.
  • Event loop lag — clinic doctor flags it red. Means we’re doing too much sync work between I/O.
  • GC pressure — if “GC” is a big slice in the CPU profile, we’re allocating too aggressively. Reuse buffers, avoid hot-path .map().filter().reduce() chains.

Production profiling

Don’t run --prof 24/7 — it has overhead. Instead:

  • Enable --inspect on a non-public port and attach when needed.
  • Use process.memoryUsage() and perf_hooks to log metrics continuously, profile deeply only when alerts fire.
  • For really gnarly issues, take a snapshot in prod, download it, analyze locally in DevTools.
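
The "log metrics continuously" idea can be sketched with process.memoryUsage() and the event loop delay histogram from perf_hooks. This is a rough starting point, not a full metrics setup; the field names, the 20 ms resolution, and the 10 s interval are arbitrary choices:

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Samples event loop delay in the background; values are reported in nanoseconds.
const loopDelay = monitorEventLoopDelay({ resolution: 20 });
loopDelay.enable();

const toMb = (bytes) => Number((bytes / 1024 / 1024).toFixed(1));

// Returns one flat metrics record, ready to hand to a logger.
function sampleMetrics() {
  const mem = process.memoryUsage();
  const sample = {
    rssMb: toMb(mem.rss),
    heapUsedMb: toMb(mem.heapUsed),
    heapTotalMb: toMb(mem.heapTotal),
    loopDelayP99Ms: loopDelay.percentile(99) / 1e6,
  };
  loopDelay.reset(); // start a fresh window for the next sample
  return sample;
}

// Log every 10 s; unref() so this timer never keeps the process alive on its own.
const metricsTimer = setInterval(() => console.log(sampleMetrics()), 10_000);
metricsTimer.unref();
```

A rising heapUsedMb or loopDelayP99Ms trend is the cue to attach the inspector or take a snapshot, rather than profiling all the time.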

25

Memory Leaks

advanced nodejs memory performance leaks

A memory leak in Node is when our process keeps holding on to memory it doesn’t need anymore. RSS climbs. Eventually we hit the V8 heap limit (historically around 1.5 GB on 64-bit; newer Node versions derive it from available memory, and --max-old-space-size overrides it) and V8 kills us with JavaScript heap out of memory.

JavaScript has a garbage collector — it frees objects nothing references. So a “leak” really means we’re still referencing the object even though we don’t need it. Find the reference, break it, leak fixed.

The classic causes

1. Closures over big data

function buildHandler(hugeDataset) {
  return function (req, res) {
    res.json({ count: hugeDataset.length });
  };
}

app.get('/count', buildHandler(loadGigabyteFile()));

The handler captures hugeDataset. As long as the handler is registered (forever), the dataset stays in memory. Even if we only ever read .length from it.

Fix: don’t close over data we don’t need.

const count = loadGigabyteFile().length; // extract what we need
app.get('/count', (req, res) => res.json({ count }));
// hugeDataset can be GC'd now

2. EventEmitter listener leaks

Every .on() adds a listener. If we add listeners in a request handler without removing them, they pile up forever.

// BAD — adds a listener per request
app.get('/stream', (req, res) => {
  someEmitter.on('data', (chunk) => res.write(chunk));
});

Node warns us once we cross 10 listeners on the same event:

(node:1234) MaxListenersExceededWarning: Possible EventEmitter memory leak detected.
11 data listeners added to [EventEmitter].

Fixes:

  • Use .once() if we only need it once.
  • Remove the listener when we’re done: emitter.off('data', fn).
  • For per-request listeners, attach to a per-request object (the response stream), not a shared global emitter.
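
The "remove the listener when we're done" fix can be sketched like this. someEmitter here is a stand-in for the shared emitter from the bad example, and streamTo is a hypothetical helper name; res only needs write() and a 'close' event, which real HTTP responses have:

```javascript
const { EventEmitter } = require('node:events');

// Stand-in for the shared emitter from the example above.
const someEmitter = new EventEmitter();

function streamTo(res) {
  // Keep a reference to the listener so we can remove exactly this one later.
  const onData = (chunk) => res.write(chunk);
  someEmitter.on('data', onData);

  // 'close' fires when the response finishes or the client disconnects.
  res.once('close', () => someEmitter.off('data', onData));
}
```

In an Express route this would be called as streamTo(res). The listener count on someEmitter then tracks the number of open responses instead of the number of requests ever served.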

3. Unbounded global caches

const cache = new Map();
app.get('/user/:id', async (req, res) => {
  if (!cache.has(req.params.id)) {
    cache.set(req.params.id, await db.getUser(req.params.id));
  }
  res.json(cache.get(req.params.id));
});

Looks innocent. After a million unique user IDs, our Map has a million entries. Forever.

Fix: bounded cache with TTL/LRU.

import { LRUCache } from 'lru-cache';
const cache = new LRUCache({ max: 10_000, ttl: 1000 * 60 * 5 });
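
To see why a bounded cache can't grow forever, here is a toy LRU built on Map insertion order. This is not how lru-cache is implemented (and it has no TTL); TinyLru is a made-up name, meant only to show the eviction idea:

```javascript
// Minimal LRU: a Map iterates in insertion order, so the first key is the least recently used.
class TinyLru {
  constructor(max) {
    this.max = max;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // Evict the oldest entry.
      this.map.delete(this.map.keys().next().value);
    }
  }

  get size() {
    return this.map.size;
  }
}
```

However many unique user IDs arrive, size never exceeds max, so memory stays flat. In real services the library version is the better choice.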

4. Timers that capture context

function handleConnection(conn) {
  setInterval(() => conn.ping(), 30_000);
}

If conn disconnects but we never clearInterval, the timer keeps conn alive. Always store the timer ID and clear it on cleanup.
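
A sketch of the fixed version, assuming conn is an EventEmitter that emits 'close' when the peer goes away (true for net sockets; other connection types may use a different event name):

```javascript
const { EventEmitter } = require('node:events');

function handleConnection(conn, intervalMs = 30_000) {
  // Keep the timer ID so we can cancel it later.
  const timer = setInterval(() => conn.ping(), intervalMs);

  // Once the connection closes, clear the timer so it stops referencing conn.
  conn.once('close', () => clearInterval(timer));
  return timer;
}

// Usage with a stand-in connection object.
const demoConn = new EventEmitter();
demoConn.ping = () => console.log('ping');
handleConnection(demoConn);
demoConn.emit('close'); // interval cleared, conn can be garbage collected
```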

5. Global arrays we push to and never drain

const recentRequests = [];
app.use((req, res, next) => {
  recentRequests.push({ url: req.url, time: Date.now() });
  next();
});

Grows forever. Use a ring buffer, or push to a real log system.
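
A ring buffer keeps only the last N entries by overwriting the oldest slot. A minimal sketch; the RingBuffer class and the capacity of 1000 are illustrative choices:

```javascript
class RingBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = new Array(capacity);
    this.next = 0;   // index the next push writes to
    this.length = 0; // number of valid entries, capped at capacity
  }

  push(item) {
    this.items[this.next] = item;
    this.next = (this.next + 1) % this.capacity;
    this.length = Math.min(this.length + 1, this.capacity);
  }

  // Returns entries oldest-first.
  toArray() {
    const start = this.length < this.capacity ? 0 : this.next;
    return Array.from(
      { length: this.length },
      (_, i) => this.items[(start + i) % this.capacity],
    );
  }
}

// Drop-in replacement for the unbounded array: memory use is fixed at 1000 entries.
const recentRequests = new RingBuffer(1000);
```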

Spotting a leak

The telltale sign: RSS climbs steadily under steady load and never comes back down. A healthy process has memory that goes up during traffic, then GC reclaims it during quiet periods, oscillating in a band. A leaking process trends up monotonically.

// Cheap monitoring
setInterval(() => {
  const m = process.memoryUsage();
  console.log({
    rss: (m.rss / 1024 / 1024).toFixed(1) + 'MB',
    heapUsed: (m.heapUsed / 1024 / 1024).toFixed(1) + 'MB',
  });
}, 10_000);

Leak Detection Workflow
1. Reproduce — script that drives the suspect path in a loop
2. Take heap snapshot (baseline, after warmup)
3. Run loop for N minutes
4. Take second snapshot
5. DevTools → Comparison view → sort by Delta
6. Open the top growing class → "Retainers" → follow chain
7. Fix the reference. Re-test.

Force GC for cleaner snapshots

V8 might be holding objects that are technically collectable. Force a GC right before snapshotting:

# Shell: start Node with global.gc() exposed
node --expose-gc server.js

// JS: run a full GC right before snapshotting
if (global.gc) global.gc();
// now take snapshot

Otherwise we end up chasing “leaks” that are really just GC laziness.

When it’s not actually a leak

  • First few minutes of high traffic — V8’s heap grows up to its working set. Normal.
  • heapTotal grows but heapUsed stays flat — heap fragmentation, not a leak.
  • Native memory growth — RSS grows but heap doesn’t. Could be a native addon (sharp, bcrypt, gRPC) leaking C++ memory. Way harder to debug.

The boring fix nobody talks about

If we can’t find the leak in a hurry and the process is going to OOM in 12 hours, restart it on a schedule. PM2’s max_memory_restart or Kubernetes’ liveness probe + memory limit will recycle the process before it dies. Not glamorous but buys us time to actually fix it.


Production

26

Error Handling Patterns

intermediate nodejs errors async production

Error handling in Node is a minefield because there are three different error-delivery mechanisms: thrown exceptions, the error-first argument of callbacks, and rejected Promises. Mix them up and errors silently disappear.

The async/await rule

With async/await, errors propagate via thrown exceptions — same as sync code. try/catch catches them.

async function getUser(id) {
  try {
    const user = await db.findUser(id);
    return user;
  } catch (err) {
    logger.error({ err, id }, 'failed to load user');
    throw err; // re-throw, let caller decide
  }
}

Key word: re-throw. Catching to log and then returning undefined (silently swallowing) is how bugs hide for months. Either re-throw, or return a sentinel and document it loudly.

Promises without await

If we fire a promise and don’t await it (or .catch it), a rejection becomes an unhandled promise rejection. Bad.

// BAD
async function handler(req, res) {
  doBackgroundWork(); // returns a promise, we ignored it
  res.json({ ok: true });
}

If doBackgroundWork rejects, nothing in handler sees it: the rejection goes unhandled, and in Node 15+ that crashes the process by default. Either await it, or chain a .catch:

doBackgroundWork().catch((err) => logger.error({ err }, 'bg work failed'));

The two process-level safety nets

Node fires these events for errors we missed:

process.on('uncaughtException', (err, origin) => {
  logger.fatal({ err, origin }, 'uncaught exception');
  // do minimal sync cleanup, then EXIT
  process.exit(1);
});

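process.on('unhandledRejection', (reason, promise) => {
  logger.error({ reason }, 'unhandled rejection');
  // Without this listener, Node 15+ crashes on unhandled rejections by default.
  // Registering it replaces that default, so rethrow to reach uncaughtException and exit.
  throw reason;
});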

In simple language: uncaughtException is “a thrown error nobody caught.” unhandledRejection is “a rejected promise nobody handled with .catch().” Both mean we have a bug somewhere.

To crash or not to crash?

This is the interview question. The answer: on uncaughtException, always crash.

Why? After an uncaught exception, our process is in an undefined state. Half-completed transactions. Half-closed file descriptors. Variables in inconsistent state. Continuing to serve traffic could corrupt data.

The correct flow:

Error Decision Tree
Operational error (DB timeout, 404, invalid input, network blip)
→ catch, log, return error response. Keep running.
Programmer error (TypeError, ReferenceError, "cannot read property of undefined")
→ crash. Process manager restarts us. Fix the bug.
Out of memory
→ already crashing. Make sure restart is configured.

Operational errors = expected, recoverable. Programmer errors = bugs, unrecoverable mid-flight. Joyent’s classic article codified this distinction; it’s still the right model.

Express/Koa pattern

In Express 4, async route handlers don’t auto-forward rejected promises. Wrap them.

const asyncHandler = (fn) => (req, res, next) => {
  Promise.resolve(fn(req, res, next)).catch(next);
};

app.get('/users/:id', asyncHandler(async (req, res) => {
  const user = await db.findUser(req.params.id);
  if (!user) throw new NotFoundError('user');
  res.json(user);
}));

// Central error middleware
app.use((err, req, res, next) => {
  logger.error({ err, url: req.url }, 'request failed');
  const status = err.status || 500;
  res.status(status).json({ error: err.message });
});

Express 5 (now stable) auto-forwards async errors. One less footgun.

Custom error classes

Use error subclasses to tell apart “expected” errors from genuine bugs.

class AppError extends Error {
  constructor(message, status = 500) {
    super(message);
    this.name = this.constructor.name;
    this.status = status;
    this.isOperational = true;
  }
}

class NotFoundError extends AppError {
  constructor(resource) {
    super(`${resource} not found`, 404);
  }
}

// In the error middleware
if (!err.isOperational) {
  logger.fatal({ err }, 'non-operational error — restarting');
  process.exit(1);
}
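
One thing the isOperational flag also lets us do is decide what the client gets to see. A sketch, assuming the AppError class from above (repeated here so the snippet runs on its own); toErrorResponse is a hypothetical helper, not an Express API:

```javascript
class AppError extends Error {
  constructor(message, status = 500) {
    super(message);
    this.name = this.constructor.name;
    this.status = status;
    this.isOperational = true;
  }
}

// Hypothetical helper: maps an error to the status and body we send back.
function toErrorResponse(err) {
  if (err instanceof AppError && err.isOperational) {
    // Expected errors carry a message that is safe to show.
    return { status: err.status, body: { error: err.message } };
  }
  // Unknown errors may carry internals (SQL, file paths), so send a generic message.
  return { status: 500, body: { error: 'Internal Server Error' } };
}
```

Inside the error middleware this would be used as const { status, body } = toErrorResponse(err); res.status(status).json(body); before deciding whether to exit.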

Streams and EventEmitters

Streams emit 'error'. If nobody listens, Node crashes the process. Always attach:

fs.createReadStream('big.csv')
  .on('error', (err) => logger.error({ err }, 'read failed'))
  .pipe(transform)
  .on('error', (err) => logger.error({ err }, 'transform failed'));

Or better — use stream.pipeline which propagates errors cleanly:

import fs from 'node:fs';
import { pipeline } from 'node:stream/promises';

try {
  // pipeline destroys every stream in the chain if any one of them fails
  await pipeline(fs.createReadStream('in.csv'), transform, fs.createWriteStream('out.csv'));
} catch (err) {
  logger.error({ err }, 'pipeline failed');
}

Checklist

  • Wrap every async route handler so rejections reach error middleware.
  • Have a central error logger — never console.error and move on.
  • Subscribe to uncaughtException and unhandledRejection, log, then exit.
  • Run under a process manager (PM2, systemd, Docker restart policy) so crash → restart is fast.
  • Distinguish operational from programmer errors — recover from the first, crash on the second.

27

Logging

intermediate nodejs logging production observability

console.log is great in dev. In production it’s a disaster:

  • Blocks the event loop on a slow terminal/file.
  • No log levels — can’t filter “warn and above.”
  • Unstructured strings — grep works but querying (“show me all 500s in the last hour”) doesn’t.
  • No timestamps unless we add them manually.
  • No request correlation — can’t follow one request across many log lines.

In simple language: console.log is a printf, not a logger. We need a logger.

Structured (JSON) logs > free-text

Pre-cloud: tail -f app.log | grep ERROR. Post-cloud: logs go to Datadog/Loki/CloudWatch/ELK and get queried. Those systems work way better with JSON.

// Free-text — hard to query
2026-05-26 12:34:56 ERROR: user 42 failed login from 1.2.3.4

// Structured — every field is queryable
{"level":"error","time":1716720896000,"msg":"failed login","userId":42,"ip":"1.2.3.4"}

Now level:error AND userId:42 is a one-liner in any log system.

Pino — fast and JSON-first

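Pino is the de facto standard for new Node services. Structured by default, low overhead, and very fast (claims ~5x faster than Winston in their benchmarks). Transports run in a worker thread, so formatting and shipping logs stays off the main thread.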

import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
});

logger.info('server started');
logger.info({ port: 3000, env: 'prod' }, 'listening');
logger.error({ err, userId }, 'failed to load user');

Note the argument order: object first, message second. The object’s keys become top-level fields in the log line.

Output (Pino's default time field is epoch milliseconds):

{"level":30,"time":1716720896000,"pid":42,"hostname":"app-1","port":3000,"env":"prod","msg":"listening"}

level: 30 is info (10=trace, 20=debug, 30=info, 40=warn, 50=error, 60=fatal).

Child loggers for request context

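import { randomUUID } from 'node:crypto';

app.use((req, res, next) => {
  // One child logger per request; reqId is bound to every line it writes
  req.log = logger.child({ reqId: randomUUID() });
  next();
});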

app.get('/users/:id', async (req, res) => {
  req.log.info({ userId: req.params.id }, 'fetching user');
  // every log line in this request includes reqId automatically
});

Now we can trace one request through five log lines by filtering on reqId.
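
To make the "child" idea concrete, here is a toy logger that shows how bindings end up on every line. It is deliberately not Pino's implementation (no levels filter, no serializers); createLogger and the write parameter are made up for this sketch:

```javascript
// Toy structured logger: every line is one JSON object.
function createLogger(bindings = {}, write = (line) => process.stdout.write(line + '\n')) {
  const log = (level) => (fields, msg) =>
    write(JSON.stringify({ level, time: Date.now(), ...bindings, ...fields, msg }));

  return {
    info: log('info'),
    error: log('error'),
    // A child inherits the parent's bindings and adds its own.
    child: (extra) => createLogger({ ...bindings, ...extra }, write),
  };
}

// Usage: the reqId is set once and shows up on every line from reqLog.
const rootLog = createLogger({ service: 'api' });
const reqLog = rootLog.child({ reqId: 'req-123' });
reqLog.info({ userId: '42' }, 'fetching user');
```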

Pretty-print in dev

Raw JSON in dev is ugly. Pipe through pino-pretty:

node server.js | pino-pretty

Or configure it conditionally:

const logger = pino({
  transport: process.env.NODE_ENV !== 'production'
    ? { target: 'pino-pretty' }
    : undefined,
});

Winston — flexible, more batteries included

Winston has been around longer. More transports out of the box (files, HTTP, Slack, Loggly). More configurable formatters. Slower than Pino but rarely the bottleneck.

import winston from 'winston';

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.json(),
  transports: [
    new winston.transports.Console(),
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
  ],
});

Pino vs Winston

               | Pino                                         | Winston
Speed          | Very fast                                    | Slower
Default format | JSON                                         | Configurable
Transports     | Worker-thread based                          | Built-in zoo
Best for       | New services, microservices, high throughput | Legacy projects, complex routing needs

For new projects in 2026, default to Pino unless we have a specific reason for Winston.

Log levels — actually use them

  • trace — extremely verbose, “entered this function”
  • debug — dev-only details
  • info — normal lifecycle events (“server started”, “job processed”)
  • warn — something unexpected but not broken
  • error — request failed, operation failed
  • fatal — process is dying

Default to info in prod, debug in dev. Letting debug lines into prod logs makes them noisy and expensive.

What to log (and what NOT to)

Log:

  • Every incoming request (URL, status, latency, requestId, userId)
  • Every error with stack trace
  • Job/cron start and end
  • External API calls (target, duration, status)

Never log:

  • Passwords, tokens, API keys, session IDs
  • Full credit card numbers, PII (unless legally OK and you have redaction)
  • The full request body of every request (huge volume, often contains PII)

Use a redaction config:

const logger = pino({
  redact: ['req.headers.authorization', 'password', '*.token'],
});

Where logs go

In containerized environments (Docker, Kubernetes), log to stdout/stderr only. Don’t write log files inside the container. The orchestrator captures stdout and routes it to your log backend. Files inside containers vanish when the container restarts.

For VMs/bare metal, write to stdout and let systemd-journald / a sidecar agent ship them off.


28

Process Managers (PM2)

intermediate nodejs pm2 deployment production

If we run node server.js directly and it crashes, that’s it. The process is dead. A process manager solves this — it babysits our app, restarts it on crash, runs it as multiple workers, captures logs, and gives us a CLI to inspect everything.

PM2 is the most popular for Node. It’s not the only option (systemd, Docker restart policies, Kubernetes), but it’s the easiest to get going for a single VM.

What PM2 actually does

In simple language: PM2 is “a daemon that runs your Node apps and makes sure they stay running.”

Specifically:

  • Auto-restart on crash (with backoff)
  • Restart on memory threshold (max_memory_restart)
  • Cluster mode (forks N copies, load balances)
  • Log rotation and aggregation
  • Zero-downtime reload
  • pm2 startup hooks into systemd so PM2 survives reboots

Basic usage

npm i -g pm2

# Start an app
pm2 start server.js --name api

# See status
pm2 list

# Tail logs
pm2 logs api

# Restart / stop / delete
pm2 restart api
pm2 stop api
pm2 delete api

# Persist current process list across reboots
pm2 save
pm2 startup   # prints a sudo command — run it

Cluster mode — free horizontal scaling

-i max runs one instance per CPU core. Same idea as the cluster module, just declarative.

pm2 start server.js -i max --name api

PM2 handles the master process for us. Each worker is a real Node process with its own memory. Use this when our HTTP server is CPU-bound and we want to use all cores on one machine.
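
For comparison, this is roughly what we would write by hand with the cluster module to get the same effect as -i max. A simplified sketch (no backoff, no graceful reload); workerCount and startClustered are hypothetical names, and the call at the bottom is commented out so nothing forks on load:

```javascript
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// One worker per core, like `-i max`. availableParallelism() needs Node 18.14+.
function workerCount() {
  return typeof os.availableParallelism === 'function'
    ? os.availableParallelism()
    : os.cpus().length;
}

function startClustered(port) {
  if (cluster.isPrimary) {
    for (let i = 0; i < workerCount(); i += 1) cluster.fork();

    // Replace any worker that dies, which is the part PM2 automates for us.
    cluster.on('exit', (worker) => {
      console.log(`worker ${worker.process.pid} exited, forking a replacement`);
      cluster.fork();
    });
    return;
  }

  // Workers share the listening port; the primary distributes connections.
  http
    .createServer((req, res) => res.end(`handled by pid ${process.pid}\n`))
    .listen(port);
}

// startClustered(3000); // uncomment to run
```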

PM2 God Daemon (always running) manages:
  • api (cluster, 4 workers) · restarts: 2 · uptime: 4d
  • cron-worker (fork, 1) · restarts: 0 · uptime: 4d
  • queue-worker (fork, 2) · restarts: 1 · uptime: 3d

ecosystem.config.cjs — config as code

For anything beyond a one-liner, put settings in an ecosystem file. Then pm2 start ecosystem.config.cjs.

module.exports = {
  apps: [
    {
      name: 'api',
      script: './server.js',
      instances: 'max',
      exec_mode: 'cluster',
      max_memory_restart: '500M',
      env: {
        NODE_ENV: 'production',
        PORT: 3000,
      },
      error_file: './logs/api-err.log',
      out_file: './logs/api-out.log',
      time: true,
    },
    {
      name: 'cron-worker',
      script: './cron.js',
      instances: 1,
      exec_mode: 'fork',
      autorestart: true,
    },
  ],
};

Key options:

  • instances: 'max' + exec_mode: 'cluster' — one worker per core
  • max_memory_restart: '500M' — restart if worker exceeds 500MB (band-aid for leaks)
  • autorestart: true — default, restart on crash
  • cron_restart: '0 4 * * *' — restart at 4 AM daily (rarely needed but useful for leaky processes)

Zero-downtime reload

pm2 reload api

In cluster mode, this restarts workers one at a time. Each old worker keeps serving until the new one is ready, then it shuts down. No dropped requests if our app handles SIGINT/SIGTERM properly (graceful shutdown — covered in the next note).

pm2 restart is different: it kills and restarts. Brief downtime.

PM2 vs systemd vs Docker

                    | PM2                       | systemd                      | Docker / K8s
Setup               | npm i -g, done            | Write a unit file            | Dockerfile + compose / manifest
Cluster mode        | Built-in, free            | Manual (multiple units)      | Scale replicas
Logs                | pm2 logs                  | journalctl                   | docker logs / k8s
Reload w/o downtime | Yes (cluster)             | No (needs LB)                | Yes (rolling deploy)
Best for            | Single VM, fast iteration | Linux servers, no containers | Multi-host, microservices

Rule of thumb:

  • Just one VM, want something working today → PM2
  • VM, prefer OS-native, don’t want extra runtime → systemd unit
  • Already on Docker/Kubernetes → don’t use PM2, let the orchestrator restart containers. PM2 inside Docker is a common anti-pattern; the container should be the unit of restart.

PM2 gotchas

  • PM2 in Docker is usually wrong. Docker already restarts containers. Running PM2 inside hides crashes from Docker and complicates log capture. One container = one Node process.
  • pm2 startup setup is mandatory. Without it, a server reboot kills our apps. Run pm2 startup once, then pm2 save after every change to the process list.
  • Logs grow forever. Install pm2-logrotate (pm2 install pm2-logrotate) or use logrotate.
  • PM2’s free version doesn’t ship metrics. Keymetrics (their paid SaaS) does. For free, scrape pm2 jlist or expose your own metrics.

29

Graceful Shutdown

intermediate nodejs shutdown signals production docker

When Docker, Kubernetes, or PM2 wants to stop our app (for a deploy, a scale-down, or a node drain), it sends a termination signal: SIGTERM from Docker and Kubernetes, SIGINT by default from PM2. If our app ignores it, after a grace period (10 seconds for Docker, 30 for K8s) the orchestrator sends SIGKILL and we get killed mid-request.

That means: dropped HTTP requests, half-committed DB writes, lost jobs. In production, this is unacceptable.

Graceful shutdown is “react to SIGTERM, finish what we’re doing, then exit cleanly.”

The lifecycle

Graceful Shutdown Timeline
t=0 · Orchestrator sends SIGTERM
t=0+ · Stop accepting new connections (server.close())
t=0+ · Health check starts returning 503 → LB stops sending traffic
t=0..N · In-flight requests finish naturally
t=N · Close DB pool, Redis, message queue connections
t=N+ε · process.exit(0)
t=25s · Hard timeout: force exit if still alive (must fire before the orchestrator's SIGKILL at 30s)

A minimal Express implementation

import express from 'express';
import { pool } from './db.js';

const app = express();
app.get('/', async (req, res) => {
  await new Promise((r) => setTimeout(r, 2000)); // slow handler
  res.send('hi');
});

const server = app.listen(3000, () => console.log('listening on 3000'));

let shuttingDown = false;

// Health check that flips on shutdown
app.get('/healthz', (req, res) => {
  if (shuttingDown) return res.status(503).send('shutting down');
  res.send('ok');
});

async function shutdown(signal) {
  if (shuttingDown) return;
  shuttingDown = true;
  console.log(`${signal} received, shutting down`);

  // 1. Hard timeout, armed first: if anything below hangs, give up before SIGKILL hits
  setTimeout(() => {
    console.error('forced exit after 25s');
    process.exit(1);
  }, 25_000).unref();

  try {
    // 2. Stop accepting new connections and wait for in-flight requests.
    //    The close callback only fires once existing connections have ended.
    await new Promise((resolve, reject) => {
      server.close((err) => (err ? reject(err) : resolve()));
    });
    console.log('http server closed');

    // 3. Only now close downstream resources
    await pool.end();        // close pg pool
    // await redis.quit();   // close redis, etc.
    console.log('db closed');
    process.exit(0);
  } catch (err) {
    console.error('cleanup error', err);
    process.exit(1);
  }
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));   // Ctrl+C in dev

A few things worth calling out:

  • server.close() doesn’t kill existing connections. It stops accept() for new ones and waits for the current ones to finish. Exactly what we want.
  • Health check flips first. The load balancer needs a few seconds to notice we’re unhealthy and route traffic elsewhere. If we close the server immediately, the LB might send us one more request that hits a closed socket.
  • .unref() on the timeout. So the timer itself doesn’t keep the process alive if everything else finishes early.

The “stop accepting + drain” dance

In simple language: we’re telling the world “no more orders please” while still cooking the orders we already accepted. Once the kitchen is clear, we close up shop.

For long-lived connections (WebSockets, SSE), server.close() waits forever because those connections never end on their own. We have to actively tell clients to disconnect:

// For WebSockets
for (const ws of wsServer.clients) {
  ws.close(1001, 'server restarting');
}

For HTTP keep-alive, idle connections can hang around. Use the http-terminator library or call server.closeIdleConnections() (Node 18.2+) to forcibly close idle keep-alive sockets.
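
A small sketch of the closeIdleConnections() route, wrapped in a promise so it can be awaited during shutdown. closeServer is a hypothetical helper name; the typeof guard is only there for runtimes older than Node 18.2:

```javascript
const http = require('node:http');

// Resolves once the server has stopped listening and all connections have ended.
function closeServer(server) {
  return new Promise((resolve, reject) => {
    // Stop accepting new connections; the callback fires when existing ones are done.
    server.close((err) => (err ? reject(err) : resolve()));

    // Idle keep-alive sockets have no request in flight, so dropping them is safe.
    if (typeof server.closeIdleConnections === 'function') {
      server.closeIdleConnections();
    }
  });
}
```

Connections that are mid-request are left alone and finish normally, so this fits into the drain step of the shutdown function: await closeServer(server) before closing the DB pool.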

Why Docker/Kubernetes need this

Docker sends SIGTERM to PID 1 in the container, waits --stop-timeout (default 10s), then SIGKILL.

Kubernetes sends SIGTERM, waits terminationGracePeriodSeconds (default 30s), then SIGKILL.

If our Node app is PID 1 (running directly via CMD ["node", "server.js"]), we receive the signal. Done.

But if we use a shell form (CMD node server.js), the shell becomes PID 1 and does not forward signals. Our Node process never gets SIGTERM, falls to SIGKILL, drops requests. Bad.

Fix: always use exec form in Dockerfile.

# BAD — shell form
CMD node server.js

# GOOD — exec form, Node is PID 1
CMD ["node", "server.js"]

Or use tini / dumb-init as PID 1 if we need signal forwarding (e.g. when running via npm).

Kubernetes preStop hook

K8s has a subtle race: when a pod is terminated, the SIGTERM is sent at roughly the same time the pod is removed from the Service endpoints list. For a few seconds, traffic might still hit a shutting-down pod.

The fix is a preStop hook that sleeps before the signal is sent:

lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]

5 seconds is usually enough for the endpoints update to propagate. Our app keeps serving normally during the sleep, then gets SIGTERM and shuts down cleanly.

Common mistakes

  • No timeout. A stuck DB connection hangs shutdown() forever, then SIGKILL kills us. Always have a hard timeout that beats the orchestrator’s.
  • Closing the DB pool before HTTP finishes. Now in-flight requests can’t query the DB and fail. Order matters: HTTP first, then resources.
  • Catching SIGTERM but doing nothing. Worse than not handling it — Node’s default is to exit, our handler overrides that.
  • PM2 cluster reload — same story. PM2 sends SIGINT to each worker. If we don’t handle it, reload drops requests.
  • Running with nodemon or a shell wrapper in prod. They eat the signal. Use the runtime directly or tini.
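
For the "no timeout" mistake, besides the global hard timeout, individual cleanup steps can get their own deadline so one stuck resource doesn't eat the whole grace period. A sketch with a hypothetical withTimeout helper built on Promise.race:

```javascript
// Rejects if `promise` hasn't settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
  });
  // Whichever settles first wins; always clear the timer so it can't hold the process open.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

In the shutdown function this could wrap each step, for example await withTimeout(pool.end(), 5_000, 'db pool close'), with the numbers chosen so the sum stays below the orchestrator's grace period. Note that the underlying operation is not cancelled; we only stop waiting for it.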