Node.js

All 29 notes on one page

Fundamentals

What is Node.js

beginner nodejs v8 libuv runtime

Node.js is a JavaScript runtime that lets us run JS outside the browser. Before Node, JavaScript ran almost exclusively inside browsers. Ryan Dahl built Node in 2009 so we could use the same language on the server too.

In simple language — Node.js takes the V8 engine out of Chrome, glues it to a C library called libuv, and gives us file system access, networking, child processes, and everything else a server needs.

The two main pieces

Node is essentially two things stitched together:

V8 — Google’s JavaScript engine (the same one Chrome uses). It compiles JS to native machine code. This is what actually runs our code.
libuv — a C library that handles non-blocking I/O. It gives us the event loop, the thread pool, and async file/network operations.

Everything else (the fs, http, crypto modules, etc.) is a thin layer on top of these two.
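To see the two pieces for ourselves, we can read the bundled component versions from process.versions. A minimal sketch; the exact version strings depend on the Node build we run it on:

```javascript
// process.versions lists the versions of the components bundled into this Node binary.
const { node, v8, uv } = process.versions;

console.log(`Node.js ${node}`);
console.log(`V8      ${v8}`); // the JavaScript engine
console.log(`libuv   ${uv}`); // the event loop and async I/O library
```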

Node.js Architecture

Your code: app.js, server.js, ...
Node.js Core (JS): fs, http, stream, crypto, path...
V8: runs JS, GC, JIT
libuv: event loop, thread pool, I/O

Non-blocking I/O is the big idea

Most server work is waiting — for a database, for a file, for an HTTP response. Traditional servers spin up a thread per request. Node uses one thread and an event loop. When we call fs.readFile(), Node hands the work off to libuv, returns immediately, and runs other code. When the file is ready, our callback runs.

Our JavaScript runs on a single thread, but the I/O happens in parallel under the hood. That’s why Node feels fast for I/O-heavy workloads.

const fs = require("node:fs");

console.log("1. starting");

fs.readFile("./data.json", "utf8", (err, data) => {
  console.log("3. file read done");
});

console.log("2. continuing without waiting");
// Output order: 1, 2, 3

What we actually use Node for

HTTP / REST APIs — Express, Fastify, NestJS
Real-time apps — WebSockets, chat, multiplayer games (great fit because of the event loop)
CLI tools — npm itself, ESLint, Prettier, Vite are all Node CLIs
Build tooling — bundlers, transpilers, test runners
Microservices — small, fast HTTP services
Streaming pipelines — log processing, file transforms

Where Node is NOT a great fit

Node has one main thread for JS. CPU-heavy work (video encoding, ML inference, big math loops) blocks that thread and stalls everything. For CPU-bound work, we’d reach for Go, Rust, or Python with native libs — or offload to worker threads within Node.
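A small sketch of what "blocks that thread" looks like in practice. The busyWait helper is a hypothetical stand-in for any CPU-heavy task; the point is only that a timer asked to fire after 10 ms cannot run until the synchronous work is done:

```javascript
// Measure how late a 10 ms timer fires when the main thread is busy.
const start = Date.now();
let timerDelayMs = null;

setTimeout(() => {
  timerDelayMs = Date.now() - start;
  console.log(`timer fired after ${timerDelayMs} ms (asked for 10 ms)`);
}, 10);

// Hypothetical CPU-bound task: spin on the main thread for roughly 200 ms.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // burn CPU; nothing else can run while we are in here
  }
}

busyWait(200);
console.log("busy loop finished");
```

The timer callback only runs once the busy loop returns, so the measured delay is roughly the length of the blocking work rather than the 10 ms we asked for.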

Quick history

2009 — Ryan Dahl releases Node
2010 — npm launches
2015 — io.js fork merges back, Node 4.0 released, Node Foundation forms
2019 — Node Foundation + JS Foundation merge into OpenJS Foundation
2020+ — ES Modules become stable, node: protocol, built-in fetch, --test runner

Today Node powers Netflix, LinkedIn, PayPal, Uber’s API gateway, and basically every company’s tooling layer.

Event Loop Deep Dive

intermediate event-loop libuv async microtasks nexttick

The event loop is one of the most commonly asked Node.js interview topics. In simple language, it’s the mechanism that lets a single-threaded runtime handle thousands of concurrent operations without blocking.

When we call setTimeout, fs.readFile, or a network request, Node hands the work to libuv and our callback gets queued. The event loop is the orchestrator that picks the right callback to run next.

The 6 phases

The event loop runs in a loop (obviously), and each tick of that loop goes through 6 phases in order. Each phase has its own callback queue.

One tick of the event loop

1. timers: setTimeout, setInterval callbacks whose time is up
2. pending callbacks: some system errors (e.g. TCP ECONNREFUSED)
3. idle, prepare: internal only
4. poll: I/O callbacks (fs, net, ...). Waits here if nothing else to do.
5. check: setImmediate callbacks
6. close callbacks: socket.on('close'), etc.

After phase 6, the loop goes back to phase 1.

The four we actually care about in interviews are timers, poll, check, and close.

Microtasks run BETWEEN phases

Here’s the key bit most people miss. Microtasks aren’t a phase. They run between every phase (and after every individual callback). There are two microtask queues:

process.nextTick queue — runs first, drained completely
Promise (then/catch/finally) queue — runs second, drained completely

So the real flow is: run a callback → drain nextTick queue → drain Promise queue → next callback. This is why process.nextTick can starve the event loop if we recurse on it.

setTimeout(() => console.log("timeout"), 0);
setImmediate(() => console.log("immediate"));

Promise.resolve().then(() => console.log("promise"));
process.nextTick(() => console.log("nextTick"));

console.log("sync");

// Output:
// sync
// nextTick
// promise
// timeout       (or immediate first — depends on context)
// immediate

setImmediate vs setTimeout(fn, 0)

Classic interview trap. Outside of an I/O callback, the order is non-deterministic — it depends on how fast Node enters the loop. Inside an I/O callback, setImmediate ALWAYS runs first (because we’re already past the timers phase, heading to check).

const fs = require("node:fs");

fs.readFile(__filename, () => {
  setTimeout(() => console.log("timeout"), 0);
  setImmediate(() => console.log("immediate"));
});
// Always: immediate, timeout

process.nextTick — use carefully

process.nextTick fires before any I/O or timer, after the current operation. It’s how we defer something to “right after this function but before anything else”.

Think of it like — “finish this stack, then immediately do this, before going back to the event loop”.

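Recursive nextTick calls can block the loop forever. Recursive Promise resolutions carry the same risk, because the Promise microtask queue is also drained completely before the loop moves on. The nextTick queue still runs first; what changed in Node 11 is that both queues are now drained after every individual timer or immediate callback, matching browser behavior.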
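To make the starvation risk concrete, here is a bounded sketch: a chain of nextTick callbacks that keeps rescheduling itself holds back a timer that is already due. The chain length of 100000 is an arbitrary value chosen for illustration; an unbounded chain would hold the timer back forever.

```javascript
const order = [];

// A timer that is due almost immediately.
setTimeout(() => order.push("timer"), 0);

// A bounded chain of nextTick callbacks; each one schedules the next.
let remaining = 100000;
function tick() {
  if (--remaining > 0) {
    process.nextTick(tick);
  } else {
    order.push("nextTick chain finished");
  }
}
process.nextTick(tick);

// Report once everything has had a chance to run.
setTimeout(() => console.log(order), 50);
// [ 'nextTick chain finished', 'timer' ]
```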

Why Node feels concurrent

While our JS is sync and single-threaded, libuv runs some work on a 4-thread worker pool by default (configurable via UV_THREADPOOL_SIZE). File system ops, dns.lookup, and some crypto and zlib calls use this pool. Network I/O uses the OS’s epoll/kqueue/IOCP directly, so no thread is needed.

The thread pool finishes a job → pushes the callback into the right phase queue → event loop picks it up on the next tick.

Practical takeaway

Don’t do heavy CPU work on the main thread — it blocks every phase.
process.nextTick for “must run before any I/O”.
setImmediate for “run after current poll phase”.
queueMicrotask is the standard, cross-platform way to schedule a microtask (uses the Promise queue).
If our loop is lagging, check for sync code, big JSON.parse, or sync fs calls.

Non-blocking I/O

intermediate libuv async thread-pool io

Non-blocking I/O is the whole reason Node exists. In simple language — when our code asks for something slow (a file, a network call), Node doesn’t sit and wait. It hands the work off and continues running other code. When the slow thing finishes, our callback gets queued up.

Blocking vs non-blocking

A blocking call freezes the thread until it returns. A non-blocking call returns immediately and notifies us later.

const fs = require("node:fs");

// Blocking — stops everything until done
const data = fs.readFileSync("./big.json", "utf8");
console.log(data);

// Non-blocking — returns instantly, callback runs later
fs.readFile("./big.json", "utf8", (err, data) => {
  console.log(data);
});
console.log("this runs FIRST");

If we use the sync version in an HTTP handler, every request reading that file waits in line. That’s bad. The async version lets Node serve thousands of requests interleaved.

How libuv pulls this off

libuv (the C library Node uses for non-blocking I/O) uses two strategies depending on the operation:

OS-level async APIs — for network I/O (sockets), libuv uses epoll on Linux, kqueue on macOS/BSD, and IOCP on Windows. The OS itself notifies libuv when a socket is ready. Zero extra threads needed.
Thread pool — for things the OS doesn’t expose as async (file system on most platforms, DNS lookups, crypto, zlib), libuv uses a pool of worker threads. Default 4 threads, configurable via UV_THREADPOOL_SIZE (max 1024).

Where I/O actually happens

OS async (no thread)

TCP / UDP sockets, HTTP, pipes

epoll / kqueue / IOCP

Thread pool (4 by default)

fs, dns.lookup, crypto.pbkdf2, zlib

UV_THREADPOOL_SIZE

The flow of an async call

Take fs.readFile:

JS calls fs.readFile(path, cb).
Node passes the work to libuv.
libuv picks a worker thread, that thread calls read() syscalls.
Meanwhile, our main thread is free — it runs other JS, handles requests, whatever.
The worker thread finishes, hands the result back to libuv.
libuv queues our callback in the poll phase of the event loop.
Event loop reaches poll phase → runs our callback.

Why this makes Node fast (for I/O)

A traditional thread-per-request server (think old Apache) needs ~1MB of stack per thread. 10,000 connections = 10GB of RAM just for stacks. Node holds 10,000 connections on one thread with maybe a few hundred MB of memory. The bottleneck shifts from threads to actual work.

But — and this is important — Node is fast for I/O-bound work. For CPU-bound work (image processing, JSON parsing big payloads, cryptography in a tight loop), Node is no faster than anything else, and worse, the heavy code blocks all other requests.

Tuning the thread pool

If we’re doing heavy crypto or lots of fs work, the default 4 threads can become a bottleneck. Bump it:

UV_THREADPOOL_SIZE=16 node server.js

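Don’t set this absurdly high. For CPU-bound pool work such as crypto, threads beyond the CPU core count mostly add context switching. For fs-heavy work a somewhat larger pool can still help, because those threads spend most of their time waiting on the disk.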
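One way to observe the pool size is to queue more thread-pool jobs than there are threads and watch when each finishes. A rough sketch using crypto.pbkdf2, which runs on the pool; the password, salt, and iteration count are arbitrary values picked for illustration, and the timings will differ on every machine:

```javascript
const crypto = require("node:crypto");

const start = Date.now();
const finished = [];

// Queue 6 hashing jobs. With the default pool of 4 threads,
// the last two typically have to wait for a free thread.
for (let i = 1; i <= 6; i++) {
  crypto.pbkdf2("password", "salt", 100000, 64, "sha512", (err) => {
    if (err) throw err;
    finished.push(i);
    console.log(`hash ${i} done after ${Date.now() - start} ms`);
  });
}
```

With the default settings the completion times usually fall into two groups. Running the same script with a larger UV_THREADPOOL_SIZE should narrow that gap on a machine with enough cores.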

Worker threads — for CPU work

For genuinely CPU-heavy code, Node has the worker_threads module. These are real OS threads with their own V8 instance. We send messages between them. Use these for things like image resizing, parsing huge files, or running ML inference.

const { Worker } = require("node:worker_threads");
const w = new Worker("./heavy-task.js");
w.on("message", (result) => console.log(result));
w.postMessage({ payload: "..." });
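The snippet above assumes a separate heavy-task.js file. As a self-contained sketch, the worker body can also be passed as a string with the eval option. The summing task and the sumInWorker helper are hypothetical names used only for illustration:

```javascript
const { Worker } = require("node:worker_threads");

// Worker body kept as a string so the sketch fits in one file.
// Hypothetical CPU-bound task: sum the integers 1..n.
const workerSource = `
  const { parentPort } = require("node:worker_threads");
  parentPort.on("message", ({ n }) => {
    let sum = 0;
    for (let i = 1; i <= n; i++) sum += i;
    parentPort.postMessage(sum);
  });
`;

// Run the task on a worker thread and resolve with its result.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once("message", (result) => {
      resolve(result);
      worker.terminate(); // the worker would otherwise keep listening
    });
    worker.once("error", reject);
    worker.postMessage({ n });
  });
}

sumInWorker(1e6).then((sum) => console.log("sum:", sum));
// sum: 500000500000
```

The main thread stays free while the loop runs in the worker. For real workloads a separate worker file and a reusable pool of workers are usually a better fit than spawning one worker per task.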

The cardinal rule

Never block the event loop. No JSON.parse on a 50MB string, no fs.readFileSync in a request handler, no while loop computing primes. If we block the loop, every connection waits.

REPL & Node CLI

beginner repl cli debugging node-flags

The Node CLI is more than just node app.js. It’s an interactive playground, a debugger entry point, and a quick scripting tool. Knowing the useful flags saves us a lot of time.

REPL — Read Eval Print Loop

Type node with no arguments and we get an interactive JS shell. Same engine, same APIs as Node, but live.

$ node
Welcome to Node.js v20.11.0.
> 1 + 1
2
> const fs = require("node:fs")
undefined
> fs.readdirSync(".")
[ 'package.json', 'index.js', 'README.md' ]
> .exit

Handy for trying out an API, checking date math, or testing a regex without making a file.

REPL dot commands

Inside the REPL, commands starting with . are special:

.help: list all commands
.editor: multi-line editor mode (Ctrl+D to finish)
.load file.js: evaluate a file’s contents into the REPL
.save out.js: save the session to a file
.break: abandon the current multi-line input
.clear: reset the REPL context (also abandons any multi-line input)
.exit (or Ctrl+D, or Ctrl+C twice): quit

Useful REPL tricks

_ holds the result of the last expression. _error holds the last thrown error.
Tab completion works on variables and properties.
Top-level await works — no need to wrap in an async function.

> await fetch("https://api.github.com")
> _.status
200

Common CLI flags

-e                 execute a string of JS and exit
-p                 like -e but prints the result
--watch            auto-restart on file change (Node 18.11+)
--inspect          start the inspector on port 9229 so Chrome DevTools can attach
--inspect-brk      same, but pause on first line
--env-file         load a .env file (Node 20.6+)
--test             run the built-in test runner
--require / -r     preload a module before script runs
--experimental-*   opt into unstable features (loaders, vm modules, ...)

-e and -p — quick one-liners

-e evals a string. -p does the same but prints the result. Great for tiny shell utilities.

# Get a UUID without installing anything
node -p "crypto.randomUUID()"
# d8e7c2a0-...

# Quickly check Node version programmatically
node -p "process.version"

# Read JSON from stdin and pretty-print
cat data.json | node -e "let s='';process.stdin.on('data',d=>s+=d).on('end',()=>console.log(JSON.stringify(JSON.parse(s),null,2)))"

--watch: built-in nodemon

Since Node 18.11, we don’t need nodemon for most cases. --watch restarts our process when watched files change.

node --watch server.js
# Watching for file changes...

We can also pass --watch-path=./src to scope it.

--inspect: debugging

Adds a debugger that DevTools can attach to. Open chrome://inspect in Chrome and we see our Node process. Or use VS Code’s “Attach to Node” config.

node --inspect server.js
# Debugger listening on ws://127.0.0.1:9229/...

node --inspect-brk server.js
# Same, but the program pauses on line 1 waiting for us to attach

--env-file: built-in dotenv

Node 20.6+ ships with a built-in .env loader. We no longer need the dotenv package for simple cases.

node --env-file=.env server.js

Inside our code, the variables show up on process.env like normal.
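A small sketch of reading those values with defaults and validation. The variable names PORT and DATABASE_URL and the loadConfig helper are hypothetical, chosen only for illustration:

```javascript
// Hypothetical .env contents, loaded via: node --env-file=.env server.js
//   PORT=8080
//   DATABASE_URL=postgres://localhost/app

function loadConfig(env = process.env) {
  // Values on process.env are always strings, so convert explicitly.
  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return { port, databaseUrl: env.DATABASE_URL ?? null };
}

console.log(loadConfig({ PORT: "8080" }));
// { port: 8080, databaseUrl: null }
```

Taking the environment as a parameter keeps the function easy to test without touching the real process.env.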

process.argv — reading CLI args

When we write our own CLIs, args come in via process.argv. The first two entries are the node binary and the script path.

// node greet.js Manish
console.log(process.argv);
// [ '/usr/bin/node', '/path/to/greet.js', 'Manish' ]

const name = process.argv[2];
console.log(`Hello, ${name}`);

For anything more than one arg, reach for node:util’s parseArgs (Node 18+) or the commander / yargs packages.

const { parseArgs } = require("node:util");

const { values } = parseArgs({
  options: {
    port: { type: "string", short: "p", default: "3000" },
    dev: { type: "boolean" },
  },
});

// e.g. `node server.js -p 8080 --dev` gives port: '8080', dev: true
console.log(values);

Modules & Package Management

CommonJS vs ES Modules

intermediate modules commonjs esm import require

Node has two module systems. CommonJS (CJS) is the original — require() and module.exports. ES Modules (ESM) is the standard from the JS spec — import and export. Knowing how they differ matters because mixing them up causes very real production bugs.

The two systems at a glance

CommonJS (CJS)

Default extension: .js (or .cjs)
Synchronous loading
require() / module.exports
__dirname, __filename available
No top-level await
Loaded by reading + wrapping in a function

ES Modules (ESM)

Default extension: .mjs (or .js with "type":"module")
Asynchronous loading
import / export
import.meta.url instead of __dirname
Top-level await works
Static graph: static import declarations must sit at the top level of the module

How Node decides which system a file is

The rules in order:

File ends in .cjs → CommonJS.
File ends in .mjs → ESM.
File ends in .js → look at the nearest package.json:
- "type": "module" → ESM
- "type": "commonjs" or no type field → CommonJS

// package.json
{
  "type": "module"
}

With that, every .js file in the package is treated as ESM. If we still need a CJS file inside, we use .cjs.
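The rules above can be written down as a small function. This is a simplified model for illustration, not Node’s actual implementation: the moduleFormat name is hypothetical, and the caller is assumed to have already found the "type" value of the nearest package.json.

```javascript
const path = require("node:path");

// Simplified model of how a file's module system is chosen.
// packageType is the "type" field of the nearest package.json (or undefined).
function moduleFormat(filename, packageType) {
  const ext = path.extname(filename);
  if (ext === ".cjs") return "commonjs";
  if (ext === ".mjs") return "module";
  if (ext === ".js") return packageType === "module" ? "module" : "commonjs";
  return "unknown"; // other extensions are out of scope for this sketch
}

console.log(moduleFormat("server.js", "module"));  // module
console.log(moduleFormat("server.js", undefined)); // commonjs
console.log(moduleFormat("legacy.cjs", "module")); // commonjs
console.log(moduleFormat("tool.mjs", "commonjs")); // module
```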

Syntax side by side

// CommonJS
const fs = require("node:fs");
const { readFile } = require("node:fs/promises");

function greet(name) {
  return `Hello, ${name}`;
}

module.exports = { greet };
// or: module.exports.greet = greet;

// ES Modules
import fs from "node:fs";
import { readFile } from "node:fs/promises";

export function greet(name) {
  return `Hello, ${name}`;
}

// or default export:
// export default greet;

The gotchas

1. __dirname doesn’t exist in ESM

In CJS, __dirname and __filename are free variables. In ESM, they’re gone. We use import.meta.url:

import { fileURLToPath } from "node:url";
import { dirname } from "node:path";

const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);

Node 20.11+ added import.meta.dirname and import.meta.filename so we can skip the boilerplate.

2. ESM imports MUST include the extension

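CJS lets us write require("./utils") and it tries .js, .json, .node. ESM is strict: relative and absolute specifiers need the full file name, so we have to write ./utils.js, and directory imports (./utils resolving to ./utils/index.js) don’t work either. Bare package specifiers such as "express" are not affected, because they resolve through the package’s package.json.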

3. You can require() ESM (sometimes)

Until recently, require() of an ESM file threw ERR_REQUIRE_ESM. Node 22 added require() of synchronous ESM (no top-level await) behind the --experimental-require-module flag, and Node 22.12+ and 23+ enable it by default. Older versions force us to use dynamic import():

// In a CJS file, loading an ESM module:
async function load() {
  const mod = await import("./esm-module.mjs");
  mod.doStuff();
}

4. Named exports from CJS into ESM

Importing a CJS module from ESM gives us the whole module.exports as the default. Node tries to detect named exports too, but if it can’t (e.g., they’re set dynamically), we have to destructure manually:

// lodash is a CJS package: the default import is its whole module.exports
import pkg from "lodash";
const { debounce } = pkg;
// or, if Node detects named exports:
// import { debounce } from "lodash";

5. JSON imports need an attribute

import data from "./data.json" with { type: "json" };

Dual packages — supporting both

Library authors often ship both. The package.json exports field is how we tell Node which file to use:

{
  "name": "my-lib",
  "type": "module",
  "main": "./dist/index.cjs",
  "exports": {
    ".": {
      "import": "./dist/index.mjs",
      "require": "./dist/index.cjs"
    }
  }
}

This is called conditional exports. The import key wins for ESM consumers, require wins for CJS.

When to use which

For new code in 2026, prefer ESM. It’s the standard, bundlers prefer it, top-level await is great, tree-shaking works better. The only reason to stay on CJS is a large existing codebase or a hot-loaded plugin system that needs sync require.

require Resolution Algorithm

intermediate modules require resolution node_modules

When we write require("express"), how does Node actually find that file? The resolution algorithm is well-defined and worth knowing — it explains a lot of bugs (“why is it picking up the wrong version?”, “why does my monorepo break?”).

The big picture

In simple language, Node walks through a checklist for require(X):

Is X a core module (fs, http, path, …)? Use that.
Does X start with ./, /, or ../? Treat it as a file or directory path.
Otherwise, walk node_modules up the directory tree until found.

If none of these work, we get the famous Error: Cannot find module 'X'.

require("X") flowchart

Step 1: Is X a core module like "fs" or "node:http"? → return it.

↓ no

Step 2: Does X start with "./", "/", "../"? → resolve as file/dir path.

↓ no

Step 3: Walk up node_modules folders from current dir to root.

↓ found?

Load it. Otherwise throw MODULE_NOT_FOUND.
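The first two decisions in the flowchart can be sketched as a classifier. The classifySpecifier name is hypothetical; isBuiltin comes from node:module and is assumed to be available (it was added in Node 18.6 / 16.17):

```javascript
const { isBuiltin } = require("node:module");

// Decide which branch of the resolution checklist a specifier falls into.
function classifySpecifier(x) {
  if (isBuiltin(x)) return "core";
  if (x.startsWith("./") || x.startsWith("../") || x.startsWith("/")) {
    return "path";
  }
  return "bare"; // resolved by walking node_modules directories
}

console.log(classifySpecifier("fs"));        // core
console.log(classifySpecifier("node:http")); // core
console.log(classifySpecifier("./utils"));   // path
console.log(classifySpecifier("express"));   // bare
```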

Core modules win first

If X matches a built-in name (fs, path, crypto, http, etc.), Node returns the built-in regardless of anything in node_modules. Since Node 16 we can prefix with node: to be explicit and immune to userland shadowing:

const fs = require("node:fs"); // always the built-in

Relative and absolute paths — LOAD_AS_FILE then LOAD_AS_DIRECTORY

For require("./utils"), Node tries in this order:

./utils                  exact path (if a file)
./utils.js
./utils.json
./utils.node             (compiled C++ addon)
./utils/package.json     read "main" field
./utils/index.js
./utils/index.json
./utils/index.node

This is why require("./utils") works when the file is utils.js — Node appends extensions for us.

node_modules tree walk

For a bare specifier like require("express"), Node walks up the directory tree, checking for node_modules/express at each level until it hits the filesystem root:

/Users/me/project/api/src/routes/users.js  ← calling from here

Checks:
  /Users/me/project/api/src/routes/node_modules/express
  /Users/me/project/api/src/node_modules/express
  /Users/me/project/api/node_modules/express
  /Users/me/project/node_modules/express        ← found! use this
  /Users/me/node_modules/express
  /Users/node_modules/express
  /node_modules/express

This is why monorepos work — packages at the root resolve from any subfolder. And it’s why a deeply nested duplicate of a package can shadow the root one.
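The walk itself is easy to model. The sketch below builds the list of node_modules directories for a starting directory; nodeModulesPaths is a hypothetical helper written for illustration, and it uses path.posix so the output is the same on every platform. Node’s own list for the current module is available via require.resolve.paths("express").

```javascript
const path = require("node:path");

// Build the list of node_modules directories checked for a bare specifier,
// starting at startDir and walking up to the filesystem root.
function nodeModulesPaths(startDir) {
  const dirs = [];
  let dir = path.posix.resolve(startDir);
  while (true) {
    // A directory that is itself named node_modules is skipped.
    if (path.posix.basename(dir) !== "node_modules") {
      dirs.push(path.posix.join(dir, "node_modules"));
    }
    const parent = path.posix.dirname(dir);
    if (parent === dir) break; // reached the root
    dir = parent;
  }
  return dirs;
}

console.log(nodeModulesPaths("/Users/me/project/api"));
// [
//   '/Users/me/project/api/node_modules',
//   '/Users/me/project/node_modules',
//   '/Users/me/node_modules',
//   '/Users/node_modules',
//   '/node_modules'
// ]
```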

Loading a node_modules package

Once Node finds node_modules/express/, it needs to pick an entry file. It reads package.json:

If exports field exists → use that (conditional exports, see below).
Else if main field exists → use that file.
Else fall back to index.js.

{
  "name": "express",
  "main": "./index.js"
}

The exports field changes things

Modern packages use exports, which is strict. It blocks access to internal files and supports conditions (import vs require, node vs browser).

{
  "exports": {
    ".": {
      "import": "./dist/esm/index.mjs",
      "require": "./dist/cjs/index.js"
    },
    "./utils": "./dist/utils.js"
  }
}

With exports, require("my-pkg/internal/private") throws — even if the file exists. This is module encapsulation.

Caching — modules load once

Node caches the resolved module by its absolute path. The second require("express") returns the same exports object as the first. The cache lives at require.cache.

console.log(require.cache);
// { '/abs/path/index.js': Module { ... } }

delete require.cache[require.resolve("./config")]; // force reload
const fresh = require("./config");

This is why a module’s top-level code runs once per process — not once per import.
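A self-contained way to watch the cache at work is to write a throwaway module to a temp directory and require it more than once. The file name counter.js and the loadCount global are hypothetical, used only so the sketch can count how often the module body runs:

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Create a throwaway module that counts how many times its top-level code runs.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "cache-demo-"));
const file = path.join(dir, "counter.js");
fs.writeFileSync(
  file,
  "globalThis.loadCount = (globalThis.loadCount ?? 0) + 1;\n" +
    "module.exports = { id: globalThis.loadCount };\n"
);

const first = require(file);
const second = require(file);
console.log(first === second);     // true (same cached exports object)
console.log(globalThis.loadCount); // 1 (module body ran once)

// Evict the cache entry, then load again.
delete require.cache[require.resolve(file)];
const third = require(file);
console.log(third === first);      // false (fresh exports object)
console.log(globalThis.loadCount); // 2 (module body ran again)

fs.rmSync(dir, { recursive: true, force: true });
```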

Inspecting resolution

require.resolve() returns the resolved path without loading the module. Super useful when debugging “wait, which copy is it picking up?”:

console.log(require.resolve("express"));
// /Users/me/project/node_modules/express/index.js

To run with NODE_PATH extra search dirs, set the env var (rare, mostly used for global tools):

NODE_PATH=/usr/local/lib/node_modules node script.js

Common gotchas

Wrong version in a monorepo — a workspace’s own node_modules shadows a hoisted version. Run require.resolve to confirm.
Case sensitivity — works on macOS (case-insensitive FS), breaks on Linux. Always match casing exactly.
Symlinks — by default Node resolves to the real path. Use --preserve-symlinks for some monorepo setups.

package.json Fields

beginner package-json npm dependencies exports

package.json is the heart of any Node project. In simple language — it tells Node and npm everything about our project: name, version, what to run, what to install, and how others should import from us.

A real-world example:

{
  "name": "khoj",
  "version": "1.2.0",
  "description": "Personal job scraper",
  "type": "module",
  "main": "./dist/index.js",
  "exports": {
    ".": "./dist/index.js",
    "./utils": "./dist/utils.js"
  },
  "scripts": {
    "dev": "node --watch src/index.js",
    "test": "node --test",
    "build": "tsc"
  },
  "dependencies": {
    "axios": "^1.6.0",
    "pg": "^8.11.0"
  },
  "devDependencies": {
    "typescript": "^5.3.0"
  },
  "engines": {
    "node": ">=20"
  }
}

name and version

name is how npm and require find our package. Lowercase, no spaces, optionally scoped (@scope/name).

version follows semver — MAJOR.MINOR.PATCH. Increment major for breaking changes, minor for new features, patch for fixes.

type — CJS or ESM?

"type": "module" → all .js files are ESM
"type": "commonjs" (or absent) → all .js files are CJS

This setting controls how Node loads our files. See the CommonJS vs ESM note for details.

main, module, exports — the entry points

These three control what consumers get when they import or require our package.

main — the classic entry. Used by CJS require() and as the fallback.
module — bundler-only field (Webpack, Rollup). Points to an ESM build. Node ignores this.
exports — modern, strict, conditional. Beats main if present.

{
  "main": "./dist/index.cjs",
  "module": "./dist/index.mjs",
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.mjs",
      "require": "./dist/index.cjs"
    },
    "./package.json": "./package.json"
  }
}

With exports, anything not listed is blocked — require("my-pkg/internal/secret") throws. This is intentional; it gives us module encapsulation.

scripts — our project’s commands

Anything in scripts runs via npm run <name> (or yarn <name>, pnpm <name>). A few names are special:

start — runs with just npm start
test — runs with just npm test
pre<x> / post<x> — auto-run before/after script <x>

{
  "scripts": {
    "dev": "node --watch src/index.js",
    "build": "tsc -p tsconfig.json",
    "test": "node --test test/",
    "lint": "eslint src/",
    "prebuild": "rm -rf dist"
  }
}

npm run adds node_modules/.bin to the PATH, so we can invoke locally-installed CLIs like tsc or eslint without a global install.

The three dependency buckets

dependencies

Needed at runtime. Installed when someone installs our package. Example: express, axios.

devDependencies

Needed only during development. Skipped with npm install --omit=dev (older npm: --production). Example: typescript, eslint, vitest.

peerDependencies

"I work with this, please provide it." Common in plugins (eslint plugins, react libraries). npm v3 through v6 only warned about missing peer dependencies; npm v7+ installs them automatically by default.

{
  "dependencies": {
    "express": "^4.18.0"
  },
  "devDependencies": {
    "@types/express": "^4.17.0",
    "typescript": "^5.0.0"
  },
  "peerDependencies": {
    "react": ">=18"
  }
}

Version range syntax

^1.2.3 — compatible (≥1.2.3, <2.0.0). The default with npm install.
~1.2.3 — patch-level only (≥1.2.3, <1.3.0).
1.2.3 — exact.
* or latest — anything (don’t do this).
>=1.2.3 <2.0.0 — explicit range.
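To make the caret and tilde rules concrete, here is a deliberately simplified range check. It is a sketch under stated assumptions: plain MAJOR.MINOR.PATCH versions with a major of 1 or higher, no prerelease tags, and only the ^, ~, and exact forms. The function names are hypothetical; real projects would use the semver package instead.

```javascript
// Split "1.2.3" into [1, 2, 3].
const parse = (v) => v.split(".").map(Number);

// Compare two parsed versions: negative, zero, or positive.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function satisfies(version, range) {
  const v = parse(version);
  if (range.startsWith("^")) {
    // Same major, and at least the base version.
    const base = parse(range.slice(1));
    return v[0] === base[0] && compare(v, base) >= 0;
  }
  if (range.startsWith("~")) {
    // Same major and minor, patch at least the base patch.
    const base = parse(range.slice(1));
    return v[0] === base[0] && v[1] === base[1] && v[2] >= base[2];
  }
  return compare(v, parse(range)) === 0; // exact version
}

console.log(satisfies("1.9.0", "^1.2.3")); // true
console.log(satisfies("2.0.0", "^1.2.3")); // false
console.log(satisfies("1.2.9", "~1.2.3")); // true
console.log(satisfies("1.3.0", "~1.2.3")); // false
```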

engines — declare runtime requirements

Tells installers what Node version we need. Without engine-strict it’s just a warning, but it shows up in errors and documents intent:

{
  "engines": {
    "node": ">=20.0.0",
    "npm": ">=10.0.0"
  }
}

Many CI systems and platforms (Vercel, Render) read this to pick the right Node version.

Other useful fields

bin — declare CLI executables. npm install -g my-cli puts these on PATH.
files — whitelist of files to publish. Without it, npm uses .npmignore or includes everything.
workspaces — array of paths or globs for monorepo sub-packages.
private: true — prevents accidental npm publish.
sideEffects: false — bundler hint for tree-shaking. Means “imports of this package have no side effects, drop unused exports”.

{
  "bin": {
    "my-cli": "./cli.js"
  },
  "files": ["dist/", "README.md"],
  "private": true,
  "sideEffects": false
}

Generating it

npm init -y makes a minimal one. From there, every npm install <pkg> updates dependencies automatically.

npm vs yarn vs pnpm

intermediate npm yarn pnpm package-manager lockfile

All three install packages from the npm registry. They differ in how they install — disk layout, speed, strictness, and the lockfile format. Picking one matters more than people think.

The big idea behind each

npm — the original. Ships with Node. Uses a hoisted, flat node_modules. Lockfile: package-lock.json.
yarn — Facebook’s reaction to slow npm. Classic v1 uses hoisted layout like npm. Modern Yarn (Berry, v2+) uses Plug’n’Play (PnP) — no node_modules at all. Lockfile: yarn.lock.
pnpm — performant npm. Uses a global content-addressable store + hardlinks + a nested-but-symlinked node_modules. Lockfile: pnpm-lock.yaml.

The install layout difference

This is the key bit. Same package.json → very different folders.

npm / yarn classic — hoisted

node_modules/
├── express/
├── lodash/         ← hoisted up
├── debug/          ← hoisted up
└── pg/
    └── node_modules/
        └── pg-types/

Anything in node_modules root is requirable, even if not in package.json (phantom deps).

pnpm — content-addressable store

~/.pnpm-store/    (global, hashed files)
node_modules/
├── express → .pnpm/express@4.18/...
├── pg → .pnpm/pg@8.11/...
└── .pnpm/
    ├── express@4.18.0/node_modules/express/
    └── lodash@4.17.21/node_modules/lodash/

Only direct deps are at root. Strict — no phantom deps. Files are hardlinks to the global store.

Why pnpm is so much faster (and uses less disk)

pnpm keeps one copy of each package version in a global content-addressable store (its location depends on the platform and pnpm version; pnpm store path prints it). When we install, it creates hardlinks from node_modules to that store. Hardlinks share the same disk blocks, so there is basically zero copying.

So 50 projects all using react@18.2.0 share one copy on disk. With npm, each project has its own full copy. On a dev laptop, this saves tens of GB.
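The hardlink idea can be tried with plain fs calls. This sketch does not use pnpm at all; the file names are hypothetical stand-ins for "the copy in the store" and "the copy in a project", and both live in one temp directory because hardlinks only work within a single filesystem:

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "hardlink-demo-"));
const storeFile = path.join(dir, "store-copy.js");     // stand-in for the global store
const projectFile = path.join(dir, "project-copy.js"); // stand-in for node_modules

fs.writeFileSync(storeFile, "module.exports = 42;\n");
fs.linkSync(storeFile, projectFile); // create a second name for the same data

const a = fs.statSync(storeFile);
const b = fs.statSync(projectFile);
const sameInode = a.ino === b.ino; // both names point at the same file on disk
const linkCount = a.nlink;         // number of names for that file

console.log({ sameInode, linkCount });

fs.rmSync(dir, { recursive: true, force: true });
```

Both paths report the same inode, so the file contents exist on disk once no matter how many projects link to them.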

# Install in current project
pnpm install

# Print the store location
pnpm store path
# /Users/me/Library/pnpm/store/v3

Strictness — phantom dependencies

This is where pnpm beats npm in code correctness. With npm’s flat layout, our code can require("debug") even if we never listed debug in our package.json — because some transitive dependency installed it and hoisting flattened it to the top.

// works with npm if any dep depends on lodash, even if WE don't:
const _ = require("lodash"); // phantom dep!

The day that transitive package upgrades and drops lodash, our code breaks. pnpm’s symlink-based layout makes this impossible — we can only require what we explicitly declared.

Lockfiles — three formats, same purpose

A lockfile records the exact version of every package (direct and transitive) so we get the same install on every machine, every time.

# npm
package-lock.json     # JSON, very verbose

# yarn
yarn.lock             # custom format, more compact

# pnpm
pnpm-lock.yaml        # YAML

Always commit the lockfile. Without it, CI may install different transitive versions than dev — leading to “works on my machine” bugs.

For CI we use the strict install variants which fail if lockfile and package.json disagree:

npm ci                           # strict install, deletes node_modules first
yarn install --immutable         # Yarn Berry (v2+); Yarn 1 uses --frozen-lockfile
pnpm install --frozen-lockfile

Common commands side by side

action        npm            yarn                       pnpm
install all   npm install    yarn                       pnpm install
add dep       npm i pkg      yarn add pkg               pnpm add pkg
add dev       npm i -D pkg   yarn add -D pkg            pnpm add -D pkg
remove        npm rm pkg     yarn remove pkg            pnpm rm pkg
run script    npm run x      yarn x                     pnpm x
CI strict     npm ci         yarn install --immutable   pnpm install --frozen-lockfile

Workspaces / monorepos

All three support workspaces — multiple packages in one repo.

// package.json at root
{
  "workspaces": ["packages/*", "apps/*"]
}

pnpm uses pnpm-workspace.yaml instead:

packages:
  - 'packages/*'
  - 'apps/*'

For monorepos, pnpm is the most popular choice in 2026 because of strictness and speed. Yarn Berry workspaces are powerful but the PnP layout breaks some tools.

Which one should we use?

pnpm: best default for new projects. Fast, disk-efficient, strict. Major frontend projects such as Vue, Vite, and Astro use it.
npm: fine for small projects. Pre-installed everywhere. Zero setup.
yarn: still solid for legacy projects on Yarn 1. Yarn Berry’s PnP is interesting but the migration is real work.

Whichever we pick, stick with one per project and commit the lockfile.

Core APIs

Buffer

intermediate buffer binary encoding memory

A Buffer is Node’s representation of raw binary data — a fixed-length sequence of bytes. In simple language, it’s like an array of integers from 0 to 255, but stored outside the V8 JavaScript heap so it can be passed cheaply to C code (file system, sockets, crypto).

Buffers came before JavaScript had Uint8Array. Today Buffer is a subclass of Uint8Array — anywhere a typed array works, a Buffer works too.

Why we need it

JavaScript strings are UTF-16 encoded internally. When we read a file or receive a network packet, the data is just bytes — could be UTF-8, binary image data, anything. We need a type that represents raw bytes without a charset assumption. That’s Buffer.
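A quick illustration of why characters and bytes are different things. The sample string is arbitrary and written with escapes so it is unambiguous: one accented letter and one emoji.

```javascript
// "h\u00e9llo" is "héllo"; \u{1F44D} is the thumbs-up emoji.
const text = "h\u00e9llo \u{1F44D}";

console.log(text.length);                     // 8  (UTF-16 code units)
console.log(Buffer.byteLength(text, "utf8")); // 11 (bytes once encoded as UTF-8)
```

The accented letter takes 2 bytes in UTF-8 and the emoji takes 4, so any code that sizes a network frame or a file write from string.length would get it wrong.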

Memory layout

V8 heap: JS objects, strings, numbers, arrays. Managed by GC. Slow to copy to C.
Buffer memory (off-heap): raw bytes, allocated by Node outside the V8 heap. Zero-copy hand-off to syscalls.

Creating buffers

Three main ways, each with different semantics:

// 1. Allocate N bytes, zero-filled (safe, slightly slower)
const a = Buffer.alloc(10);
console.log(a); // <Buffer 00 00 00 00 00 00 00 00 00 00>

// 2. Allocate N bytes, uninitialized (FAST but may contain old data!)
const b = Buffer.allocUnsafe(10);
// b might contain anything — use only if you immediately overwrite all of it

// 3. From existing data
const c = Buffer.from("hello", "utf8");
console.log(c); // <Buffer 68 65 6c 6c 6f>

const d = Buffer.from([0xde, 0xad, 0xbe, 0xef]);
const e = Buffer.from("SGVsbG8=", "base64"); // → "Hello"

Never use the deprecated new Buffer(n) — it was a security disaster (allocated uninitialized memory by default).

Encodings

When converting between string and bytes, we specify an encoding:

utf8 (default) — variable-width, the standard
utf16le — UTF-16 little-endian
ascii — 7-bit ASCII, top bit dropped
latin1 — 1 byte = 1 codepoint, lossy for non-Latin chars
base64, base64url — common for transport / URLs
hex — pairs of hex digits
binary — alias for latin1 (legacy)

const buf = Buffer.from("hello", "utf8");

buf.toString("utf8");    // "hello"
buf.toString("hex");     // "68656c6c6f"
buf.toString("base64");  // "aGVsbG8="

Common operations

const buf = Buffer.from("hello world");

buf.length;              // 11 (bytes, not characters)
buf[0];                  // 104 (the byte for 'h')
buf.slice(0, 5);         // <Buffer 68 65 6c 6c 6f> — shares memory!
buf.subarray(0, 5);      // same; preferred name
buf.includes("world");   // true
buf.indexOf("world");    // 6
buf.equals(Buffer.from("hello world")); // true

// Concat multiple buffers
const merged = Buffer.concat([buf, Buffer.from("!")]);

`slice` shares memory — careful

buf.subarray() (and the old buf.slice()) returns a view, NOT a copy. Writing to it mutates the original.

const a = Buffer.from("hello");
const b = a.subarray(0, 3);
b[0] = 0x48; // 'H'
console.log(a.toString()); // "Hello"   ← original changed!

If we want a real copy, use Buffer.from(buf).

Reading and writing typed values

Buffers have helpers for parsing binary protocols — reading integers, floats at specific offsets in big or little endian:

const buf = Buffer.alloc(8);
buf.writeUInt32BE(0x12345678, 0);  // write 4 bytes big-endian at offset 0
buf.writeUInt32LE(0xCAFEBABE, 4);  // little-endian at offset 4

buf.readUInt32BE(0).toString(16);  // "12345678"
buf.readUInt32LE(4).toString(16);  // "cafebabe"

This matters when we’re talking to TCP protocols, parsing image headers, or implementing wire formats.
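As a rough sketch of what a wire format looks like in practice, here is a made-up frame layout (one type byte, a 4-byte big-endian length, then a UTF-8 payload). The layout and the names encodeFrame / decodeFrame are assumptions for illustration, not a real protocol.

```javascript
// Hypothetical frame layout (an assumption for this sketch, not a real protocol):
//   byte 0      message type (unsigned 8-bit)
//   bytes 1-4   payload length (unsigned 32-bit, big-endian)
//   bytes 5..   payload (UTF-8 text)
const HEADER_SIZE = 5;

function encodeFrame(type, text) {
  const payload = Buffer.from(text, "utf8");
  const frame = Buffer.alloc(HEADER_SIZE + payload.length);
  frame.writeUInt8(type, 0);
  frame.writeUInt32BE(payload.length, 1);
  payload.copy(frame, HEADER_SIZE);
  return frame;
}

function decodeFrame(frame) {
  const type = frame.readUInt8(0);
  const length = frame.readUInt32BE(1);
  // subarray is a view into the frame, so no bytes are copied here
  const payload = frame.subarray(HEADER_SIZE, HEADER_SIZE + length);
  return { type, text: payload.toString("utf8") };
}

const frame = encodeFrame(7, "ping");
console.log(frame);              // <Buffer 07 00 00 00 04 70 69 6e 67>
console.log(decodeFrame(frame)); // { type: 7, text: 'ping' }
```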

Real-world: hashing a file

const fs = require("node:fs");
const crypto = require("node:crypto");

const hash = crypto.createHash("sha256");
const stream = fs.createReadStream("./big-file.zip");

stream.on("data", (chunk) => {
  // chunk is a Buffer
  hash.update(chunk);
});

stream.on("end", () => {
  console.log(hash.digest("hex"));
});

Notice we never convert chunks to strings — that would corrupt binary data. The whole pipeline is buffer → buffer.

Buffer pool — a perf detail

For small buffers (under 4 KiB by default, which is half of Buffer.poolSize), Buffer.allocUnsafe and Buffer.from(string) slice memory out of a shared, pre-allocated pool instead of making a fresh allocation each time. Because allocUnsafe skips zero-filling, such buffers may contain bits of old data. For larger sizes, Node allocates fresh memory directly.
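One way to observe the pool, as a small sketch: a buffer's .buffer property is its backing ArrayBuffer, so comparing sizes shows whether the buffer owns its memory or is a slice of something bigger. The sizes below assume the default Buffer.poolSize has not been changed.

```javascript
// Small unsafe allocations are carved out of one shared ArrayBuffer (the pool).
const small = Buffer.allocUnsafe(10);
// Buffer.alloc never uses the pool: it gets its own zero-filled ArrayBuffer.
const zeroed = Buffer.alloc(10);

console.log(Buffer.poolSize);          // pool size in bytes (8192 unless changed)
console.log(small.buffer.byteLength);  // size of the backing ArrayBuffer: the whole pool
console.log(zeroed.buffer.byteLength); // 10
console.log(small.byteOffset);         // where this buffer starts inside the pool
```

This is also why passing small.buffer to another API is risky: it exposes the whole pool, not just our 10 bytes.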

When to use Buffer vs Uint8Array

In new code, Uint8Array works in browsers AND Node. Buffer adds convenience methods (toString, write, indexOf for strings, encoding conversions) but is Node-only. For shared browser/Node code, prefer Uint8Array + TextEncoder/TextDecoder for string conversion.
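A short sketch of the portable route: TextEncoder and TextDecoder are globals in browsers and in Node, so this part runs unchanged in both. The last two lines are Node-only and show one way to get Buffer helpers on the same bytes without copying.

```javascript
// TextEncoder / TextDecoder are globals in both browsers and Node.
const encoder = new TextEncoder();           // always encodes to UTF-8
const decoder = new TextDecoder("utf-8");

const bytes = encoder.encode("h\u00e9llo");  // Uint8Array, no Buffer involved
const roundTrip = decoder.decode(bytes);

console.log(bytes.length); // 6
console.log(roundTrip);    // "héllo"

// Node only: wrap the same memory in a Buffer (no copy) when we want its helpers.
const asBuffer = Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength);
console.log(asBuffer.toString("hex")); // "68c3a96c6c6f"
```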

References

Buffer - Node.js Docs

Streams & Backpressure

intermediate streams backpressure pipe pipeline

Streams are how Node handles data we can’t (or don’t want to) hold all in memory at once. A 50GB log file, an HTTP request body, a video upload — we process it in chunks as it flows. In simple language, a stream is an iterable that emits pieces over time.

The four stream types

Readable

We read FROM it. Examples: fs.createReadStream, an HTTP request body, process.stdin.

Writable

We write TO it. Examples: fs.createWriteStream, an HTTP response, process.stdout.

Duplex

Both readable and writable, independent. Example: a TCP socket.

Transform

Duplex where output is computed from input. Examples: zlib.createGzip(), crypto.createCipheriv().

Reading a file with streams

The classic example. Reading a 10GB file with fs.readFile would blow up our memory. With streams, we process it 64KB at a time:

const fs = require("node:fs");

const stream = fs.createReadStream("./huge.log", { encoding: "utf8" });

stream.on("data", (chunk) => {
  // encoding is set, so chunk is a string and length counts characters, not bytes
  console.log(`got ${chunk.length} characters`);
});

stream.on("end", () => console.log("done"));
stream.on("error", (err) => console.error(err));

The internal buffer (sized by highWaterMark, 64 KiB by default for fs.createReadStream) fills up, emits 'data', drains, fills again. Memory stays bounded no matter how big the file is.

Piping — connecting streams

Most of the time we don’t want to handle chunks manually. We chain streams with .pipe():

const fs = require("node:fs");
const zlib = require("node:zlib");

// Read → gzip → write — entire pipeline streamed
fs.createReadStream("./access.log")
  .pipe(zlib.createGzip())
  .pipe(fs.createWriteStream("./access.log.gz"));

Three streams, zero buffering of the whole file. Each chunk flows through the chain.

Backpressure — the most important concept

Backpressure is what makes streams safe. In simple language — when the downstream is slow, the upstream needs to pause until the downstream catches up. Otherwise the slow side’s internal buffer grows without bound and we run out of memory.

Backpressure in action

Readable (fast disk) → chunks → Transform (gzip, slow!) → Writable (network)

If gzip's buffer fills → write() returns false → readable PAUSES until 'drain' event.

When we call writable.write(chunk), it returns a boolean:

true — buffer has room, keep writing.
false — buffer is full, wait for the 'drain' event before writing more.

pipe() handles all this for us automatically. If we write streams manually, we have to respect that return value.

function pumpManually(readable, writable) {
  readable.on("data", (chunk) => {
    const ok = writable.write(chunk);
    if (!ok) {
      readable.pause(); // STOP reading
      writable.once("drain", () => readable.resume()); // resume when ready
    }
  });
  // Close the destination once the source has no more data
  readable.on("end", () => writable.end());
}
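The same rule can be followed from the producer side with async/await. This is a sketch under assumptions: createSlowSink is a hypothetical in-memory destination with a deliberately tiny highWaterMark, and writeAll is an illustrative name, not a Node API.

```javascript
const { Writable } = require("node:stream");
const { once } = require("node:events");

// Hypothetical slow destination: tiny buffer, each write finishes on a later tick.
function createSlowSink(received) {
  return new Writable({
    highWaterMark: 16, // bytes; deliberately small so backpressure kicks in
    write(chunk, _encoding, callback) {
      received.push(chunk.toString());
      setImmediate(callback);
    },
  });
}

async function writeAll(writable, lines) {
  let waits = 0;
  for (const line of lines) {
    // write() returns false once the internal buffer reaches highWaterMark
    if (!writable.write(line)) {
      waits += 1;
      await once(writable, "drain"); // pause the producer until the buffer empties
    }
  }
  writable.end();
  await once(writable, "finish");
  return waits;
}

const received = [];
const done = writeAll(createSlowSink(received), ["first line\n", "second line\n", "third line\n"]);
done.then((waits) => console.log(`wrote ${received.length} chunks, waited ${waits} time(s)`));
```

The loop only moves on after 'drain', so at most a buffer's worth of data is ever queued.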

pipeline() — the modern, safe way

pipe() has a famous flaw — if any stream in the middle errors out, the others don’t get destroyed and we leak. stream.pipeline() fixes that with proper error and cleanup handling:

const { pipeline } = require("node:stream/promises");
const fs = require("node:fs");
const zlib = require("node:zlib");

async function gzipFile(input, output) {
  await pipeline(
    fs.createReadStream(input),
    zlib.createGzip(),
    fs.createWriteStream(output)
  );
  console.log("done");
}

gzipFile("./access.log", "./access.log.gz").catch(console.error);

Always prefer pipeline over pipe for production code.

Async iteration

Modern Node lets us treat streams as async iterables — much cleaner than event listeners:

const fs = require("node:fs");

async function countLines(path) {
  const stream = fs.createReadStream(path, { encoding: "utf8" });
  let count = 0;

  for await (const chunk of stream) {
    count += (chunk.match(/\n/g) || []).length;
  }

  return count;
}

Object mode

By default streams move Buffers or strings. Set { objectMode: true } and we can pass arbitrary JS objects — useful for record-by-record processing pipelines (CSV rows, JSON lines, DB rows).

const { Transform } = require("node:stream");

const toUpper = new Transform({
  objectMode: true,
  transform(record, _enc, cb) {
    cb(null, { ...record, name: record.name.toUpperCase() });
  },
});
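To see an object-mode transform inside a full chain, here is a small self-contained sketch. The records and the in-memory sink are made up for illustration; a real pipeline would read from and write to a file or database.

```javascript
const { Readable, Transform, Writable } = require("node:stream");
const { pipeline } = require("node:stream/promises");

// Hypothetical records; Readable.from() yields an object-mode stream from any iterable.
const source = Readable.from([{ name: "ada" }, { name: "linus" }]);

const toUpper = new Transform({
  objectMode: true,
  transform(record, _enc, cb) {
    cb(null, { ...record, name: record.name.toUpperCase() });
  },
});

// Collect results in memory; a real pipeline would write to a DB or file instead.
const results = [];
const sink = new Writable({
  objectMode: true,
  write(record, _enc, cb) {
    results.push(record);
    cb();
  },
});

const finished = pipeline(source, toUpper, sink);
finished.then(() => console.log(results)); // [ { name: 'ADA' }, { name: 'LINUS' } ]
```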

Common real-world uses

HTTP servers — req is a Readable, res is a Writable. Streaming a big response means streaming directly from a file or DB.
File uploads — pipe req through a parser, straight to S3 or disk.
Log processing — read a multi-GB log line by line with readline.
Data ETL — read DB rows as a stream, transform, write to another store.

Quick rules

Use pipeline() for any non-trivial chain.
Respect backpressure if you write streams manually.
Don’t JSON.stringify a 1GB object then write it — stream it.
For line-by-line text, use readline.createInterface({ input: stream }).

File System

beginner nodejs fs io promises

The fs module is how we touch the disk from Node — read files, write files, list directories, watch for changes. It’s one of the first modules everyone uses, and getting the async/sync distinction right is important because Node runs on a single thread.

In simple language: when we read a file synchronously, the entire Node process stops until the file is read. That’s fine for a tiny config at startup, but disastrous inside a request handler — every other request waits.

The three flavors

Node gives us the same operations in three styles. They all do the same thing, just with different async patterns.

fs (callbacks)

Original API. Error-first callback. Old-school.

fs.readFile(path, cb)

fs/promises

Modern. Works with async/await. Use this.

await fs.readFile(path)

fs.*Sync

Blocks the event loop. Only at startup.

fs.readFileSync(path)

Reading and writing — the modern way

We almost always reach for fs/promises. Here’s the pattern we use 90% of the time.

import { readFile, writeFile } from 'node:fs/promises';

// read JSON config
const raw = await readFile('./config.json', 'utf8');
const config = JSON.parse(raw);

// write JSON back
await writeFile('./config.json', JSON.stringify(config, null, 2));

Notice the 'utf8' — without it, readFile returns a Buffer (raw bytes). Forgetting this is the #1 fs gotcha.

When sync is actually okay

There’s exactly one place sync APIs are fine: at startup, before the server is accepting traffic. Loading a config, checking if a directory exists — fine.

import { existsSync, mkdirSync } from 'node:fs';

if (!existsSync('./logs')) {
  mkdirSync('./logs', { recursive: true });
}

Inside a request handler? Never. We block every other in-flight request.

Appending to a log file

A super common real-world pattern. appendFile creates the file if it doesn’t exist.

import { appendFile } from 'node:fs/promises';

async function logEvent(event) {
  const line = `${new Date().toISOString()} ${JSON.stringify(event)}\n`;
  await appendFile('./logs/app.log', line);
}

For high-volume logging we’d use a write stream instead — opening/closing the file on every line is slow.
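A minimal sketch of the write-stream variant, assuming a log file in the OS temp directory so it is safe to run anywhere. The path and event shapes are placeholders.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const { once } = require("node:events");

// Hypothetical log location inside the OS temp directory, so the sketch is safe to run.
const logPath = path.join(os.tmpdir(), `app-${process.pid}.log`);

// flags: "a" opens the file once in append mode and keeps the descriptor open.
const logStream = fs.createWriteStream(logPath, { flags: "a" });

function logEvent(event) {
  const line = `${new Date().toISOString()} ${JSON.stringify(event)}\n`;
  return logStream.write(line); // false means: slow down, wait for 'drain'
}

logEvent({ type: "signup", userId: 42 });
logEvent({ type: "login", userId: 42 });

// On shutdown, end the stream and wait for 'finish' so buffered lines are flushed.
logStream.end();
const flushed = once(logStream, "finish");
flushed.then(() => console.log(`log written to ${logPath}`));
```

The file is opened once and reused for every line, which is the whole saving over calling appendFile repeatedly.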

Watching files

fs.watch notifies us when a file or directory changes. Great for dev tools, config hot-reload, etc. The only catch: it’s a bit unreliable across platforms (macOS uses FSEvents, Linux uses inotify, Windows is its own beast). For production-grade watching, libraries like chokidar smooth out the differences.

import { watch } from 'node:fs';

watch('./config.json', (eventType, filename) => {
  console.log(`${filename} changed (${eventType})`);
  // reload config here
});

Reading large files — use streams

readFile loads the entire file into memory. For a 10GB log file? RIP. Stream it instead.

import { createReadStream } from 'node:fs';
import { createInterface } from 'node:readline';

const rl = createInterface({
  input: createReadStream('./huge.log'),
  crlfDelay: Infinity,
});

for await (const line of rl) {
  if (line.includes('ERROR')) console.log(line);
}

Streams process the file chunk by chunk — constant memory, regardless of file size.

The mental model

Pick fs/promises by default. Use *Sync only for startup config. Reach for streams when files get big. Don’t read inside a hot path if you can cache the result. That covers maybe 95% of all real-world fs usage.

Path & URL

beginner nodejs path url esm

Concatenating paths with '/' works on macOS and Linux. It breaks on Windows. That’s why the path module exists — it gives us platform-agnostic path operations so our code runs the same everywhere.

In simple language: never write dir + '/' + file. Always use path.join(dir, file). The module figures out the right separator for the OS we’re running on.
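We can see the platform difference without switching machines: path.posix and path.win32 expose each platform's rules explicitly. A small sketch with placeholder path segments:

```javascript
const path = require("node:path");

// path.posix and path.win32 apply each platform's rules regardless of where we run.
console.log(path.posix.join("logs", "2024", "app.log")); // logs/2024/app.log
console.log(path.win32.join("logs", "2024", "app.log")); // logs\2024\app.log

// Plain path.join picks one of the two based on the current OS.
const joined = path.join("logs", "2024", "app.log");
console.log(joined === ["logs", "2024", "app.log"].join(path.sep)); // true
```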

path.join vs path.resolve

These two trip everyone up. The difference matters.

path.join — just glues segments together with the OS separator. Relative stays relative.
path.resolve — produces an absolute path, walking from right to left until it hits an absolute segment (or falling back to the current working directory).

import path from 'node:path';

path.join('foo', 'bar', 'baz.txt');
// 'foo/bar/baz.txt'  (still relative)

path.resolve('foo', 'bar', 'baz.txt');
// '/Users/manish/proj/foo/bar/baz.txt'  (absolute, from cwd)

path.resolve('/etc', 'config', '../app.conf');
// '/etc/app.conf'  (absolute segment wins, .. collapsed)

Rule of thumb: use resolve when we need an absolute path (passing to fs, comparing paths). Use join for building a relative subpath.

The useful helpers

path.dirname('/var/log/app.log');   // '/var/log'
path.basename('/var/log/app.log');  // 'app.log'
path.extname('/var/log/app.log');   // '.log'
path.parse('/var/log/app.log');
// { root: '/', dir: '/var/log', base: 'app.log', name: 'app', ext: '.log' }

path.parse is great when we need multiple pieces at once.

The ESM __dirname problem

CommonJS had __dirname and __filename baked in as globals. ESM ("type": "module" in package.json) doesn’t. When we switch to ESM, those globals disappear and code breaks.

In simple language: ESM modules don’t know their own location for free anymore — we have to compute it from import.meta.url, which is a file:// URL.

CommonJS

__dirname
__filename

Just works.

ESM

import.meta.url
+ fileURLToPath

Manual conversion.

The workaround:

import { fileURLToPath } from 'node:url';
import path from 'node:path';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// now use as before
const config = path.join(__dirname, 'config.json');

Why fileURLToPath? Because import.meta.url is a string like file:///Users/manish/app/index.js — we have to convert that URL into a regular filesystem path before passing to fs.

Newer Node (20.11+) actually gives us import.meta.dirname and import.meta.filename directly, which skips the dance. Use them when our Node version allows.

URL parsing

Node uses the WHATWG URL standard (same as browsers). Forget the old url.parse — it’s deprecated.

const u = new URL('https://api.example.com/v1/users?id=42&active=true');

u.hostname;        // 'api.example.com'
u.pathname;        // '/v1/users'
u.searchParams.get('id');     // '42'
u.searchParams.get('active'); // 'true'

searchParams is a URLSearchParams object — iterable, supports append, delete, set. Way nicer than parsing query strings by hand.

Building URLs is just as clean:

const u = new URL('https://api.example.com');
u.pathname = '/v1/users';
u.searchParams.set('id', '42');
u.toString(); // 'https://api.example.com/v1/users?id=42'
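A quick sketch of the other searchParams methods mentioned earlier (append, delete, iteration), using the same example URL. The tag values are placeholders.

```javascript
const u = new URL("https://api.example.com/v1/users?id=42&active=true");
const params = u.searchParams;

params.append("tag", "admin"); // append keeps existing values for the same key
params.append("tag", "beta");
params.delete("active");

console.log(params.getAll("tag"));       // [ 'admin', 'beta' ]
console.log(Object.fromEntries(params)); // { id: '42', tag: 'beta' } (last value wins)
console.log(u.search);                   // ?id=42&tag=admin&tag=beta
```

Changes made through searchParams are reflected on the URL object immediately, so there is no separate "rebuild the query string" step.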

The mental model

Use path.join for relative paths, path.resolve for absolute. Convert import.meta.url whenever you need __dirname in ESM. Parse URLs with new URL(), never string-splitting. These three habits cover most real-world cases without ever shipping a Windows-broken bug.

Process & Env Vars

beginner nodejs process env dotenv

process is a global object in Node — no import needed. It’s the bridge between our JavaScript code and the operating system: command-line arguments, environment variables, exit codes, signals, the current working directory. Every real Node app uses it constantly.

In simple language: process is “how Node sees the outside world.” Anything that came from the shell that ran us — args, env vars, stdin — lives here.

process.argv — command-line arguments

When we run node script.js --port 3000 --debug, the args show up here. The catch: the first two entries are always the Node binary path and the script path.

console.log(process.argv);
// [ '/usr/bin/node', '/app/script.js', '--port', '3000', '--debug' ]

const args = process.argv.slice(2);
// [ '--port', '3000', '--debug' ]

For anything beyond trivial parsing, reach for the built-in parseArgs from node:util (Node 18.3+) or libraries like commander / yargs.

import { parseArgs } from 'node:util';

const { values } = parseArgs({
  options: {
    port: { type: 'string', default: '3000' },
    debug: { type: 'boolean', default: false },
  },
});
// values.port === '3000', values.debug === false

process.env — environment variables

Every env var is a string. process.env.PORT is "3000", not the number 3000. Convert deliberately.

const port = parseInt(process.env.PORT ?? '3000', 10);
const debug = process.env.DEBUG === 'true';

The ?? handles the unset case — process.env.SOMETHING_UNSET is undefined.
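One way to make that conversion deliberate is to funnel it through small helpers. This is only a sketch: readIntEnv and readBoolEnv are hypothetical names, and the accepted "true" spellings are an arbitrary choice.

```javascript
// Hypothetical helper: read an integer env var, fall back to a default, fail loudly on junk.
function readIntEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;

  const value = Number(raw); // Number("80abc") is NaN, unlike parseInt which returns 80
  if (!Number.isInteger(value)) {
    throw new Error(`${name} must be an integer, got "${raw}"`);
  }
  return value;
}

// Hypothetical helper: treat only a few explicit spellings as true.
function readBoolEnv(env, name) {
  return ["1", "true", "yes"].includes((env[name] ?? "").toLowerCase());
}

console.log(readIntEnv({ PORT: "8080" }, "PORT", 3000)); // 8080
console.log(readIntEnv({}, "PORT", 3000));               // 3000
console.log(readBoolEnv({ DEBUG: "TRUE" }, "DEBUG"));    // true
```

In an app we would pass process.env as the first argument; taking it as a parameter keeps the helpers easy to test.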

The dotenv pattern

We don’t want to type PORT=3000 DATABASE_URL=... node app.js every time. The convention: keep secrets in a .env file (gitignored) and load it at startup.

# .env
PORT=3000
DATABASE_URL=postgres://localhost/mydb
ANTHROPIC_API_KEY=sk-ant-...

Old-school way — the dotenv package:

import 'dotenv/config';
// now process.env.PORT, process.env.DATABASE_URL etc. are populated

Node 20.6+ ships this natively. No dependency needed:

node --env-file=.env app.js

process.exit and exit codes

process.exit(0) says “success,” anything non-zero is failure. Shell scripts and CI pipelines check these codes.

if (!process.env.DATABASE_URL) {
  console.error('FATAL: DATABASE_URL required');
  process.exit(1);
}

The gotcha: process.exit is abrupt. Pending writes to stdout/stderr can get cut off. For graceful shutdown, set process.exitCode = 1 and let the event loop drain naturally.

process events — graceful shutdown

When Kubernetes sends SIGTERM or we hit Ctrl+C (SIGINT), we should close DB connections, finish in-flight requests, then exit. The pattern:

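async function shutdown(signal) {
  console.log(`Received ${signal}, shutting down...`);
  // http.Server#close takes a callback and does not return a promise, so wrap it
  await new Promise((resolve) => server.close(resolve));
  await db.end(); // assumes a DB client whose end() returns a promise
  process.exit(0);
}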

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));

The two events nobody should ignore:

uncaughtException — a thrown error nothing caught. State is unknown, log it and exit.
unhandledRejection — a promise rejected with no .catch. Same deal.

process.on('uncaughtException', (err) => {
  console.error('Uncaught:', err);
  process.exit(1);
});

process.on('unhandledRejection', (reason) => {
  console.error('Unhandled rejection:', reason);
  process.exit(1);
});

Trying to “recover” from these is almost always wrong — the app is in an undefined state.

The other useful bits

process.cwd();          // current working directory
process.chdir('/tmp');  // change it (rarely needed)
process.pid;            // process ID — useful in logs
process.platform;       // 'darwin' | 'linux' | 'win32'
process.version;        // 'v20.10.0' — Node version
process.uptime();       // seconds since process started
process.memoryUsage();  // { rss, heapTotal, heapUsed, ... }

process.memoryUsage() is gold for debugging memory leaks. Log heapUsed periodically and watch the trend.
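A minimal sketch of that periodic logging. The 30-second interval and the formatMemory name are arbitrary choices for illustration.

```javascript
// Convert bytes to megabytes with one decimal place.
const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(1);

function formatMemory() {
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  return `rss=${toMB(rss)}MB heapUsed=${toMB(heapUsed)}MB heapTotal=${toMB(heapTotal)}MB`;
}

// Log every 30 seconds (interval is an arbitrary choice for this sketch).
const timer = setInterval(() => console.log(formatMemory()), 30_000);
timer.unref(); // don't keep the process alive just for this timer

console.log(formatMemory());
```

A heapUsed value that keeps climbing across many samples, even after garbage collection has had time to run, is the signal worth investigating.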

The mental model

process is the OS-facing side of Node. Parse argv for CLI args, read env for config (always as strings), exit cleanly with proper codes, and always handle SIGTERM if you ship anything to production — otherwise rolling deploys will drop in-flight requests.

References

Process - Node.js Docs

Async Patterns

Callbacks, Promises & async/await in Node

intermediate nodejs async promises esm

Node started before promises existed in JavaScript, so its original async style was callbacks — and not just any callbacks, a specific convention called error-first callbacks. Everything we do today (promises, async/await) is layered on top of that foundation. Understanding the progression helps when we debug legacy code or interop with older modules.

Error-first callbacks — the original

The convention: every async function takes a callback whose first argument is an error (or null on success), and subsequent arguments are the actual results.

import fs from 'node:fs';

fs.readFile('./config.json', 'utf8', (err, data) => {
  if (err) {
    console.error('Read failed:', err);
    return;
  }
  console.log('Got:', data);
});

The “first param is the error” convention sounds simple, but in a real app with five nested async calls we end up with callback hell — pyramids of indentation, error handling repeated everywhere, no way to use try/catch.

function transform(cb) {
  fs.readFile('config.json', 'utf8', (err, data) => {
    if (err) return cb(err);
    fs.readFile(JSON.parse(data).next, 'utf8', (err, data2) => {
      if (err) return cb(err);
      fs.writeFile('out.txt', data2, (err) => {
        if (err) return cb(err);
        // ...
      });
    });
  });
}

Promises — chainable, composable

Promises wrap a future value. We attach .then for success, .catch for failure. The chain flattens the pyramid.

import { readFile, writeFile } from 'node:fs/promises';

readFile('config.json', 'utf8')
  .then((data) => readFile(JSON.parse(data).next, 'utf8'))
  .then((data2) => writeFile('out.txt', data2))
  .catch((err) => console.error('Failed:', err));

Better, but still verbose. The real win comes next.

async/await — promises in disguise

async/await is syntactic sugar over promises. An async function always returns a promise. await pauses inside that function until the awaited promise resolves. We get to write async code that reads like sync code.

async function transform() {
  try {
    const data = await readFile('config.json', 'utf8');
    const data2 = await readFile(JSON.parse(data).next, 'utf8');
    await writeFile('out.txt', data2);
  } catch (err) {
    console.error('Failed:', err);
  }
}

In simple language: await is a “wait for this, then continue on the next line” marker. The function returns control to the event loop while waiting — Node isn’t blocked.

Callbacks

Original Node style. Error-first. Hard to compose.

Promises

Chainable. then/catch. Composable.

async/await

Reads like sync. try/catch works. Default in 2025.

Sequential vs parallel — the await trap

await runs things one at a time. If three operations don’t depend on each other, that’s wasteful.

// SLOW — 3 sequential round-trips
const user = await fetchUser(id);
const orders = await fetchOrders(id);
const cart = await fetchCart(id);

// FAST — all 3 in parallel, wait for the slowest
const [user, orders, cart] = await Promise.all([
  fetchUser(id),
  fetchOrders(id),
  fetchCart(id),
]);

This is one of the most common perf wins in Node code. Look for “await, await, await” with no data dependency and combine with Promise.all.
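To make the difference measurable, here is a sketch that swaps the real fetchers for a hypothetical fakeFetch that simply waits about 50 ms. Exact timings will vary by machine, so no specific numbers are claimed.

```javascript
const { setTimeout: sleep } = require("node:timers/promises");

// Hypothetical stand-in for fetchUser / fetchOrders / fetchCart: each call takes ~50ms.
const fakeFetch = async (label) => {
  await sleep(50);
  return label;
};

async function sequential() {
  const start = Date.now();
  await fakeFetch("user");
  await fakeFetch("orders");
  await fakeFetch("cart");
  return Date.now() - start; // roughly the SUM of the three delays
}

async function parallel() {
  const start = Date.now();
  await Promise.all([fakeFetch("user"), fakeFetch("orders"), fakeFetch("cart")]);
  return Date.now() - start; // roughly the SLOWEST single delay
}

const timings = (async () => ({ seq: await sequential(), par: await parallel() }))();
timings.then(({ seq, par }) => console.log(`sequential ${seq}ms, parallel ${par}ms`));
```

Note that Promise.all rejects as soon as any one promise rejects, so it fits best when we need all results or none.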

Promisification — bridging old code

Some old modules still use error-first callbacks. We don’t want to write .then chains around them. Wrap them with util.promisify.

import { promisify } from 'node:util';
import { exec as execCb } from 'node:child_process';

const exec = promisify(execCb);

const { stdout } = await exec('git rev-parse HEAD');
console.log('Commit:', stdout.trim());

fs.promises is essentially fs callbacks promisified at the source — same API, promise-based.

Top-level await — only in ESM

Old Node modules (CJS) couldn’t await at the top of a file — only inside an async function. ESM modules can. This is huge for startup code.

// app.js — package.json has "type": "module"
import { readFile } from 'node:fs/promises';

const config = JSON.parse(await readFile('./config.json', 'utf8'));
const db = await connectDB(config.dbUrl);

export { db };

No more (async () => { ... })() IIFE wrappers around our entry point. Just write the code.

The catch: top-level await makes a module’s evaluation async. If something imports this module, its import statement effectively waits for us. Usually fine, occasionally surprising.

The mental model

Use async/await by default. Use Promise.all for independent parallel work. Wrap legacy callback APIs with promisify. In ESM, lean on top-level await for startup. Callbacks aren’t dead — many event APIs (EventEmitter, streams) still use them — but for one-shot async results, promises and await won.

util.promisify

intermediate nodejs util promises async

A huge chunk of Node’s core was designed before promises existed in JavaScript. Lots of APIs — fs, child_process, dns, plenty of npm packages — still take an error-first callback as their last argument. We don’t want to keep nesting callbacks in 2025. util.promisify is the official adapter that converts any such function into one that returns a promise.

In simple language: it takes a function that wants a callback and gives back a function that returns a promise. Zero ceremony, works on almost anything.

The convention it relies on

promisify assumes the function follows the error-first callback rule:

callback is the last argument
callback’s signature is (err, value) => ...

If both are true, promisify works automatically. Here’s the manual version of what it does, just to demystify it:

function manualPromisify(fn) {
  return function (...args) {
    return new Promise((resolve, reject) => {
      fn(...args, (err, result) => {
        if (err) reject(err);
        else resolve(result);
      });
    });
  };
}

That’s basically it. The real util.promisify is more robust (it preserves this, validates its argument, and honors custom promisified variants that some core functions define) but the spirit is identical.
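To see both outcomes, success and failure, here is a sketch against a made-up legacy function. divide is hypothetical; it only exists to follow the error-first convention.

```javascript
const { promisify } = require("node:util");

// Hypothetical legacy API: callback last, error first.
function divide(a, b, callback) {
  setImmediate(() => {
    if (b === 0) callback(new Error("division by zero"));
    else callback(null, a / b);
  });
}

const divideAsync = promisify(divide);

async function demo() {
  const ok = await divideAsync(10, 2); // resolves with the callback's value
  let message = null;
  try {
    await divideAsync(1, 0);           // the callback's err becomes a rejection
  } catch (err) {
    message = err.message;
  }
  return { ok, message };
}

const outcome = demo();
outcome.then(console.log); // { ok: 5, message: 'division by zero' }
```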

Using it

import { promisify } from 'node:util';
import { exec } from 'node:child_process';
import dns from 'node:dns';

const execAsync = promisify(exec);
const lookup = promisify(dns.lookup);

const { stdout } = await execAsync('git rev-parse HEAD');
console.log('HEAD:', stdout.trim());

const { address } = await lookup('nodejs.org');
console.log('IP:', address);

We now await what used to need a callback. Errors flow through normal try/catch.

What fs.promises actually is

fs/promises is what we’d get if we sat down and promisified every function in fs. The Node team did that work for us and shipped it as a separate module.

fs.readFile(path, cb) — promisify → fs.promises.readFile(path)

Same logic, same options. The callback became a returned promise.

Proof — we could literally rebuild it:

import fs from 'node:fs';
import { promisify } from 'node:util';

const readFile = promisify(fs.readFile);
const writeFile = promisify(fs.writeFile);

// these now behave just like fs.promises.readFile / writeFile
const data = await readFile('./config.json', 'utf8');

fs.promises is just nicer ergonomics with a single import.

Custom promisify behavior

Some core APIs don’t follow the strict (err, value) shape. For example, dns.lookup calls back with (err, address, family), two result args. Node special-cases these so the promisified version resolves with { address, family } instead of just address. Library authors can get the same effect through the util.promisify.custom symbol.

import { promisify } from 'node:util';
import dns from 'node:dns';

const lookup = promisify(dns.lookup);
const result = await lookup('nodejs.org');
// { address: '104.20.22.46', family: 4 }

We don’t normally need to set [util.promisify.custom] ourselves, but if we ship a library with non-standard callback shapes, that’s how we’d do it.
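For completeness, a sketch of what setting that symbol looks like. measure is a hypothetical library function with two result arguments; the shape of the resolved object is our own choice.

```javascript
const util = require("node:util");

// Hypothetical library function whose callback has TWO result args: (err, width, height).
function measure(text, callback) {
  setImmediate(() => callback(null, text.length, 1));
}

// Tell util.promisify what the promise-returning variant should look like.
measure[util.promisify.custom] = (text) =>
  new Promise((resolve, reject) => {
    measure(text, (err, width, height) => {
      if (err) reject(err);
      else resolve({ width, height });
    });
  });

const measureAsync = util.promisify(measure);
const size = measureAsync("hello");
size.then(console.log); // { width: 5, height: 1 }
```

Without the custom property, promisify would resolve with only the first result argument and drop the second.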

When NOT to use promisify

If the function has any of these traits, promisify is wrong:

Emits events repeatedly (e.g., a stream emitting data multiple times). Promises resolve once. Use for await...of, pipeline, or stay on events.
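The callback isn’t error-first, so promisify would mistake its first argument for an error and reject. Wrap such functions manually instead. (setTimeout(cb, ms) looks like this case because its callback has no err, but it is an exception: Node defines a custom promisified variant for it.)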
Already returns a promise. No-op at best, weird wrapping at worst.

Here’s the right way to “promisify” setTimeout — Node ships a promise version already:

import { setTimeout as sleep } from 'node:timers/promises';

await sleep(1000); // pauses for 1s

The mental model

util.promisify is the bridge between Node’s callback past and its promise present. We use it directly when we hit an old API that hasn’t been modernized, and we use the already-promisified versions (fs/promises, timers/promises, dns/promises, stream/promises) whenever they exist — they’re idiomatic and well-tested.

References

util.promisify - Node.js Docs

EventEmitter

intermediate nodejs events pubsub

EventEmitter is the publish-subscribe primitive that sits underneath an enormous fraction of Node’s core: every stream is an emitter, every HTTP server and request is an emitter, the process global is one too. If we want to understand what req.on('data', ...) really does, we have to understand EventEmitter.

In simple language: it’s an object with two main methods — emit('name', data) to fire an event, and on('name', handler) to subscribe to it. That’s the whole concept. Everything else is variation.

The basic pattern

import { EventEmitter } from 'node:events';

const bus = new EventEmitter();

bus.on('user.signup', (user) => {
  console.log(`Welcome email queued for ${user.email}`);
});

bus.on('user.signup', (user) => {
  console.log(`Analytics tracked for ${user.id}`);
});

bus.emit('user.signup', { id: 42, email: 'a@b.com' });

Multiple listeners on the same event? They all run, in registration order, synchronously when emit is called. The emitter doesn’t await anything — if a listener is async, it runs but emit doesn’t wait for it.

Publisher: emit('x', data) → EventEmitter (listener map) → Listener A, Listener B, Listener C
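The "emit doesn't wait for async listeners" point is easy to verify by recording the order in which things happen. A small sketch with a made-up 'tick' event:

```javascript
const { EventEmitter } = require("node:events");

const bus = new EventEmitter();
const order = [];

bus.on("tick", () => order.push("sync listener"));
bus.on("tick", async () => {
  order.push("async listener: start");
  await null; // yields to the microtask queue; emit() does not wait for this
  order.push("async listener: end");
});

bus.emit("tick");
order.push("after emit");

// Give the pending microtask a chance to run, then inspect the order.
const settled = new Promise((resolve) => setImmediate(() => resolve(order)));
settled.then(console.log);
// [ 'sync listener', 'async listener: start', 'after emit', 'async listener: end' ]
```

The async listener starts synchronously, but everything after its first await runs only once emit has already returned.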

once — fire-and-forget subscriber

If we only care about the first occurrence, once auto-removes the listener after it fires.

server.once('listening', () => {
  console.log('Server started');
});

Great for one-time initialization signals.

off / removeListener — cleanup

If we add listeners dynamically, we have to remove them or we leak memory. off (alias for removeListener) needs the same function reference we passed to on.

function onData(chunk) { /* ... */ }

stream.on('data', onData);
// later
stream.off('data', onData);

Anonymous arrow functions can’t be removed cleanly because we don’t have a reference. That’s why long-lived emitters always store handler references.
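A short sketch of why the reference matters, using listenerCount to check what is still attached:

```javascript
const { EventEmitter } = require("node:events");

const emitter = new EventEmitter();

function onData(chunk) {
  console.log("got", chunk);
}

emitter.on("data", onData);
emitter.on("data", (chunk) => console.log("anonymous got", chunk));
console.log(emitter.listenerCount("data")); // 2

emitter.off("data", onData); // works: same function reference
// No-op: this arrow function is a NEW function, not the one registered above
emitter.off("data", (chunk) => console.log("anonymous got", chunk));
console.log(emitter.listenerCount("data")); // 1 (the anonymous listener is still attached)
```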

The MaxListeners warning

Every emitter has a soft limit — by default 10 listeners per event. Cross that and Node prints:

(node:1234) MaxListenersExceededWarning: Possible EventEmitter memory leak detected.
11 data listeners added to [ReadStream].

In simple language: Node is saying “you keep adding listeners and never removing them — looks like a leak.” Sometimes it’s a real bug (forgot to off), sometimes it’s just a high-traffic legitimate use. We can raise the cap:

emitter.setMaxListeners(50);          // per-instance
EventEmitter.defaultMaxListeners = 20; // global default

But always check whether the listeners should actually be removed before papering over the warning.

The special ‘error’ event

EventEmitter has one cursed event name: 'error'. If we emit('error', err) and nothing is listening, Node treats it as uncaught and crashes the process.

const e = new EventEmitter();
e.emit('error', new Error('boom')); // CRASHES

The fix is always to have an error listener:

e.on('error', (err) => {
  console.error('Emitter error:', err);
});

This is why streams everywhere need .on('error', ...) — it’s the same EventEmitter behavior.

Extending it for our own classes

The natural way to build a class with built-in pub/sub:

import { EventEmitter } from 'node:events';

class JobRunner extends EventEmitter {
  async run(job) {
    this.emit('start', job);
    try {
      const result = await job.execute();
      this.emit('done', { job, result });
    } catch (err) {
      this.emit('error', err);
    }
  }
}

const runner = new JobRunner();
runner.on('done', ({ job, result }) => console.log(`${job.id} → ${result}`));
runner.on('error', (err) => console.error('Job failed:', err));

This pattern shows up everywhere — Express’s app, Mongoose connection, ws WebSocket server, the Node process itself.

events.once — promise wrapper

When we want to await a single event (e.g., wait for 'listening'), there’s a helper:

import { once } from 'node:events';

await once(server, 'listening');
console.log('Server is up');

The promise resolves with an array of the arguments passed to emit, and rejects if 'error' fires first. Beautiful for sequencing.
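
A short sketch of both outcomes, using a plain emitter instead of a server. The 'ready' event name and the demo function are assumptions for the example.

```javascript
const { EventEmitter, once } = require('node:events');

async function demo() {
  const emitter = new EventEmitter();

  // Resolve case: the promise resolves with the emitted args as an array.
  setImmediate(() => emitter.emit('ready', 'db', 42));
  const [name, value] = await once(emitter, 'ready');

  // Reject case: 'error' fires before the event we are waiting for.
  setImmediate(() => emitter.emit('error', new Error('startup failed')));
  let rejected = null;
  try {
    await once(emitter, 'ready');
  } catch (err) {
    rejected = err;
  }

  return { name, value, rejected: rejected.message };
}

demo().then(console.log);
```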

The mental model

EventEmitter is sync pub/sub: emit is just “loop through listeners and call them in order.” Always handle 'error'. Always remove listeners on long-lived emitters. When you npm install something and it has an .on(...) API, you’re almost certainly looking at an EventEmitter underneath.

References

Events - Node.js Docs

HTTP & Networking

http module

intermediate nodejs http server networking

Express, Fastify, Koa — they’re all wrappers around this. Node ships with everything needed to build an HTTP server and client out of the box. Understanding the raw http module is what separates “I use a framework” from “I know what my framework actually does.”

In simple language: http.createServer gives us a callback (req, res) => {...} that fires for every incoming request. req is a readable stream (the request), res is a writable stream (the response we send back). That’s the whole API.

A minimal server

import http from 'node:http';

const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ hello: 'world' }));
});

server.listen(3000, () => {
  console.log('Listening on http://localhost:3000');
});

No dependencies. No framework. Real production HTTP server.

The req / res lifecycle

Client sends request → req (IncomingMessage: method, url, headers, body stream) → handler runs → res.writeHead → res.write → res.end → client receives the response

IncomingMessage — the request

req is a readable stream. The body doesn’t arrive in one chunk — we have to assemble it.

function readBody(req) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
    req.on('error', reject);
  });
}

const server = http.createServer(async (req, res) => {
  if (req.method === 'POST' && req.url === '/echo') {
    const body = await readBody(req);
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(body);
  } else {
    res.writeHead(404);
    res.end();
  }
});

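This is essentially what Express’s body-parsing middleware does for us before it hands back req.body. The middleware also enforces a size limit and parses the payload (JSON, URL-encoded, and so on).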
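
One thing readBody leaves out is a cap on body size, so a client could make us buffer an arbitrarily large payload. A minimal sketch of a capped variant, using async iteration over the request stream. The name readBodyLimited and the 1 MB default are assumptions, not values from any framework.

```javascript
// Collect a request body, refusing anything larger than maxBytes.
// Works on any readable stream that yields Buffers, including IncomingMessage.
async function readBodyLimited(req, maxBytes = 1024 * 1024) {
  const chunks = [];
  let total = 0;
  for await (const chunk of req) {
    total += chunk.length;
    if (total > maxBytes) {
      // Throwing inside for-await also destroys the underlying stream.
      throw new Error(`Body exceeds ${maxBytes} bytes`);
    }
    chunks.push(chunk);
  }
  return Buffer.concat(chunks).toString('utf8');
}
```

In a handler we would catch the error and answer with 413 (Payload Too Large) instead of letting the request crash.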

ServerResponse — sending back

Three layers of writing:

res.writeHead(statusCode, headers) — sends the status line + headers. Call once.
res.write(chunk) — sends a body chunk. Call zero or more times (streaming).
res.end([chunk]) — finishes the response. Required, else the client hangs forever.

We can stream a big response without buffering:

import fs from 'node:fs';

http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'video/mp4' });
  fs.createReadStream('./big.mp4').pipe(res);
}).listen(3000);

pipe connects the file stream to the response stream — chunks flow through, memory stays flat.
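
A caveat with plain pipe: it does not forward errors, and it does not close the file stream if the client disconnects halfway. A hedged sketch of one way to handle both, using stream.pipeline. The streamFile helper is hypothetical, and the 404/500 mapping is just one reasonable choice.

```javascript
const fs = require('node:fs');
const { pipeline } = require('node:stream');

// Hypothetical helper: stream a file into an HTTP response.
// Resolves with the status code that was sent.
function streamFile(res, filePath, contentType) {
  return new Promise((resolve) => {
    const file = fs.createReadStream(filePath);

    // Failure before any byte was sent (e.g. missing file): answer with a status.
    file.once('error', (err) => {
      if (res.headersSent) return; // mid-stream failure: pipeline tears the socket down
      res.writeHead(err.code === 'ENOENT' ? 404 : 500);
      res.end();
      resolve(res.statusCode);
    });

    // File opened fine: send headers, then let pipeline manage flow and cleanup.
    file.once('open', () => {
      res.writeHead(200, { 'Content-Type': contentType });
      pipeline(file, res, () => resolve(res.statusCode));
    });
  });
}
```

Waiting for 'open' before writing headers is a design choice: once pipeline sees an error it destroys the response, so the only chance to send a clean 404 is before the pipe starts.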

Raw http vs Express — what does Express add?

Almost everything in Express is sugar over what we just wrote:

What Express adds	Underlying http
Routing (`app.get('/users/:id', ...)`)	Manual `req.url` + `req.method` checks
`req.body`, `req.params`, `req.query`	Manual stream reading and URL parsing
Middleware chain	One handler function
`res.json()`, `res.send()`	`writeHead` + `end`
Error handling middleware	Try/catch + sending error responses

Express isn’t magic — it’s a thoughtful set of patterns on top of http.createServer. Knowing this means we can drop down to raw http for performance-critical endpoints, or build our own framework in a weekend.
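
To show how thin the routing layer can be, here is a rough sketch of the two pieces behind req.params and req.query. matchRoute and parseUrl are hypothetical names, and this is not how Express implements it internally; it only illustrates the idea.

```javascript
// Hypothetical matcher: compare '/users/:id' against a pathname and extract params.
function matchRoute(pattern, pathname) {
  const patternParts = pattern.split('/').filter(Boolean);
  const pathParts = pathname.split('/').filter(Boolean);
  if (patternParts.length !== pathParts.length) return null;

  const params = {};
  for (let i = 0; i < patternParts.length; i++) {
    const expected = patternParts[i];
    const actual = pathParts[i];
    if (expected.startsWith(':')) {
      params[expected.slice(1)] = decodeURIComponent(actual); // ':id' captures a segment
    } else if (expected !== actual) {
      return null; // literal segment did not match
    }
  }
  return params;
}

// Split the raw req.url into a pathname and a query object.
function parseUrl(rawUrl) {
  const url = new URL(rawUrl, 'http://localhost'); // base is only a placeholder
  return { pathname: url.pathname, query: Object.fromEntries(url.searchParams) };
}

const { pathname, query } = parseUrl('/users/42?full=1');
console.log(matchRoute('/users/:id', pathname), query);
```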

http.request — the client side

Same module, opposite direction. We can make outgoing HTTP requests too.

import http from 'node:http';

const req = http.request({
  hostname: 'api.example.com',
  path: '/users/42',
  method: 'GET',
}, (res) => {
  const chunks = [];
  res.on('data', (c) => chunks.push(c));
  res.on('end', () => {
    console.log('Got:', Buffer.concat(chunks).toString('utf8'));
  });
});

req.on('error', console.error);
req.end(); // sends the request

In practice, we use the built-in fetch (Node 18+) for this: it’s promise-based and matches the browser API. But http.request is what libraries like axios use under the hood in Node, and it is still the tool to reach for when we need streaming or fine-grained control over keep-alive and agents.

The keep-alive gotcha

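Before Node 19, the default agent opened a new TCP connection for every outgoing request. For high-volume calls (one service calling another thousands of times a minute), this is brutal. Since Node 19 the global agent enables keep-alive by default, but on older versions, or whenever we want to tune pooling ourselves, we use an Agent with keepAlive: true to reuse connections: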

import { Agent } from 'node:http';

const agent = new Agent({ keepAlive: true, maxSockets: 50 });
// pass agent into http.request options

Modern fetch and clients like undici do this automatically.
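
A small sketch of what "pass agent into http.request options" looks like in practice. The get helper is hypothetical; the agent option itself is part of http.request.

```javascript
const http = require('node:http');

// One shared agent for the whole process, so sockets can be pooled and reused.
const agent = new http.Agent({ keepAlive: true, maxSockets: 50 });

// Hypothetical helper: GET a URL through the shared agent and buffer the body.
function get(url) {
  return new Promise((resolve, reject) => {
    const req = http.request(url, { method: 'GET', agent }, (res) => {
      const chunks = [];
      res.on('data', (c) => chunks.push(c));
      res.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
    });
    req.on('error', reject);
    req.end();
  });
}
```

Sequential calls through get to the same host should reuse one TCP connection. Call agent.destroy() on shutdown so idle sockets don't keep the process alive.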

The mental model

http.createServer takes (req, res). req is a stream we read. res is a stream we write. Everything else — routing, middleware, JSON parsing — is a pattern built on top. Once that clicks, no Node HTTP code is mysterious anymore.

References

HTTP - Node.js Docs

HTTPS & TLS

intermediate nodejs https tls security

https is http plus TLS — same API, same req/res shape, but the bytes on the wire are encrypted. In real production we almost never expose Node’s HTTPS server directly; a reverse proxy (Caddy, Nginx, ALB) handles TLS and forwards plain HTTP to our app. But knowing the raw module matters when we build internal mTLS services, talk to a third-party API with a custom cert, or troubleshoot why our fetch says “self-signed certificate.”

In simple language: TLS is the encryption layer between TCP and HTTP. We give Node a private key + certificate, it does the handshake with clients, and our handler code sees a normal request.

A minimal HTTPS server

import https from 'node:https';
import { readFileSync } from 'node:fs';

const server = https.createServer({
  key: readFileSync('./certs/server.key'),
  cert: readFileSync('./certs/server.crt'),
}, (req, res) => {
  res.writeHead(200);
  res.end('Hello over TLS');
});

server.listen(8443);

Same (req, res) handler as plain HTTP. The only difference is the options object with key and cert.

Where the cert comes from

For local dev we generate a locally trusted cert with mkcert. It creates a private CA, installs it in the system trust store, and signs the dev cert with it:

mkcert -install
mkcert localhost 127.0.0.1
# produces localhost+1.pem and localhost+1-key.pem

For production we get certs from Let’s Encrypt (via certbot or Caddy), or from a cloud-managed cert service. Don’t ship self-signed certs to production — clients will refuse the connection unless explicitly told to ignore.

The TLS handshake — what’s happening

Client → Server: Client Hello (supported ciphers)
Server → Client: Server Hello + Certificate
Client: verify cert against trusted CA
Client ↔ Server: key exchange
✓ Encrypted channel established, HTTP traffic flows

The client validates that our cert is signed by a CA it trusts and that the hostname matches. That’s where most TLS pain comes from.

Mutual TLS (mTLS) — the client proves who it is too

In normal HTTPS, only the server presents a cert. In mutual TLS, the client also presents a cert, and the server validates it. This is how zero-trust internal services authenticate without API keys — Kubernetes service meshes, AWS IAM Roles Anywhere, Stripe’s payment terminal API.

import https from 'node:https';
import { readFileSync } from 'node:fs';

const server = https.createServer({
  key: readFileSync('./certs/server.key'),
  cert: readFileSync('./certs/server.crt'),
  ca: readFileSync('./certs/client-ca.crt'),  // CA we trust to sign client certs
  requestCert: true,    // ask client for a cert
  rejectUnauthorized: true, // close connection if client cert is invalid
}, (req, res) => {
  const cert = req.socket.getPeerCertificate();
  res.end(`Hello, ${cert.subject.CN}`);
});

server.listen(8443);

The client must present a cert signed by client-ca.crt. The server then knows exactly who’s calling.

Calling an mTLS server as a client

Same idea, other direction:

import https from 'node:https';
import { readFileSync } from 'node:fs';

const req = https.request({
  hostname: 'internal-api.local',
  port: 8443,
  path: '/data',
  method: 'GET',
  key: readFileSync('./certs/client.key'),
  cert: readFileSync('./certs/client.crt'),
  ca: readFileSync('./certs/server-ca.crt'), // CA that signed the server's cert
}, (res) => {
  res.pipe(process.stdout);
});
req.end();

Common pitfalls

UNABLE_TO_VERIFY_LEAF_SIGNATURE — the server’s cert chain is incomplete. The fix is on the server side: include intermediate certs in the chain, not just the leaf. We can also point Node at extra CAs:

NODE_EXTRA_CA_CERTS=/path/to/corporate-root.pem node app.js

SELF_SIGNED_CERT_IN_CHAIN in dev — we’re calling our own self-signed server. We tell our HTTP client to trust it via ca: option. Do not set NODE_TLS_REJECT_UNAUTHORIZED=0 in production — it disables all cert checking globally and is a giant security hole.

Hostname mismatch — the cert is for api.example.com but we’re connecting to 1.2.3.4. TLS verifies the hostname. Either use the domain name, or configure the cert with a Subject Alternative Name for the IP.

Cert expiry — Let’s Encrypt certs last 90 days. If we forget to renew, the entire service goes down. Use auto-renewal (Caddy does this for free) and monitor expiry.
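
For the monitoring part, a rough sketch of an expiry check we could run from a cron job. daysUntilExpiry and checkCertExpiry are hypothetical helpers; tls.connect and getPeerCertificate().valid_to are the real Node APIs they lean on.

```javascript
const tls = require('node:tls');

// Pure helper: whole days between `now` and a certificate's valid_to string.
function daysUntilExpiry(validTo, now = new Date()) {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((new Date(validTo).getTime() - now.getTime()) / msPerDay);
}

// Connect to host:port, read the peer certificate, and report the days left.
function checkCertExpiry(host, port = 443) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect({ host, port, servername: host }, () => {
      const cert = socket.getPeerCertificate();
      socket.end();
      resolve(daysUntilExpiry(cert.valid_to));
    });
    socket.on('error', reject);
  });
}
```

A typical use would be alerting when checkCertExpiry returns fewer days than our renewal window; the exact threshold is ours to pick.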

When to terminate TLS in Node vs at a proxy

Honestly, most of the time we put a reverse proxy in front of Node — it handles TLS, our app speaks plain HTTP internally. The proxy does cert renewal, HTTP/2, compression, often better than Node would. We reach for Node’s HTTPS when:

We need mTLS at the application layer (auth tied to cert).
We’re building a CLI tool or background worker that talks to a TLS-protected internal service.
We’re writing a webhook receiver for a service that requires TLS to a specific hostname we own.

The mental model

https is http plus a { key, cert } options bag. Cert + private key go on the server. Trusted CAs go on whoever’s verifying. mTLS just means both sides present a cert. When something breaks, 90% of the time it’s a hostname mismatch, missing intermediate cert, or expired cert — not Node’s fault.

net & TCP

advanced nodejs net tcp sockets

HTTP is a protocol that runs on top of TCP. TCP is the actual transport layer — a stream of bytes between two machines, with delivery guarantees and ordering, but no concept of “requests” or “responses.” Node’s net module gives us direct access. We rarely need it, but when we do, nothing else will work.

In simple language: net is what http uses underneath. If we strip HTTP away, we’re just reading and writing bytes on a socket. That’s a TCP connection.

A minimal TCP server

import net from 'node:net';

const server = net.createServer((socket) => {
  console.log('Client connected:', socket.remoteAddress);

  socket.write('Welcome to the echo server\n');

  socket.on('data', (chunk) => {
    socket.write(`echo: ${chunk}`);
  });

  socket.on('end', () => {
    console.log('Client disconnected');
  });
});

server.listen(4000, () => console.log('TCP server on :4000'));

Test it from another terminal:

nc localhost 4000
# Welcome to the echo server
> hello
# echo: hello

That’s it. No paths, no methods, no headers. Just bytes in, bytes out.

A TCP client

import net from 'node:net';

const client = net.createConnection({ host: 'localhost', port: 4000 }, () => {
  console.log('Connected');
  client.write('ping\n');
});

client.on('data', (chunk) => {
  console.log('Got:', chunk.toString());
  client.end();
});

The socket is a duplex stream

A socket in Node is both readable ('data' events, for await ... of) and writable (write, end). It’s a duplex stream. Everything we know about streams applies.

Client socket (write →) ⇄ TCP ⇄ (← write) Server socket

Two duplex streams glued together by a TCP connection. No request boundaries.

The framing problem — why HTTP exists

Here’s the catch with raw TCP: there are no message boundaries. If a client calls socket.write('hello') then socket.write('world'), the server might see 'hello' then 'world', 'hel' then 'loworld', or 'helloworld' all at once. TCP coalesces and splits at will.

In simple language: TCP is a pipe, not a stack of envelopes. We need to invent our own framing — like ending every message with \n, or prefixing each message with its length.

// length-prefixed framing
function send(socket, payload) {
  const buf = Buffer.from(payload);
  const len = Buffer.alloc(4);
  len.writeUInt32BE(buf.length, 0);
  socket.write(len);
  socket.write(buf);
}

This is exactly the problem HTTP, MQTT, Redis’s RESP, and PostgreSQL’s wire protocol all solve in their own way. Using an existing protocol like HTTP gives us message boundaries for free.
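
The send function covers the writing side; the receiving side has to reassemble frames from whatever chunks TCP delivers. A minimal sketch of that decoder under the same 4-byte big-endian length prefix. encodeFrame and createFrameDecoder are illustrative names.

```javascript
// Encode one message as a 4-byte big-endian length followed by the payload.
function encodeFrame(payload) {
  const body = Buffer.from(payload);
  const header = Buffer.alloc(4);
  header.writeUInt32BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Hypothetical decoder: feed it chunks as they arrive, get whole messages back.
function createFrameDecoder() {
  let buffered = Buffer.alloc(0);
  return function push(chunk) {
    buffered = Buffer.concat([buffered, chunk]);
    const messages = [];
    while (buffered.length >= 4) {
      const length = buffered.readUInt32BE(0);
      if (buffered.length < 4 + length) break; // frame incomplete, wait for more bytes
      messages.push(buffered.subarray(4, 4 + length).toString('utf8'));
      buffered = buffered.subarray(4 + length);
    }
    return messages;
  };
}
```

On a socket this would be wired up as socket.on('data', (chunk) => push(chunk).forEach(handleMessage)), where handleMessage is whatever the application does with a complete message. A real protocol would also reject absurd length values to avoid buffering forever.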

When to use net vs http

Use net only when:

You’re implementing a non-HTTP protocol. Custom binary protocols, game servers, IoT devices that speak Modbus / proprietary protocols, Postgres/Redis-style protocols.
You’re building a proxy or load balancer and need to forward raw bytes.
You need lowest possible overhead. No HTTP parsing, no headers. Real-time financial systems, telemetry pipelines.
You’re tunneling something through SSH or a VPN socket.

Use http (or HTTP frameworks) when:

You’re building anything that looks like a web service.
You want to reuse browser tooling (curl, Postman, fetch).
You want middleware, routing, JSON parsing — basically free.

For 99% of backend work, http is the right answer. net is for the 1% that’s genuinely lower-level.

Unix domain sockets

net can also do IPC over a filesystem path, no TCP involved. When two processes on the same machine talk, this skips the TCP/IP stack and usually has lower overhead than localhost TCP:

const server = net.createServer(handler).listen('/tmp/myapp.sock');
const client = net.createConnection({ path: '/tmp/myapp.sock' });

PostgreSQL, Docker daemon, and many cloud sidecars use this pattern.

Backpressure — same rules as streams

socket.write returns false when the data could not be flushed to the kernel and is being queued in Node’s memory instead. If we ignore that and keep writing, memory balloons. Either wait for the 'drain' event, or use pipeline to glue streams together; it handles backpressure for us.

import { pipeline } from 'node:stream/promises';

await pipeline(source, socket); // backpressure-safe
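
When we are writing by hand rather than piping, the 'drain' route looks roughly like this. writeAll is a hypothetical helper and works with any writable stream, a socket included.

```javascript
const { once } = require('node:events');

// Write every chunk, pausing whenever write() reports that the buffer is full.
// Returns how many times we had to wait, which is handy for logging.
async function writeAll(writable, chunks) {
  let pauses = 0;
  for (const chunk of chunks) {
    if (!writable.write(chunk)) {
      pauses += 1;
      await once(writable, 'drain'); // resume only after the buffer has emptied
    }
  }
  writable.end();
  return pauses;
}
```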

The mental model

net gives us a byte pipe between two endpoints. No requests, no responses, no framing. We invent the protocol on top. It’s almost always the wrong choice for web work, and the only sensible choice for custom binary protocols. Knowing it exists — and that http is just bytes-with-rules layered on top — makes the whole networking stack much less mysterious.

References

Net - Node.js Docs

Concurrency & Scaling

Worker Threads

advanced nodejs worker-threads parallelism performance

Node is single-threaded for JavaScript execution. The event loop, our handlers, every line of our code — all on one thread. That’s fine for I/O-bound work (the kernel does the waiting). It’s a disaster for CPU-bound work: a sync 2-second computation blocks every other in-flight request for 2 seconds. Worker Threads are Node’s answer.

In simple language: Worker Threads let us spawn a separate JS thread that runs alongside the main one. Real parallel execution, not just async I/O. We communicate via message passing, like a tiny isolated worker microservice that lives in our process.

What “CPU-bound” actually means

A request is CPU-bound when our process is doing math, not waiting on the network/disk. Examples:

Parsing a 50MB JSON or CSV
Resizing an image
Computing a SHA-256 hash over a big buffer
Running a regex against millions of strings
Running ML inference in pure JS

For I/O work (DB query, HTTP fetch, file read), Workers won’t help — Node’s event loop is already great at that.

A minimal worker

Workers live in their own file (or string). We message back and forth.

// main.js
import { Worker } from 'node:worker_threads';

function runHeavy(input) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./worker.js', { workerData: input });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited ${code}`));
    });
  });
}

console.log(await runHeavy({ size: 10_000_000 }));

// worker.js
import { workerData, parentPort } from 'node:worker_threads';

// some CPU-heavy task — does NOT block main.js
let sum = 0;
for (let i = 0; i < workerData.size; i++) sum += Math.sqrt(i);

parentPort.postMessage({ sum });

While worker.js is grinding, the main thread keeps serving HTTP requests. That’s the point.

The architecture

Main thread (event loop, HTTP, fast logic): sends work with postMessage, receives results via on('message').
Worker thread (own V8 isolate, own event loop, own memory): does the CPU-heavy work, replies with parentPort.postMessage.

Separate memory. Communication via structured-clone message passing.

Each worker is essentially a fresh Node instance running inside the same process. Separate V8 heap, separate event loop, separate require cache.

postMessage — the message channel

postMessage uses the structured clone algorithm to copy the data, the same one browsers use for postMessage between windows. It can clone plain objects, Buffers, Maps, Sets, typed arrays, even circular references. It cannot clone functions, and class instances arrive as plain objects without their prototype or methods.

parentPort.postMessage({
  result: bigBuffer,
  meta: { ts: Date.now() },
});

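Bigger payload = more cloning cost. If we’re sending megabytes, consider the transferList argument: Node moves the underlying ArrayBuffer without copying, and the sender loses access to it. This is reliable for ArrayBuffers we allocated ourselves; small Buffers may sit in Node’s shared internal pool, which cannot be transferred.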

parentPort.postMessage({ buf }, [buf.buffer]); // ownership transfer
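
To see the ownership move without spinning up a worker, here is a single-thread sketch using a MessageChannel, which follows the same transfer rules as parentPort. receiveMessageOnPort is used only so the demo can read the message synchronously.

```javascript
const { MessageChannel, receiveMessageOnPort } = require('node:worker_threads');

const { port1, port2 } = new MessageChannel();

// A standalone typed array, so its ArrayBuffer is not part of Node's Buffer pool.
const data = new Uint8Array(1024).fill(7);

// Second argument is the transfer list: ownership moves, no copy is made.
port1.postMessage({ data }, [data.buffer]);

const senderBytesAfter = data.byteLength; // the sender's view is now detached
const { message } = receiveMessageOnPort(port2);
const receiverBytes = message.data.byteLength;

port1.close();
console.log({ senderBytesAfter, receiverBytes });
```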

SharedArrayBuffer — shared memory between threads

For the rare cases where workers need to read/write the same memory (image processing pipelines, multi-worker numerical compute), SharedArrayBuffer is the escape hatch.

// main.js
const sab = new SharedArrayBuffer(1024);
const view = new Int32Array(sab);
worker.postMessage(sab); // both threads now see the same bytes

Multiple threads writing the same memory is exactly the classic concurrency hazard — race conditions, torn reads, the works. Atomics (built-in) gives us atomic read/write/compare-and-swap. Use sparingly and only when message passing is genuinely too slow.
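
A small sketch of Atomics doing its job: several workers increment one shared counter, and Atomics.add keeps the total exact. The worker source is inlined with eval: true purely to keep the example in one file; countInParallel is a made-up name.

```javascript
const { Worker } = require('node:worker_threads');

// Worker body: bump the shared counter `increments` times, atomically.
const workerSource = `
  const { workerData } = require('node:worker_threads');
  const counter = new Int32Array(workerData.sab);
  for (let i = 0; i < workerData.increments; i++) Atomics.add(counter, 0, 1);
`;

async function countInParallel(workers, increments) {
  const sab = new SharedArrayBuffer(4); // room for one Int32
  const counter = new Int32Array(sab);
  const exits = [];
  for (let i = 0; i < workers; i++) {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { sab, increments }, // the buffer is shared, not copied
    });
    exits.push(new Promise((resolve, reject) => {
      worker.once('error', reject);
      worker.once('exit', resolve);
    }));
  }
  await Promise.all(exits);
  return Atomics.load(counter, 0);
}

countInParallel(4, 1000).then((total) => console.log(total));
```

Replacing Atomics.add with a plain counter[0]++ is the classic way to see lost updates, though whether they actually show up on a given run depends on timing.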

When to use Workers — and when not

Reach for Workers when:

The CPU work takes more than ~50ms — long enough to noticeably block the event loop.
The work is parallelizable and we want to use multiple cores.
We need real isolation (a sandbox for user-supplied code, for example).

Don’t reach for Workers when:

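The work is I/O. Async I/O already runs off the main thread, so it doesn’t block the event loop.
The work is tiny. Spawning a worker has a real startup cost (a new V8 isolate and event loop have to boot, typically tens of milliseconds). For small jobs the overhead dwarfs the gain.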
We just want more concurrency for HTTP requests. Use cluster (multiple Node processes behind the OS load balancer), or run multiple containers behind a reverse proxy. That’s the idiomatic Node scaling story.

The worker pool pattern

We almost never spawn a worker per request — startup cost kills us. Instead, we keep a pool of N workers (often os.availableParallelism()), and queue jobs to them. Libraries like piscina do this for us with a pool.run(task) API.

import Piscina from 'piscina';

const pool = new Piscina({ filename: new URL('./worker.js', import.meta.url).href });

const result = await pool.run({ image: buf });

Pool stays warm, requests share workers, throughput goes way up.

Workers vs cluster vs child_process — quick contrast

Workers — same process, separate threads, message passing, shared memory possible. CPU-bound JS work.
cluster — multiple Node processes, OS-level load balancing on the same port. Scaling I/O-bound HTTP servers across cores.
child_process — spawning external commands (ffmpeg, git) or running other Node scripts as totally separate processes. Highest isolation, highest overhead.

Pick by what we’re trying to do — they’re not interchangeable.

The mental model

Workers turn Node from single-threaded to multi-threaded for CPU work. The cost is message passing between isolated heaps; the win is unblocking the main event loop. Use a pool, not one-off spawns. And remember: most Node bottlenecks are I/O, not CPU — measure before reaching for this hammer.

References

Worker Threads - Node.js Docs

Cluster Module

advanced nodejs cluster scaling performance

Node.js runs JavaScript on a single thread. So if our server has 8 CPU cores, a plain Node process uses… 1. The other 7 sit idle. That’s wasteful for an HTTP server.

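The cluster module fixes this by forking N copies of our process (one per core). All workers share the same port. By default the primary process (called the master before Node 16) accepts incoming connections and hands them to workers round-robin; on Windows, or with the alternative scheduling policy, workers accept on the shared socket directly and the OS decides who gets each connection.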

In simple language: cluster is “run my server 8 times in parallel, and let them split traffic.”

Why not just spawn 8 servers manually?

We could run 8 Node processes on ports 3001-3008 and put nginx in front. That works. But cluster is simpler — one entry file, one port, automatic distribution. And workers can talk to the master via IPC if needed.

MASTER (PID 1000): listens on :3000, forks workers
Worker PID 1001 (CPU 0) | Worker PID 1002 (CPU 1) | Worker PID 1003 (CPU 2) | Worker PID 1004 (CPU 3)
All 4 workers serve the SAME port :3000

How

The classic pattern: master forks, workers serve.

import cluster from 'node:cluster';
import os from 'node:os';
import http from 'node:http';

const numCPUs = os.cpus().length;

if (cluster.isPrimary) {
  console.log(`Master ${process.pid} forking ${numCPUs} workers`);
  for (let i = 0; i < numCPUs; i++) cluster.fork();

  cluster.on('exit', (worker, code) => {
    console.log(`Worker ${worker.process.pid} died (${code}), respawning`);
    cluster.fork();
  });
} else {
  http.createServer((req, res) => {
    res.end(`Handled by worker ${process.pid}\n`);
  }).listen(3000);
}

Hit :3000 repeatedly and you’ll see different PIDs in the response. That’s the primary process handing connections to workers round-robin (the default everywhere except Windows).

Cluster vs Worker Threads — totally different things

People confuse these constantly. They’re not the same.

Aspect	Cluster	Worker Threads
Unit	Separate OS process	Thread inside one process
Memory	Each worker has own V8 heap	Can share memory via `SharedArrayBuffer`
Use case	Scale HTTP servers across cores	Offload CPU-heavy work (image resize, hashing)
Startup cost	Heavy (full process)	Lighter
Comms	IPC messages	`postMessage` + shared buffers

Rule of thumb: cluster = horizontal scaling for I/O-bound web servers. Worker threads = offload one CPU-heavy task without blocking the event loop.

Gotchas

State doesn’t replicate. Each worker has its own memory. In-memory caches, rate limiters, WebSocket connections — none are shared. Use Redis.
Sticky sessions. If a client must keep hitting the same worker (session affinity, or long-polling fallbacks used before a WebSocket upgrade), round-robin breaks. Need a layer-7 LB like nginx with ip_hash.
PM2 does this for us. In production, most people use PM2’s cluster mode instead of writing the fork code by hand. Same idea, less boilerplate.
Don’t fork more than os.cpus().length. More workers = more context switching, not more throughput.

When NOT to use cluster

If we’re behind Kubernetes or run multiple Docker containers anyway — skip cluster. One Node process per container, scale by adding containers. Simpler ops story.

Child Process

intermediate nodejs child_process shell ipc

Sometimes we need Node to do something Node can’t do directly — run ffmpeg, call git, execute a Python script, shell out to imagemagick. That’s child_process.

It gives us four ways to spawn an external process: spawn, exec, execFile, and fork. They all start a subprocess. They differ in whether a shell is involved, how output is delivered, and what the child is.

spawn vs exec vs fork — pick the right one

Method	Output	Use when
`spawn`	Streamed (stdout/stderr are streams)	Long-running, big output, want to pipe
`exec`	Buffered into one string (callback)	Quick command, small output (< 1MB)
`execFile`	Buffered, no shell	`exec` but safer (no shell injection)
`fork`	IPC channel	Spawning another Node.js script

spawn — the workhorse

spawn returns a child process with stdout/stderr as readable streams. Use this for anything that produces a lot of output or runs a while.

import { spawn } from 'node:child_process';

const ffmpeg = spawn('ffmpeg', ['-i', 'input.mp4', '-c:v', 'libx264', 'out.mp4']);

ffmpeg.stdout.on('data', (chunk) => {
  console.log('stdout:', chunk.toString());
});

ffmpeg.stderr.on('data', (chunk) => {
  // ffmpeg writes progress to stderr, weirdly
  process.stderr.write(chunk);
});

ffmpeg.on('close', (code) => {
  if (code === 0) console.log('done');
  else console.error(`ffmpeg exited with code ${code}`);
});

Because output is streamed, memory stays flat even if ffmpeg runs for an hour and prints megabytes.

exec — convenient but dangerous

exec runs the command through a shell (/bin/sh -c ...) and buffers all output into one string. Easy for one-liners.

import { exec } from 'node:child_process';

exec('git log --oneline -5', (err, stdout, stderr) => {
  if (err) return console.error(err);
  console.log(stdout);
});

The shell convenience comes with a catch: shell injection. Never do this:

// BAD — user controls filename, can inject `; rm -rf /`
exec(`cat ${userInput}`, callback);

Use execFile or spawn with an args array — no shell involved, no injection.

import { execFile } from 'node:child_process';

// Safe. userInput is an argv element, not interpreted by shell.
execFile('cat', [userInput], callback);

Also: exec has a default maxBuffer of 1MB. If the command prints more, it errors. Bump it or switch to spawn.

fork — Node-to-Node with IPC

fork is a special case of spawn for launching another Node script. It sets up an IPC channel so parent and child can send() messages to each other.

// parent.js
import { fork } from 'node:child_process';

const worker = fork('./worker.js');
worker.send({ task: 'resize', file: 'photo.jpg' });
worker.on('message', (msg) => {
  console.log('worker said:', msg);
});

// worker.js
process.on('message', async (msg) => {
  // do heavy work
  const result = await processImage(msg.file);
  process.send({ done: true, result });
});

Use fork when we want to offload CPU work to another process without the complexity of cluster. (Worker threads are usually a better fit for pure-CPU work — fork shines when the child needs its own memory space, e.g. running untrusted code or a separate Node version.)

Production checklist

Always handle error AND close events. A spawn failure (binary not found) is reported through error, so a close handler alone will not tell us what went wrong.
Sanitize args. If user input gets into a child process command, use execFile/spawn with an args array, never string concatenation into a shell.
Set timeouts. Hung children leak. Use the timeout option or kill them manually with child.kill('SIGTERM').
Pipe stdio carefully. By default child stdio is pipe. For fire-and-forget background jobs, use stdio: 'ignore' and detached: true with child.unref() so the parent can exit.
Don’t block the event loop waiting for output. execSync exists. Don’t use it in a request handler.
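
Several checklist items fit into one small wrapper. This is a sketch, not a drop-in utility: the run helper, the 5 second timeout, and the 4 MB output cap are all assumptions chosen for the example.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);

// Hypothetical helper: run a binary with an args array (no shell),
// a timeout so hung children get killed, and a cap on buffered output.
async function run(file, args, { timeoutMs = 5000 } = {}) {
  const { stdout } = await execFileAsync(file, args, {
    timeout: timeoutMs,
    maxBuffer: 4 * 1024 * 1024,
  });
  return stdout;
}

// The last argument looks like shell syntax but reaches the child as plain text.
run(process.execPath, ['-e', 'console.log(process.argv[1])', 'a; echo injected'])
  .then((out) => console.log(out.trim()));
```

Because the promise rejects on spawn failure, non-zero exit, and timeout alike, a single try/catch around await run(...) covers the error paths the checklist asks for.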

References

Child Process - Node.js Docs

Debugging & Performance

Debugging with --inspect

intermediate nodejs debugging devtools vscode

Console-log debugging works. Until it doesn’t. When we’re chasing a bug in async code with five awaits and a Promise.all, dropping breakpoints is way faster.

Node has a real debugger built in — same protocol Chrome DevTools uses. We just need to start Node with the right flag.

The two flags

--inspect — opens the debug port (default 127.0.0.1:9229). Code runs immediately.
--inspect-brk — same, but pauses on the very first line, waiting for a debugger to attach.

# Run normally with debugger available
node --inspect server.js

# Pause until DevTools attaches (good for debugging startup code)
node --inspect-brk server.js

In simple language: --inspect is “start running, I’ll attach whenever.” --inspect-brk is “wait for me before doing anything.”

We’ll see this in the terminal:

Debugger listening on ws://127.0.0.1:9229/abc-123
For help, see: https://nodejs.org/en/docs/inspector

Attach with Chrome DevTools

Open Chrome and go to chrome://inspect. Click Configure and make sure localhost:9229 is in the list. Our Node process shows up under “Remote Target” — click inspect.

You get full DevTools: Sources tab for breakpoints, Console for evaluating expressions in the current scope, Memory tab for heap snapshots, Performance tab for CPU profiles.

Attach with VS Code

This is the smoother workflow most of the time. Create .vscode/launch.json:

{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "node",
      "request": "launch",
      "name": "Debug server",
      "program": "${workspaceFolder}/server.js",
      "skipFiles": ["<node_internals>/**"]
    },
    {
      "type": "node",
      "request": "attach",
      "name": "Attach to running",
      "port": 9229
    }
  ]
}

Two modes:

Launch — VS Code starts Node with --inspect-brk itself. Hit F5, done.
Attach — we start Node with --inspect ourselves (e.g. inside Docker), then VS Code connects to port 9229.

Set breakpoints by clicking in the gutter. Hit them by triggering the code path (curl a route, run a script). Use the debug console to evaluate req.body or whatever in the paused scope.

--inspect vs --inspect-brk

--inspect: code starts running, attach anytime, debug a live server.
--inspect-brk: pauses on line 1, waits for attach, debug startup/init code.

Debugging inside Docker

The inspect port binds to 127.0.0.1 by default — won’t be reachable from outside the container. Bind to 0.0.0.0 and expose the port:

node --inspect=0.0.0.0:9229 server.js

# docker-compose.yml
services:
  app:
    ports:
      - "9229:9229"

Now VS Code attach config with "port": 9229 works against the container.

Warning: never expose 9229 in production. Anyone who can reach that port has remote code execution on our server.

Useful tricks

Conditional breakpoints — right-click a breakpoint, set a condition like userId === 42. Stops only when it matters.
Logpoints — instead of pausing, log a message. Same effect as console.log but without editing code.
debugger statement — drop the keyword debugger; in our code. If a debugger is attached, execution pauses there. If not, no-op.
--inspect with nodemon — nodemon --inspect server.js gives auto-restart + debugger together.

When console.log is still fine

Honestly, for a quick “is this code path even running” question, console.log is faster. The breakpoint workflow shines for:

Inspecting complex object state at a point in time
Stepping through async/await flow
Catching an exception at the throw site (enable “Pause on caught exceptions”)
Debugging a heisenbug we can’t reliably reproduce

Profiling & Heap Snapshots

advanced nodejs performance profiling memory

“Our API got slow” or “memory keeps climbing until OOM” are vague. Profiling turns them into “this regex is 60% of CPU time” or “we’re retaining 800k of these objects.”

There are three tools we’ll use: --prof for CPU, heap snapshots for memory, and clinic.js when we want pretty graphs without learning V8 internals.

CPU profiling with --prof

Run Node with --prof and it dumps a V8 tick log to a file like isolate-0xNNNN-v8.log. Then we process it into something readable.

# 1. Run app under load (use autocannon, k6, ab, etc. to generate traffic)
node --prof server.js

# 2. Stop the process, find the log
ls isolate-*-v8.log

# 3. Process into a flat profile
node --prof-process isolate-0x10800000-v8.log > profile.txt

The output looks like:

 [Summary]:
   ticks  total  nonlib   name
   1234   45.2%   60.1%   JavaScript
    412   15.1%   20.0%   C++
    ...

 [JavaScript]:
   ticks  total  nonlib   name
    389   14.2%   18.9%   LazyCompile: *parseRequest /app/server.js:42
    201    7.3%    9.7%   LazyCompile: *hashPassword /app/auth.js:18

In simple language: each “tick” is a sample of “what was the CPU doing right now?” The function with the most ticks is the hot spot.

We’re looking for surprises. “Why is JSON.parse 40% of our time?” or “Why does bcrypt show up — isn’t that supposed to be async?”

CPU profiling via DevTools (nicer)

Run with --inspect, attach Chrome DevTools, go to the Performance tab, hit record, run the load, stop. We get a flame graph with function names, time spent, and we can drill in.

This is usually friendlier than reading --prof-process output. Same data, prettier.
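The same kind of profile can also be captured from inside the process with the built-in node:inspector module, which speaks the protocol DevTools uses. This is a minimal sketch: profileFor, busyWork, and the output file name are hypothetical names for illustration, and busyWork stands in for real traffic.

```javascript
const inspector = require('node:inspector');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Wrap the callback-style session.post() in a promise.
function post(session, method, params) {
  return new Promise((resolve, reject) => {
    session.post(method, params, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

// Hypothetical helper: profile `work` and write the result to `outFile`.
async function profileFor(work, outFile) {
  const session = new inspector.Session();
  session.connect();
  try {
    await post(session, 'Profiler.enable');
    await post(session, 'Profiler.start');
    await work();
    const { profile } = await post(session, 'Profiler.stop');
    fs.writeFileSync(outFile, JSON.stringify(profile));
    return profile;
  } finally {
    session.disconnect();
  }
}

// Placeholder workload standing in for real traffic.
function busyWork() {
  let total = 0;
  for (let i = 0; i < 1e6; i++) total += Math.sqrt(i);
  return total;
}

const outFile = path.join(os.tmpdir(), `app-${process.pid}.cpuprofile`);
const profiling = profileFor(busyWork, outFile);
profiling.then(() => console.log('profile written to', outFile));
```

The resulting .cpuprofile file can be loaded into DevTools later, which is handy when we can’t attach a debugger at the moment the slowdown happens.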

Heap snapshots — for memory issues

A heap snapshot is “freeze the current state of memory, list every object.” We take two snapshots — one before something, one after — and diff them to find what got allocated but never freed.

How to take one:

import { writeHeapSnapshot } from 'node:v8';

// Programmatic
const file = writeHeapSnapshot();
console.log('snapshot saved to', file);

Or via DevTools: attach with --inspect, go to the Memory tab, click Take snapshot.
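For a long-running process it can help to trigger a snapshot on demand instead of at startup. A minimal sketch, assuming a POSIX system; takeSnapshot, the label, and the choice of SIGUSR2 are our own assumptions, not something the v8 module prescribes.

```javascript
const { writeHeapSnapshot } = require('node:v8');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: write a labeled snapshot into `dir` and return its path.
function takeSnapshot(label, dir = os.tmpdir()) {
  const file = path.join(dir, `${label}-${process.pid}-${Date.now()}.heapsnapshot`);
  // Synchronous: the event loop is blocked while the heap is serialized.
  return writeHeapSnapshot(file);
}

// SIGUSR2 is an arbitrary choice; SIGUSR1 is reserved by Node for the inspector.
process.on('SIGUSR2', () => {
  console.log('snapshot saved to', takeSnapshot('on-demand'));
});
```

With this in place, kill -USR2 <pid> writes a snapshot we can open in the DevTools Memory tab. The process pauses while the file is written, so trigger it sparingly on a busy server.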

Memory Leak Hunt

1. Take snapshot A (baseline, app idle)

↓ run suspect workload for 5 min

2. Take snapshot B

↓

3. DevTools: "Comparison" view, sort by Delta. What grew?

↓ click a suspicious class

4. "Retainers" panel shows what's holding the reference

The retainers chain is the magic part. It tells us “this 50MB Map is retained by globalCache in cache.js:12.” Now we know exactly which line to fix.

Clinic.js — easy mode

Writing autocannon scripts and reading flame graphs is fine, but clinic.js packages this nicely.

npm i -g clinic autocannon

# CPU + event loop analysis
clinic doctor -- node server.js
# In another terminal: autocannon -c 100 http://localhost:3000

# CTRL+C the server, browser opens with a report

Three sub-commands worth knowing:

clinic doctor — high-level “is the bottleneck CPU, I/O, event loop, or GC?”
clinic flame — flame graph of CPU hot paths
clinic bubbleprof — async operation timing (shows where awaits stall)

doctor is the right starting point — it tells us which other tool to reach for next.

What to look for

CPU profile — any single function dominating? Often a regex, JSON serialization, sync crypto, or accidentally-quadratic code.
Heap snapshot diff — any class with thousands of instances that should be temporary? Look for Closure, (string), Array with huge retained size.
Event loop lag: clinic doctor flags it in red. It means we’re doing too much synchronous work between I/O callbacks.
GC pressure — if “GC” is a big slice in the CPU profile, we’re allocating too aggressively. Reuse buffers, avoid hot-path .map().filter().reduce() chains.

Production profiling

Don’t run --prof 24/7 — it has overhead. Instead:

Enable --inspect on a non-public port and attach when needed.
Use process.memoryUsage() and perf_hooks to log metrics continuously, profile deeply only when alerts fire.
For really gnarly issues, take a snapshot in prod, download it, analyze locally in DevTools.
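The continuous-metrics idea above can be sketched with process.memoryUsage() and the event loop delay histogram from perf_hooks. The helper name sampleMetrics, the field names, and the 10-second interval are assumptions for illustration.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Histogram of event loop delay, sampled every 20 ms; values are in nanoseconds.
const loopDelay = monitorEventLoopDelay({ resolution: 20 });
loopDelay.enable();

const toMB = (bytes) => Number((bytes / 1024 / 1024).toFixed(1));
const toMs = (ns) => Number((ns / 1e6).toFixed(2));

// Hypothetical helper: one metrics sample, ready to hand to a logger.
function sampleMetrics() {
  const mem = process.memoryUsage();
  const sample = {
    rssMB: toMB(mem.rss),
    heapUsedMB: toMB(mem.heapUsed),
    loopDelayP99Ms: toMs(loopDelay.percentile(99)),
    loopDelayMaxMs: toMs(loopDelay.max),
  };
  loopDelay.reset(); // start a fresh window for the next sample
  return sample;
}

// Log every 10 s; unref() so this timer alone never keeps the process alive.
setInterval(() => console.log(sampleMetrics()), 10_000).unref();
```

An alert on loopDelayP99Ms or on steadily rising heapUsedMB is then the cue to attach the inspector or take a snapshot.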


Memory Leaks

advanced nodejs memory performance leaks

A memory leak in Node is when our process keeps holding on to memory it doesn’t need anymore. RSS climbs. Eventually we hit V8’s heap limit (around 1.5 GB on older 64-bit releases; newer versions derive it from available memory, and --max-old-space-size overrides it) and V8 kills us with JavaScript heap out of memory.

JavaScript has a garbage collector — it frees objects nothing references. So a “leak” really means we’re still referencing the object even though we don’t need it. Find the reference, break it, leak fixed.

The classic causes

1. Closures over big data

function buildHandler(hugeDataset) {
  return function (req, res) {
    res.json({ count: hugeDataset.length });
  };
}

app.get('/count', buildHandler(loadGigabyteFile()));

The handler captures hugeDataset. As long as the handler is registered (forever), the dataset stays in memory. Even if we only ever read .length from it.

Fix: don’t close over data we don’t need.

const count = loadGigabyteFile().length; // extract what we need
app.get('/count', (req, res) => res.json({ count }));
// hugeDataset can be GC'd now

2. EventEmitter listener leaks

Every .on() adds a listener. If we add listeners in a request handler without removing them, they pile up forever.

// BAD — adds a listener per request
app.get('/stream', (req, res) => {
  someEmitter.on('data', (chunk) => res.write(chunk));
});

Node warns us once we cross 10 listeners on the same event:

(node:1234) MaxListenersExceededWarning: Possible EventEmitter memory leak detected.
11 data listeners added to [EventEmitter].

Fixes:

Use .once() if we only need it once.
Remove the listener when we’re done: emitter.off('data', fn).
For per-request listeners, attach to a per-request object (the response stream), not a shared global emitter.
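If we really do need to listen on a shared emitter per request, the listener has to come off when the request ends. A minimal sketch; forwardUntilClosed is a hypothetical helper, and it assumes the response object emits 'close' (Node’s http responses do).

```javascript
const { EventEmitter } = require('node:events');

// Shared, long-lived emitter (stands in for someEmitter above).
const someEmitter = new EventEmitter();

// Hypothetical helper: forward 'data' to one response, detach when it closes.
function forwardUntilClosed(emitter, res) {
  const onData = (chunk) => res.write(chunk);
  emitter.on('data', onData);
  // 'close' fires when the response finishes or the client disconnects.
  res.once('close', () => emitter.off('data', onData));
}
```

Keeping a named reference (onData) matters: off() only removes a listener when it is handed the same function that was passed to on().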

3. Unbounded global caches

const cache = new Map();
app.get('/user/:id', async (req, res) => {
  if (!cache.has(req.params.id)) {
    cache.set(req.params.id, await db.getUser(req.params.id));
  }
  res.json(cache.get(req.params.id));
});

Looks innocent. After a million unique user IDs, our Map has a million entries. Forever.

Fix: bounded cache with TTL/LRU.

import { LRUCache } from 'lru-cache';
const cache = new LRUCache({ max: 10_000, ttl: 1000 * 60 * 5 });

4. Timers that capture context

function handleConnection(conn) {
  setInterval(() => conn.ping(), 30_000);
}

If conn disconnects but we never clearInterval, the timer keeps conn alive. Always store the timer ID and clear it on cleanup.
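A minimal sketch of that cleanup, assuming conn emits 'close' when the peer goes away (net.Socket does); the intervalMs parameter and the stand-in connection object are ours, added only to make the example runnable.

```javascript
const { EventEmitter } = require('node:events');

function handleConnection(conn, intervalMs = 30_000) {
  // Keep the timer ID so we can cancel it later.
  const timer = setInterval(() => conn.ping(), intervalMs);
  // Once the connection closes, the timer (and its reference to conn) goes away.
  conn.once('close', () => clearInterval(timer));
  return timer;
}

// Usage with a stand-in connection object.
const conn = Object.assign(new EventEmitter(), {
  pings: 0,
  ping() { this.pings += 1; },
});
handleConnection(conn, 10);
conn.emit('close'); // timer is cleared, so no pings are sent after this
```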

5. Global arrays we push to and never drain

const recentRequests = [];
app.use((req, res, next) => {
  recentRequests.push({ url: req.url, time: Date.now() });
  next();
});

Grows forever. Use a ring buffer, or push to a real log system.
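A ring buffer is just a fixed-size array where new entries overwrite the oldest ones. A minimal sketch; the RingBuffer class and its method names are our own, not from a library.

```javascript
class RingBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = new Array(capacity);
    this.next = 0; // index the next push will write to
    this.size = 0; // number of valid entries, at most capacity
  }

  push(item) {
    this.items[this.next] = item; // overwrites the oldest entry once full
    this.next = (this.next + 1) % this.capacity;
    this.size = Math.min(this.size + 1, this.capacity);
  }

  // Entries in insertion order, oldest first.
  toArray() {
    const start = this.size < this.capacity ? 0 : this.next;
    return Array.from({ length: this.size }, (_, i) => this.items[(start + i) % this.capacity]);
  }
}

// Memory stays bounded no matter how many requests we see.
const recentRequests = new RingBuffer(3);
for (const url of ['/a', '/b', '/c', '/d', '/e']) {
  recentRequests.push({ url, time: Date.now() });
}
console.log(recentRequests.toArray().map((r) => r.url)); // [ '/c', '/d', '/e' ]
```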

Spotting a leak

The telltale sign: RSS climbs steadily under steady load and never comes back down. A healthy process has memory that goes up during traffic, then GC reclaims it during quiet periods, oscillating in a band. A leaking process trends up monotonically.

// Cheap monitoring
setInterval(() => {
  const m = process.memoryUsage();
  console.log({
    rss: (m.rss / 1024 / 1024).toFixed(1) + 'MB',
    heapUsed: (m.heapUsed / 1024 / 1024).toFixed(1) + 'MB',
  });
}, 10_000);
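To turn “trends up” into something we can alert on, one crude option is to fit a line through the recent heapUsed samples and look at its slope. This is a heuristic of ours, not a standard tool; slope, looksLikeLeak, and the threshold are assumptions to tune per service.

```javascript
// Least-squares slope of evenly spaced samples (units per sample interval).
function slope(samples) {
  const n = samples.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = samples.reduce((sum, y) => sum + y, 0) / n;
  let num = 0;
  let den = 0;
  samples.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  return num / den;
}

// Hypothetical heuristic: flag sustained growth above `thresholdMB` per sample.
function looksLikeLeak(heapUsedMB, thresholdMB = 1) {
  return slope(heapUsedMB) > thresholdMB;
}

console.log(looksLikeLeak([100, 110, 120, 130])); // true: steady climb
console.log(looksLikeLeak([100, 120, 100, 120, 100])); // false: oscillating band
```

The window needs to span several GC cycles, otherwise a normal sawtooth can look like growth.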

Leak Detection Workflow

1. Reproduce — script that drives the suspect path in a loop

2. Take heap snapshot (baseline, after warmup)

3. Run loop for N minutes

4. Take second snapshot

5. DevTools → Comparison view → sort by Delta

6. Open the top growing class → "Retainers" → follow chain

7. Fix the reference. Re-test.

Force GC for cleaner snapshots

V8 might be holding objects that are technically collectable. Force a GC right before snapshotting:

node --expose-gc server.js

if (global.gc) global.gc();
// now take snapshot

Otherwise we end up chasing “leaks” that are really just GC laziness.

When it’s not actually a leak

First few minutes of high traffic — V8’s heap grows up to its working set. Normal.
heapTotal grows but heapUsed stays flat — heap fragmentation, not a leak.
Native memory growth — RSS grows but heap doesn’t. Could be a native addon (sharp, bcrypt, gRPC) leaking C++ memory. Way harder to debug.

The boring fix nobody talks about

If we can’t find the leak in a hurry and the process is going to OOM in 12 hours, restart it on a schedule. PM2’s max_memory_restart or Kubernetes’ liveness probe + memory limit will recycle the process before it dies. Not glamorous but buys us time to actually fix it.


Production

Error Handling Patterns

intermediate nodejs errors async production

Error handling in Node is a minefield because there are three different error-delivery mechanisms: thrown exceptions, callback’s err first argument, and rejected Promises. Mix them up and errors silently disappear.

The async/await rule

With async/await, errors propagate via thrown exceptions — same as sync code. try/catch catches them.

async function getUser(id) {
  try {
    const user = await db.findUser(id);
    return user;
  } catch (err) {
    logger.error({ err, id }, 'failed to load user');
    throw err; // re-throw, let caller decide
  }
}

Key word: re-throw. Catching to log and then returning undefined (silently swallowing) is how bugs hide for months. Either re-throw, or return a sentinel and document it loudly.

Promises without await

If we fire a promise and don’t await it (or .catch it), a rejection becomes an unhandled promise rejection. Bad.

// BAD
async function handler(req, res) {
  doBackgroundWork(); // returns a promise, we ignored it
  res.json({ ok: true });
}

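If doBackgroundWork rejects, nothing in the handler sees the error: the response has already gone out, and the rejection surfaces only as a process-level unhandledRejection. Either await it, or chain a .catch: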

doBackgroundWork().catch((err) => logger.error({ err }, 'bg work failed'));
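The difference is easy to see in a small experiment. In this sketch doBackgroundWork is a stand-in that always rejects, and the unhandledRejection listener is there only to record what happens instead of letting the process crash.

```javascript
// Stand-in for doBackgroundWork: always rejects.
const doBackgroundWork = () => Promise.reject(new Error('bg work failed'));

const seen = [];
process.on('unhandledRejection', (reason) => {
  seen.push(`unhandled: ${reason.message}`);
});

// Ignored promise: the rejection escapes to the process level.
doBackgroundWork();

// Handled promise: the rejection stops at our .catch.
doBackgroundWork().catch((err) => seen.push(`caught: ${err.message}`));

// By the time this timer fires, both outcomes have been recorded.
setTimeout(() => console.log(seen), 10);
```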

The two process-level safety nets

Node fires these events for errors we missed:

process.on('uncaughtException', (err, origin) => {
  logger.fatal({ err, origin }, 'uncaught exception');
  // do minimal sync cleanup, then EXIT
  process.exit(1);
});

process.on('unhandledRejection', (reason, promise) => {
  logger.error({ reason }, 'unhandled rejection');
  // In modern Node, these are fatal by default. Crash.
  throw reason;
});

In simple language: uncaughtException is “a thrown error nobody caught.” unhandledRejection is “a rejected promise nobody handled with .catch().” Both mean we have a bug somewhere.

To crash or not to crash?

This is the interview question. The answer: on uncaughtException, always crash.

Why? After an uncaught exception, our process is in an undefined state. Half-completed transactions. Half-closed file descriptors. Variables in inconsistent state. Continuing to serve traffic could corrupt data.

The correct flow:

Error Decision Tree

Operational error (DB timeout, 404, invalid input, network blip)
→ catch, log, return error response. Keep running.

Programmer error (TypeError, ReferenceError, "cannot read property of undefined")
→ crash. Process manager restarts us. Fix the bug.

Out of memory
→ already crashing. Make sure restart is configured.

Operational errors = expected, recoverable. Programmer errors = bugs, unrecoverable mid-flight. Joyent’s classic article codified this distinction; it’s still the right model.

Express/Koa pattern

In Express 4, async route handlers don’t auto-forward rejected promises. Wrap them.

const asyncHandler = (fn) => (req, res, next) => {
  Promise.resolve(fn(req, res, next)).catch(next);
};

app.get('/users/:id', asyncHandler(async (req, res) => {
  const user = await db.findUser(req.params.id);
  if (!user) throw new NotFoundError('user');
  res.json(user);
}));

// Central error middleware
app.use((err, req, res, next) => {
  logger.error({ err, url: req.url }, 'request failed');
  const status = err.status || 500;
  res.status(status).json({ error: err.message });
});

Express 5 (now stable) auto-forwards async errors. One less footgun.

Custom error classes

Use error subclasses to tell apart “expected” errors from genuine bugs.

class AppError extends Error {
  constructor(message, status = 500) {
    super(message);
    this.name = this.constructor.name;
    this.status = status;
    this.isOperational = true;
  }
}

class NotFoundError extends AppError {
  constructor(resource) {
    super(`${resource} not found`, 404);
  }
}

// In the error middleware
if (!err.isOperational) {
  logger.fatal({ err }, 'non-operational error — restarting');
  process.exit(1);
}

Streams and EventEmitters

Streams emit 'error'. If nobody listens, Node crashes the process. Always attach:

fs.createReadStream('big.csv')
  .on('error', (err) => logger.error({ err }, 'read failed'))
  .pipe(transform)
  .on('error', (err) => logger.error({ err }, 'transform failed'));

Or better — use stream.pipeline which propagates errors cleanly:

import { pipeline } from 'node:stream/promises';

try {
  await pipeline(fs.createReadStream('in.csv'), transform, fs.createWriteStream('out.csv'));
} catch (err) {
  logger.error({ err }, 'pipeline failed');
}

Checklist

Wrap every async route handler so rejections reach error middleware.
Have a central error logger — never console.error and move on.
Subscribe to uncaughtException and unhandledRejection, log, then exit.
Run under a process manager (PM2, systemd, Docker restart policy) so crash → restart is fast.
Distinguish operational from programmer errors — recover from the first, crash on the second.


Logging

intermediate nodejs logging production observability

console.log is great in dev. In production it’s a disaster:

Can block the event loop: writes to a terminal or file are typically synchronous, so a slow destination stalls everything else.
No log levels — can’t filter “warn and above.”
Unstructured strings — grep works but querying (“show me all 500s in the last hour”) doesn’t.
No timestamps unless we add them manually.
No request correlation — can’t follow one request across many log lines.

In simple language: console.log is a printf, not a logger. We need a logger.

Structured (JSON) logs > free-text

Pre-cloud: tail -f app.log | grep ERROR. Post-cloud: logs go to Datadog/Loki/CloudWatch/ELK and get queried. Those systems work way better with JSON.

// Free-text — hard to query
2026-05-26 12:34:56 ERROR: user 42 failed login from 1.2.3.4

// Structured — every field is queryable
{"level":"error","time":1716720896000,"msg":"failed login","userId":42,"ip":"1.2.3.4"}

Now level:error AND userId:42 is a one-liner in any log system.
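To see why structured lines are easy to query, here is the same filter done by hand over newline-delimited JSON. The sample lines and the query helper are made up for illustration; a real log backend does this at scale.

```javascript
// Newline-delimited JSON, one log event per line (sample data, not real logs).
const lines = [
  '{"level":"error","time":1716720896000,"msg":"failed login","userId":42,"ip":"1.2.3.4"}',
  '{"level":"info","time":1716720897000,"msg":"login ok","userId":7,"ip":"5.6.7.8"}',
  '{"level":"error","time":1716720898000,"msg":"failed login","userId":7,"ip":"5.6.7.8"}',
];

// Hypothetical helper: keep events whose fields equal every key in `where`.
function query(ndjsonLines, where) {
  return ndjsonLines
    .map((line) => JSON.parse(line))
    .filter((event) => Object.entries(where).every(([key, value]) => event[key] === value));
}

const hits = query(lines, { level: 'error', userId: 42 });
console.log(hits.map((event) => event.ip)); // [ '1.2.3.4' ]
```

With the free-text format, the same question needs a regex that knows where the user ID sits in the sentence.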

Pino — fast and JSON-first

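Pino is a common default for new Node services. It is structured by default, can move log shipping off the main thread (transports run in worker threads), and is very fast (its own benchmarks claim roughly 5x faster than alternatives such as Winston).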

import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
});

logger.info('server started');
logger.info({ port: 3000, env: 'prod' }, 'listening');
logger.error({ err, userId }, 'failed to load user');

Note the argument order: object first, message second. The object’s keys become top-level fields in the log line.

Output:

{"level":30,"time":1716720896000,"pid":42,"hostname":"app-1","port":3000,"env":"prod","msg":"listening"}

level: 30 is info (10=trace, 20=debug, 30=info, 40=warn, 50=error, 60=fatal).

Child loggers for request context

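import { randomUUID } from 'node:crypto';

app.use((req, res, next) => {
  // One child logger per request; reqId is attached to every line it writes
  req.log = logger.child({ reqId: randomUUID() });
  next();
});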

app.get('/users/:id', async (req, res) => {
  req.log.info({ userId: req.params.id }, 'fetching user');
  // every log line in this request includes reqId automatically
});

Now we can trace one request through five log lines by filtering on reqId.
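Conceptually, a child logger is just the parent plus some fixed fields that get merged into every line. The toy logger below is not Pino’s implementation, only a sketch of that merging; createLogger and its write parameter are hypothetical.

```javascript
// Toy logger, not Pino's implementation: it only shows how child bindings merge.
function createLogger(bindings = {}, write = (line) => process.stdout.write(line + '\n')) {
  const log = (level) => (fields, msg) =>
    write(JSON.stringify({ level, time: Date.now(), ...bindings, ...fields, msg }));
  return {
    info: log('info'),
    error: log('error'),
    // A child inherits the parent's bindings and adds its own.
    child: (extra) => createLogger({ ...bindings, ...extra }, write),
  };
}

const output = [];
const rootLogger = createLogger({ service: 'api' }, (line) => output.push(line));
const reqLog = rootLogger.child({ reqId: 'abc-123' });

reqLog.info({ userId: 42 }, 'fetching user');
reqLog.error({ userId: 42 }, 'user not found');
// Both lines carry service and reqId without us passing them again.
```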

Pretty-print in dev

Raw JSON in dev is ugly. Pipe through pino-pretty:

node server.js | pino-pretty

Or configure it conditionally:

const logger = pino({
  transport: process.env.NODE_ENV !== 'production'
    ? { target: 'pino-pretty' }
    : undefined,
});

Winston — flexible, more batteries included

Winston has been around longer. More transports out of the box (files, HTTP, Slack, Loggly). More configurable formatters. Slower than Pino but rarely the bottleneck.

import winston from 'winston';

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.json(),
  transports: [
    new winston.transports.Console(),
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
  ],
});

Pino vs Winston

	Pino	Winston
Speed	Very fast	Slower
Default format	JSON	Configurable
Transports	Worker-thread based	Built-in zoo
Best for	New services, microservices, high throughput	Legacy projects, complex routing needs

For new projects in 2026, default to Pino unless we have a specific reason for Winston.

Log levels — actually use them

trace — extremely verbose, “entered this function”
debug — dev-only details
info — normal lifecycle events (“server started”, “job processed”)
warn — something unexpected but not broken
error — request failed, operation failed
fatal — process is dying

Default to info in prod, debug in dev. Letting debug lines into prod logs makes them noisy and expensive.

What to log (and what NOT to)

Log:

Every incoming request (URL, status, latency, requestId, userId)
Every error with stack trace
Job/cron start and end
External API calls (target, duration, status)

Never log:

Passwords, tokens, API keys, session IDs
Full credit card numbers, PII (unless legally OK and you have redaction)
The full request body of every request (huge volume, often contains PII)

Use a redaction config:

const logger = pino({
  redact: ['req.headers.authorization', 'password', '*.token'],
});

Where logs go

In containerized environments (Docker, Kubernetes), log to stdout/stderr only. Don’t write log files inside the container. The orchestrator captures stdout and routes it to your log backend. Files inside containers vanish when the container restarts.

For VMs/bare metal, write to stdout and let systemd-journald / a sidecar agent ship them off.


Process Managers (PM2)

intermediate nodejs pm2 deployment production

If we run node server.js directly and it crashes, that’s it. The process is dead. A process manager solves this — it babysits our app, restarts it on crash, runs it as multiple workers, captures logs, and gives us a CLI to inspect everything.

PM2 is the most popular for Node. It’s not the only option (systemd, Docker restart policies, Kubernetes), but it’s the easiest to get going for a single VM.

What PM2 actually does

In simple language: PM2 is “a daemon that runs your Node apps and makes sure they stay running.”

Specifically:

Auto-restart on crash (exponential backoff is available via exp_backoff_restart_delay)
Restart on memory threshold (max_memory_restart)
Cluster mode (forks N copies, load balances)
Log rotation and aggregation
Zero-downtime reload
pm2 startup hooks into systemd so PM2 survives reboots

Basic usage

npm i -g pm2

# Start an app
pm2 start server.js --name api

# See status
pm2 list

# Tail logs
pm2 logs api

# Restart / stop / delete
pm2 restart api
pm2 stop api
pm2 delete api

# Persist current process list across reboots
pm2 save
pm2 startup   # prints a sudo command — run it

Cluster mode — free horizontal scaling

-i max runs one instance per CPU core. Same idea as the cluster module, just declarative.

pm2 start server.js -i max --name api

PM2 handles the master process for us. Each worker is a real Node process with its own memory. Use this when our HTTP server is CPU-bound and we want to use all cores on one machine.

PM2 God Daemon (always running)

manages

api (cluster, 4 workers)
restarts: 2 · uptime: 4d

cron-worker (fork, 1)
restarts: 0 · uptime: 4d

queue-worker (fork, 2)
restarts: 1 · uptime: 3d

ecosystem.config.cjs — config as code

For anything beyond a one-liner, put settings in an ecosystem file. Then pm2 start ecosystem.config.cjs.

module.exports = {
  apps: [
    {
      name: 'api',
      script: './server.js',
      instances: 'max',
      exec_mode: 'cluster',
      max_memory_restart: '500M',
      env: {
        NODE_ENV: 'production',
        PORT: 3000,
      },
      error_file: './logs/api-err.log',
      out_file: './logs/api-out.log',
      time: true,
    },
    {
      name: 'cron-worker',
      script: './cron.js',
      instances: 1,
      exec_mode: 'fork',
      autorestart: true,
    },
  ],
};

Key options:

instances: 'max' + exec_mode: 'cluster' — one worker per core
max_memory_restart: '500M' — restart if worker exceeds 500MB (band-aid for leaks)
autorestart: true — default, restart on crash
cron_restart: '0 4 * * *' — restart at 4 AM daily (rarely needed but useful for leaky processes)

Zero-downtime reload

pm2 reload api

In cluster mode, this restarts workers one at a time. Each old worker keeps serving until the new one is ready, then it shuts down. No dropped requests if our app handles SIGINT/SIGTERM properly (graceful shutdown — covered in the next note).

pm2 restart is different: it kills and restarts. Brief downtime.

PM2 vs systemd vs Docker

	PM2	systemd	Docker / K8s
Setup	npm i -g, done	Write a unit file	Dockerfile + compose / manifest
Cluster mode	Built-in, free	Manual (multiple units)	Scale replicas
Logs	pm2 logs	journalctl	docker logs / k8s
Reload w/o downtime	Yes (cluster)	No (needs LB)	Yes (rolling deploy)
Best for	Single VM, fast iteration	Linux servers, no containers	Multi-host, microservices

Rule of thumb:

Just one VM, want something working today → PM2
VM, prefer OS-native, don’t want extra runtime → systemd unit
Already on Docker/Kubernetes → don’t use PM2, let the orchestrator restart containers. PM2 inside Docker is a common anti-pattern; the container should be the unit of restart.

PM2 gotchas

PM2 in Docker is usually wrong. Docker already restarts containers. Running PM2 inside hides crashes from Docker and complicates log capture. One container = one Node process.
pm2 startup setup is mandatory. Without it, a server reboot kills our apps. Run pm2 startup once, then pm2 save after every change to the process list.
Logs grow forever. Install pm2-logrotate (pm2 install pm2-logrotate) or use logrotate.
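Hosted metrics are a paid add-on. The open-source PM2 gives us pm2 monit for a local terminal dashboard, but the hosted dashboards live in PM2 Plus (formerly Keymetrics, their paid SaaS). For free, scrape pm2 jlist or expose your own metrics.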


Graceful Shutdown

intermediate nodejs shutdown signals production docker

When Docker, Kubernetes, or PM2 wants to stop our app (for a deploy, a scale-down, or a node drain), it sends a termination signal: SIGTERM from Docker and Kubernetes, SIGINT by default from PM2. If our app ignores it, after a grace period (10 seconds by default for Docker, 30 for K8s) they send SIGKILL and we get killed mid-request.

That means: dropped HTTP requests, half-committed DB writes, lost jobs. In production, this is unacceptable.

Graceful shutdown is “react to SIGTERM, finish what we’re doing, then exit cleanly.”

The lifecycle

Graceful Shutdown Timeline

t=0 · Orchestrator sends SIGTERM

t=0+ · Stop accepting new connections (server.close())

t=0+ · Health check starts returning 503 → LB stops sending traffic

t=0..N · In-flight requests finish naturally

t=N · Close DB pool, Redis, message queue connections

t=N+ε · process.exit(0)

t=25s · Hard timeout: force exit if still alive, before the orchestrator’s SIGKILL at 30s

A minimal Express implementation

import express from 'express';
import { pool } from './db.js';

const app = express();
app.get('/', async (req, res) => {
  await new Promise((r) => setTimeout(r, 2000)); // slow handler
  res.send('hi');
});

const server = app.listen(3000, () => console.log('listening on 3000'));

let shuttingDown = false;

// Health check that flips on shutdown
app.get('/healthz', (req, res) => {
  if (shuttingDown) return res.status(503).send('shutting down');
  res.send('ok');
});

async function shutdown(signal) {
  if (shuttingDown) return;
  shuttingDown = true; // health check starts returning 503 from here on
  console.log(`${signal} received, shutting down`);

  // 1. Arm the hard timeout first, so a stuck step can't outlive the grace period
  setTimeout(() => {
    console.error('forced exit after 25s');
    process.exit(1);
  }, 25_000).unref();

  try {
    // 2. Stop accepting new connections and wait for in-flight requests to finish
    await new Promise((resolve, reject) => {
      server.close((err) => (err ? reject(err) : resolve()));
    });
    console.log('http server closed');

    // 3. Only now close downstream resources
    await pool.end();        // close pg pool
    // await redis.quit();   // close redis, etc.
    console.log('db closed');
    process.exit(0);
  } catch (err) {
    console.error('cleanup error', err);
    process.exit(1);
  }
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));   // Ctrl+C in dev

A few things worth calling out:

server.close() doesn’t kill existing connections. It stops accept() for new ones and waits for the current ones to finish. Exactly what we want.
Health check flips first. The load balancer needs a few seconds to notice we’re unhealthy and route traffic elsewhere. If we close the server immediately, the LB might send us one more request that hits a closed socket.
.unref() on the timeout. So the timer itself doesn’t keep the process alive if everything else finishes early.

The “stop accepting + drain” dance

In simple language: we’re telling the world “no more orders please” while still cooking the orders we already accepted. Once the kitchen is clear, we close up shop.

For long-lived connections (WebSockets, SSE), server.close() waits forever because those connections never end on their own. We have to actively tell clients to disconnect:

// For WebSockets
for (const ws of wsServer.clients) {
  ws.close(1001, 'server restarting');
}

For HTTP keep-alive, idle connections can hang around. Use the http-terminator library or call server.closeIdleConnections() (Node 18.2+) to forcibly close idle keep-alive sockets.
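Putting the keep-alive pieces together with only the built-in http module might look like the sketch below. It assumes Node 18.2 or newer (for closeIdleConnections and closeAllConnections); closeServer and the graceMs budget are hypothetical and should be tuned to stay under the orchestrator’s grace period.

```javascript
const http = require('node:http');

// Hypothetical helper: stop accepting, drop idle keep-alive sockets, and
// force-close whatever is still open after `graceMs`.
function closeServer(server, graceMs = 10_000) {
  return new Promise((resolve, reject) => {
    // Last resort: destroy every remaining connection once the budget is spent.
    const timer = setTimeout(() => server.closeAllConnections(), graceMs);

    // Stop accepting; the callback runs once all connections have ended.
    server.close((err) => {
      clearTimeout(timer);
      if (err) reject(err);
      else resolve();
    });

    // Sockets with no request in flight would otherwise linger until they time out.
    server.closeIdleConnections();
  });
}
```

In the shutdown function this would take the place of the bare server.close() call, so that idle keep-alive sockets no longer hold the process open until the hard timeout.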

Why Docker/Kubernetes need this

Docker sends SIGTERM to PID 1 in the container, waits --stop-timeout (default 10s), then SIGKILL.

Kubernetes sends SIGTERM, waits terminationGracePeriodSeconds (default 30s), then SIGKILL.

If our Node app is PID 1 (running directly via CMD ["node", "server.js"]), we receive the signal. Done.

But if we use a shell form (CMD node server.js), the shell becomes PID 1 and does not forward signals. Our Node process never gets SIGTERM, falls to SIGKILL, drops requests. Bad.

Fix: always use exec form in Dockerfile.

# BAD — shell form
CMD node server.js

# GOOD — exec form, Node is PID 1
CMD ["node", "server.js"]

Or use tini / dumb-init as PID 1 if we need signal forwarding (e.g. when running via npm).

Kubernetes preStop hook

K8s has a subtle race: when a pod is terminated, the SIGTERM is sent at roughly the same time the pod is removed from the Service endpoints list. For a few seconds, traffic might still hit a shutting-down pod.

The fix is a preStop hook that sleeps before the signal is sent:

lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]

5 seconds is usually enough for the endpoints update to propagate. Our app keeps serving normally during the sleep, then gets SIGTERM and shuts down cleanly.

Common mistakes

No timeout. A stuck DB connection hangs shutdown() forever, then SIGKILL kills us. Always have a hard timeout that beats the orchestrator’s.
Closing the DB pool before HTTP finishes. Now in-flight requests can’t query the DB and fail. Order matters: HTTP first, then resources.
Catching SIGTERM but doing nothing. Worse than not handling it — Node’s default is to exit, our handler overrides that.
PM2 cluster reload — same story. PM2 sends SIGINT to each worker. If we don’t handle it, reload drops requests.
Running with nodemon or a shell wrapper in prod. They eat the signal. Use the runtime directly or tini.
