When Docker or Kubernetes wants to stop our app (for a deploy, a scale-down, or a node drain), it sends SIGTERM; PM2 does the same thing with SIGINT by default. If our app ignores the signal, the orchestrator waits out a grace period (10 seconds by default for Docker, 30 for K8s), then sends SIGKILL and we get killed mid-request.
That means: dropped HTTP requests, half-committed DB writes, lost jobs. In production, this is unacceptable.
Graceful shutdown is “react to SIGTERM, finish what we’re doing, then exit cleanly.”
The lifecycle
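1. SIGTERM arrives
2. Stop accepting new connections (server.close())
3. Let in-flight requests finish
4. Close downstream resources (DB pool, Redis, etc.)
5. process.exit(0)
A minimal Express implementation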
import express from 'express';
import { pool } from './db.js';

const app = express();

app.get('/', async (req, res) => {
  await new Promise((r) => setTimeout(r, 2000)); // slow handler
  res.send('hi');
});

const server = app.listen(3000, () => console.log('listening on 3000'));

let shuttingDown = false;

// Health check that flips on shutdown
app.get('/healthz', (req, res) => {
  if (shuttingDown) return res.status(503).send('shutting down');
  res.send('ok');
});

async function shutdown(signal) {
  if (shuttingDown) return; // ignore repeated signals
  shuttingDown = true;
  console.log(`${signal} received, shutting down`);

  // 1. Hard timeout, armed first so a stuck step below can't outlive it.
  //    If something's stuck, give up before SIGKILL hits.
  setTimeout(() => {
    console.error('forced exit after 25s');
    process.exit(1);
  }, 25_000).unref();

  // 2. Stop accepting new connections and wait for in-flight requests.
  //    The callback only fires once existing connections have finished.
  await new Promise((resolve) => {
    server.close((err) => {
      if (err) console.error('server.close error', err);
      console.log('http server closed');
      resolve();
    });
  });

  // 3. Only now close downstream resources
  try {
    await pool.end(); // close pg pool
    // await redis.quit(); // close redis, etc.
    console.log('db closed');
  } catch (err) {
    console.error('cleanup error', err);
  }

  // 4. Everything is drained: exit cleanly
  process.exit(0);
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT')); // Ctrl+C in dev
A few things worth calling out:
- server.close() doesn’t kill existing connections. It stops accept() for new ones and waits for the current ones to finish. Exactly what we want.
- Health check flips first. The load balancer needs a few seconds to notice we’re unhealthy and route traffic elsewhere. If we close the server immediately, the LB might send us one more request that hits a closed socket.
- .unref() on the timeout. So the timer itself doesn’t keep the process alive if everything else finishes early.
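To make the "health check flips first" point concrete, here is a minimal sketch that flips the flag, keeps serving for a short delay, and only then closes the server. The names createDrainer and drainDelayMs are made up for this illustration, and the 5-second default is an assumption; the right value depends on how often the load balancer polls /healthz.

```javascript
// Sketch only: createDrainer and drainDelayMs are hypothetical names.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function createDrainer(server, { drainDelayMs = 5000 } = {}) {
  // The /healthz handler would read state.shuttingDown to decide on 200 vs 503.
  const state = { shuttingDown: false };

  async function drain() {
    state.shuttingDown = true; // health check starts failing from here on
    await sleep(drainDelayMs); // keep serving while the LB reroutes traffic
    // Now stop accepting connections and wait for in-flight requests.
    await new Promise((resolve, reject) => {
      server.close((err) => (err ? reject(err) : resolve()));
    });
  }

  return { state, drain };
}
```

Inside shutdown(), `await drain()` would take the place of the bare server.close() call. The delay counts against the orchestrator's grace period, so it should stay well below the hard timeout.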
The “stop accepting + drain” dance
In simple language: we’re telling the world “no more orders please” while still cooking the orders we already accepted. Once the kitchen is clear, we close up shop.
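If we want to see how many "orders" are still in the kitchen, one option is to count in-flight requests ourselves. The sketch below is an illustration, not something server.close() needs: createInFlightTracker, whenIdle, and count are hypothetical names, and the middleware assumes the usual Express (req, res, next) signature.

```javascript
// Sketch only: a counter for in-flight requests, mainly useful for logging.
function createInFlightTracker() {
  let inFlight = 0;
  let waiters = [];

  function middleware(req, res, next) {
    inFlight += 1;
    let released = false;
    const release = () => {
      if (released) return; // 'finish' and 'close' can both fire for one response
      released = true;
      inFlight -= 1;
      if (inFlight === 0) {
        waiters.forEach((resolve) => resolve());
        waiters = [];
      }
    };
    res.on('finish', release); // response fully sent
    res.on('close', release);  // connection closed, possibly before finishing
    next();
  }

  // Resolves once no requests are in flight.
  function whenIdle() {
    if (inFlight === 0) return Promise.resolve();
    return new Promise((resolve) => waiters.push(resolve));
  }

  return { middleware, whenIdle, count: () => inFlight };
}
```

We would register it with `app.use(tracker.middleware)` before the routes, then log `tracker.count()` when the signal arrives to see what is still being served.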
For long-lived connections (WebSockets, SSE), server.close() waits forever because those connections never end on their own. We have to actively tell clients to disconnect:
// For WebSockets
for (const ws of wsServer.clients) {
  ws.close(1001, 'server restarting');
}
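A polite close can still hang if a client never completes the closing handshake, so a fallback helps. The sketch below assumes wsServer is a WebSocketServer from the ws package (a clients Set, plus close(), terminate(), and readyState on each socket); closeWebSockets and graceMs are names invented for this example.

```javascript
// Sketch only: close politely, then terminate whatever is still connected.
const CLOSED = 3; // WebSocket readyState value for a fully closed socket

function closeWebSockets(wsServer, graceMs = 5000) {
  for (const ws of wsServer.clients) {
    ws.close(1001, 'server restarting'); // 1001 = "going away"
  }

  // For simplicity this always waits the full grace period.
  return new Promise((resolve) => {
    setTimeout(() => {
      for (const ws of wsServer.clients) {
        if (ws.readyState !== CLOSED) ws.terminate(); // drop without handshake
      }
      resolve();
    }, graceMs);
  });
}
```

In shutdown(), this would run before waiting on server.close(), since open WebSocket connections are exactly what keeps it from completing.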
For HTTP keep-alive, idle connections can hang around and keep server.close() from completing. Use the http-terminator library, or call server.closeIdleConnections() (Node 18.2+) to forcibly close idle keep-alive sockets.
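A minimal sketch of the closeIdleConnections() route, wrapped in a promise so shutdown() can await it. closeHttpServer is a hypothetical helper name; the feature check is there because the method is missing on Node versions before 18.2.

```javascript
// Sketch only: stop listening, then drop keep-alive sockets that are idle.
function closeHttpServer(server) {
  return new Promise((resolve, reject) => {
    // Resolves once every remaining connection has ended.
    server.close((err) => (err ? reject(err) : resolve()));

    // Called after close() so no new idle connections can show up.
    // Sockets that are busy with a request are left alone.
    if (typeof server.closeIdleConnections === 'function') {
      server.closeIdleConnections();
    }
  });
}
```

Connections that are mid-request at this point still get to finish; only sockets sitting idle between requests are closed.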
Why Docker/Kubernetes need this
Docker sends SIGTERM to PID 1 in the container, waits --stop-timeout (default 10s), then SIGKILL.
Kubernetes sends SIGTERM, waits terminationGracePeriodSeconds (default 30s), then SIGKILL.
If our Node app is PID 1 (running directly via CMD ["node", "server.js"]), we receive the signal. Done.
But if we use a shell form (CMD node server.js), the shell becomes PID 1 and does not forward signals. Our Node process never gets SIGTERM, falls to SIGKILL, drops requests. Bad.
Fix: always use exec form in Dockerfile.
# BAD — shell form
CMD node server.js
# GOOD — exec form, Node is PID 1
CMD ["node", "server.js"]
Or use tini / dumb-init as PID 1 if we need signal forwarding (e.g. when running via npm).
Kubernetes preStop hook
K8s has a subtle race: when a pod is terminated, the SIGTERM is sent at roughly the same time the pod is removed from the Service endpoints list. For a few seconds, traffic might still hit a shutting-down pod.
The fix is a preStop hook that sleeps before the signal is sent:
lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]
5 seconds is usually enough for the endpoints update to propagate. Our app keeps serving normally during the sleep, then gets SIGTERM and shuts down cleanly.
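One thing to budget for: Kubernetes starts the terminationGracePeriodSeconds countdown before the preStop hook runs, so the sleep eats into the same window as the app's own hard timeout. A rough arithmetic check, using a hypothetical helper (shutdownBudget) and the numbers from this post:

```javascript
// Sketch only: how much slack is left before SIGKILL?
function shutdownBudget({ gracePeriodSeconds, preStopSeconds, hardTimeoutSeconds }) {
  // preStop runs first, then the app gets SIGTERM and starts its own timer.
  const slackSeconds = gracePeriodSeconds - preStopSeconds - hardTimeoutSeconds;
  return { slackSeconds, fits: slackSeconds > 0 };
}

const budget = shutdownBudget({
  gracePeriodSeconds: 30, // K8s default
  preStopSeconds: 5,      // the sleep in the preStop hook
  hardTimeoutSeconds: 25, // the forced-exit timer in shutdown()
});
console.log(budget); // → { slackSeconds: 0, fits: false }
```

With these values the 25s forced exit would fire at the same moment as SIGKILL. Lowering the hard timeout to around 20s, or raising terminationGracePeriodSeconds, restores some slack.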
Common mistakes
- No timeout. A stuck DB connection hangs shutdown() forever, then SIGKILL kills us. Always have a hard timeout that beats the orchestrator’s.
- Closing the DB pool before HTTP finishes. Now in-flight requests can’t query the DB and fail. Order matters: HTTP first, then resources.
- Catching SIGTERM but doing nothing. Worse than not handling it: Node’s default is to exit, and our handler overrides that.
- PM2 cluster reload: same story. PM2 sends SIGINT to each worker. If we don’t handle it, reload drops requests.
- Running with nodemon or a shell wrapper in prod. They eat the signal. Use the runtime directly or tini.
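For the "no timeout" mistake, the global forced-exit timer is the safety net, but it can also help to put a deadline on each cleanup step so the logs say which one got stuck. A small sketch, with withDeadline as a hypothetical helper name:

```javascript
// Sketch only: reject if a cleanup step takes longer than `ms`.
function withDeadline(promise, ms, label) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is cleared either way
  // so it cannot keep the process alive.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```

In shutdown() this could look like `await withDeadline(pool.end(), 5000, 'pool.end')`, where the 5000 is an arbitrary example value. Note that it only stops us from waiting; the underlying operation is not cancelled.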