Writing a Dockerfile

A Dockerfile is a text file with step-by-step instructions for building a Docker image. Think of it as a recipe — start with a base ingredient (base image), add our code, install dependencies, and define how to run it. Docker reads this file top to bottom and creates an image layer by layer.

Common Instructions

Let’s go through the instructions we’ll use in almost every Dockerfile.

  • FROM — The base image to start from. Every Dockerfile starts here.
  • WORKDIR — Sets the working directory inside the container. Like cd but for the build.
  • COPY — Copies files from our machine into the image.
  • RUN — Executes a command during the build (install deps, compile, etc.).
  • CMD — The default command to run when the container starts.
  • EXPOSE — Documents which port the app listens on (it’s just metadata, doesn’t actually open the port).
  • ENV — Sets environment variables that persist in the running container.
  • ARG — Build-time variables. Only available during docker build, not in the running container.
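
The difference between ARG and ENV is easiest to see side by side. The following is a minimal sketch, not part of the original example: the variable names NODE_VERSION and APP_VERSION are hypothetical and chosen only for illustration.

```dockerfile
# Build-time variable with a default value.
# An ARG declared before FROM can only be used in FROM lines.
ARG NODE_VERSION=20
FROM node:${NODE_VERSION}-alpine

# Hypothetical build-time variable; it must be declared after FROM
# to be usable inside this build stage.
ARG APP_VERSION=dev

# ENV values persist into the running container.
# Copying the ARG into an ENV is what makes it visible at runtime.
ENV NODE_ENV=production \
    APP_VERSION=${APP_VERSION}

WORKDIR /app

# Print both variables when the container starts
CMD ["node", "-e", "console.log(process.env.NODE_ENV, process.env.APP_VERSION)"]
```

Building with docker build --build-arg APP_VERSION=1.2.3 -t my-app . would override the default. Without the ENV line, APP_VERSION would exist only during the build and would be undefined inside the running container.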

A Practical Dockerfile for Node.js

Here’s a complete Dockerfile for a Node.js app that makes a reasonable starting point for production.

# Start with a lightweight Node.js base image
FROM node:20-alpine

# Set working directory inside the container
WORKDIR /app

# Copy package files first (for layer caching — explained below)
COPY package.json package-lock.json ./

# Install production dependencies only (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev

# Now copy the rest of the application code
COPY . .

# Document the port our app listens on
EXPOSE 3000

# Default command when container starts
CMD ["node", "server.js"]

Build and run it:

# Build the image (the dot means "use current directory as context")
docker build -t my-app .

# Run it
docker run -d -p 3000:3000 --name my-app my-app

Layer Caching — Why Order Matters

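This is the most important optimization concept in Dockerfiles. Docker builds images in layers: instructions that change the filesystem (such as RUN, COPY, and ADD) each add a layer, and Docker caches the result of every step. If an instruction and the files it depends on haven’t changed, Docker reuses the cached result instead of rerunning it. Once one step is invalidated, every step after it is rebuilt too.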

Here’s why we copy package.json separately before copying the rest of the code:

Docker layer cache after a code-only change:

  Layer 1: FROM node:20-alpine     cached
  Layer 2: COPY package*.json      cached (deps didn't change)
  Layer 3: RUN npm ci              cached (package.json unchanged)
  Layer 4: COPY . .                rebuilt (code changed)

When only our code changes, npm ci is skipped entirely, which can save minutes per build.

If we did COPY . . first and then RUN npm ci, Docker would reinstall all dependencies every time we changed even a single line of code. By copying package.json first, npm ci only reruns when our dependencies actually change.

Rule of thumb: put things that change least at the top, things that change most at the bottom.
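
For contrast, the cache-unfriendly ordering described above might look like the sketch below. It reuses the same hypothetical server.js app and is shown only to illustrate what to avoid.

```dockerfile
# Cache-unfriendly ordering (for contrast only)
FROM node:20-alpine
WORKDIR /app

# Any change to any source file invalidates this layer...
COPY . .

# ...so this step reruns on every code change,
# even when package.json and package-lock.json are untouched
RUN npm ci --omit=dev

EXPOSE 3000
CMD ["node", "server.js"]
```

The instructions are the same as in the earlier Dockerfile. Only their order differs, and that alone decides whether the dependency install can be served from cache.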

Multi-Stage Builds

In a single-stage build, our final image includes everything — dev dependencies, build tools, source files. That’s wasteful. Multi-stage builds let us use one stage to build the app and a second stage to run it, keeping only what we need.

# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
# Install ALL deps (including devDeps)
RUN npm ci
COPY . .
# Compile TypeScript, bundle, etc.
RUN npm run build

# Stage 2: Production
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
# Only production deps
RUN npm ci --omit=dev
# Copy built files from stage 1
# (comments need their own line; trailing text on a COPY line is parsed as arguments)
COPY --from=builder /app/dist ./dist

EXPOSE 3000
CMD ["node", "dist/server.js"]

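The final image only has production dependencies and the compiled output. No TypeScript compiler, no dev tools, no source code. This can cut image size by 50-80%, though the exact saving depends on how much of the single-stage image was dev dependencies and build tooling.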
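
One possible variation, offered as a sketch rather than a recommendation, is to install dependencies only once and strip the dev dependencies inside the builder stage with npm prune. It assumes, as in the example above, that the build script writes its output to /app/dist.

```dockerfile
# Stage 1: build, then remove dev dependencies in place
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# Build first (needs devDeps), then prune them from node_modules
RUN npm run build && npm prune --omit=dev

# Stage 2: copy only what the app needs at runtime
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/package.json ./
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist

EXPOSE 3000
CMD ["node", "dist/server.js"]
```

The trade-off is that the final stage no longer runs its own install, so it avoids a second download of packages. On the other hand, its node_modules layer depends on the builder stage and is rebuilt whenever the source code changes.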

The .dockerignore File

Just like .gitignore keeps files out of git, .dockerignore keeps files out of the Docker build context. Without it, docker build sends the whole directory to the Docker daemon and COPY . . copies all of it into the image, including node_modules, .git, .env, and other junk.

# .dockerignore
# Comments must start at the beginning of a line; inline comments become part of the pattern
# We install fresh inside the container
node_modules
# No need for git history
.git
# Never bake secrets into images
.env
# Environment-specific files
.env.*
# We build fresh inside the container
dist
# Docs don't belong in the image
*.md
# macOS junk
.DS_Store

This makes builds faster (smaller context to send) and more secure (no secrets in the image).

Image Size Optimization Tips

A few quick wins for smaller images:

  • Use Alpine-based images: node:20-alpine is ~50MB vs node:20 at ~350MB
  • Multi-stage builds — only ship what the app needs to run
  • Combine RUN commands — each RUN creates a layer. Combine related commands with &&
  • Clean up in the same layer: RUN apt-get install -y curl && apt-get clean && rm -rf /var/lib/apt/lists/*
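  • Use .dockerignore to keep unnecessary files out of the build context

# Bad: 3 layers for related operations, and cleanup in a later layer can't shrink the earlier ones
# (apt-get applies to Debian-based images such as node:20, not to Alpine)
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get clean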

# Good — 1 layer, cleaned up
RUN apt-get update && \
    apt-get install -y curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
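
The apt-get commands above apply to Debian-based images. On the Alpine images used elsewhere in this guide, the package manager is apk, and a rough equivalent might look like this sketch (curl is just the same illustrative package):

```dockerfile
FROM node:20-alpine

# --no-cache makes apk fetch the package index on the fly without storing it,
# so there is no separate cleanup step to chain with &&
RUN apk add --no-cache curl
```

Because nothing is left behind in the layer, the install and the cleanup collapse into a single command.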

In simple language, a Dockerfile is a recipe that builds our app into a portable image — and the order of instructions matters because Docker caches each layer to speed up rebuilds.