Docker Multi-Stage Builds and Optimization Strategies

Introduction to Docker Multi-Stage Builds

Multi-stage builds are one of the most powerful features in Docker for creating optimized, production-ready container images. They allow you to use multiple FROM statements in a Dockerfile, enabling you to separate build dependencies from runtime dependencies, resulting in significantly smaller and more secure images.

This guide covers multi-stage builds, optimization techniques, and best practices for creating efficient Docker images.

Why Multi-Stage Builds Matter

Problems with Single-Stage Builds

Traditional single-stage Dockerfiles put everything needed to build and run an application into one image, so compilers, development dependencies, and source code all ship to production.

# BAD - Single stage with everything
FROM node:18
WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y \
    python3 \
    make \
    g++ \
    git

# Copy source code
COPY package*.json ./
RUN npm install

# Build application
COPY . .
RUN npm run build

# Run application
CMD ["npm", "start"]

Problems:

  • Large image size (includes build tools and dependencies)
  • Security vulnerabilities (unnecessary packages)
  • Slower deployment times
  • Exposes source code and build artifacts

Benefits of Multi-Stage Builds

# GOOD - Multi-stage build
# Stage 1: Build
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
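# Install all dependencies; the build step usually needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Remove devDependencies so only runtime packages reach the final stage
RUN npm prune --omit=dev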

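# Stage 2: Production
# (node_modules built on the Debian-based node:18 image may contain native
# add-ons that do not load on Alpine's musl libc; use node:18-slim in that case)
FROM node:18-alpine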
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package*.json ./

USER node
CMD ["node", "dist/index.js"]

Benefits:

  • Smaller final image (only runtime dependencies)
  • Better security (smaller attack surface)
  • Faster deployments
  • Cleaner separation of concerns
  • No source code in final image

Multi-Stage Build Patterns

1. Node.js Application

# Build stage
FROM node:18 AS build
WORKDIR /app

# Copy dependency definitions
COPY package*.json ./
COPY tsconfig.json ./

# Install all dependencies (including dev dependencies)
RUN npm ci

# Copy source code
COPY src ./src

# Build TypeScript to JavaScript
RUN npm run build

# Production stage
FROM node:18-alpine AS production
WORKDIR /app

# Copy package files
COPY package*.json ./

# Install only production dependencies
# (--omit=dev replaces the deprecated --only=production flag)
RUN npm ci --omit=dev

# Copy built application from build stage
COPY --from=build /app/dist ./dist

# Create non-root user and add it to its own group
RUN addgroup -g 1001 -S nodejs && \
    adduser -S -u 1001 -G nodejs nodejs

# Change ownership
RUN chown -R nodejs:nodejs /app

# Switch to non-root user
USER nodejs

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"

# Start application
CMD ["node", "dist/index.js"]

2. Go Application

# Build stage
FROM golang:1.21 AS builder
WORKDIR /app

# Copy go mod files
COPY go.mod go.sum ./

# Download dependencies
RUN go mod download

# Copy source code
COPY . .

# Build application
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main .

# Production stage - using scratch for minimal image
FROM scratch
WORKDIR /app

# Copy CA certificates for HTTPS
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# Copy binary from builder
COPY --from=builder /app/main .

# Expose port
EXPOSE 8080

# Run application
CMD ["./main"]

Approximate image size comparison (actual numbers depend on the application and Go version):

  • Full Go image: ~800MB
  • Multi-stage with Alpine: ~15MB
  • Multi-stage with scratch: ~10MB

3. Python Application

# Build stage
FROM python:3.11 AS builder
WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt .

# Install Python dependencies to a virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
RUN pip install --no-cache-dir -r requirements.txt

# Production stage
FROM python:3.11-slim
WORKDIR /app

# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv

# Copy application code
COPY . .

# Activate virtual environment
ENV PATH="/opt/venv/bin:$PATH"

# Create non-root user
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

# Expose port
EXPOSE 8000

# Run application
CMD ["python", "app.py"]

4. Java Application

# Build stage
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app

# Copy pom.xml and download dependencies (cached layer)
COPY pom.xml .
RUN mvn dependency:go-offline

# Copy source and build
COPY src ./src
RUN mvn clean package -DskipTests

# Production stage
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app

# Copy JAR from builder
COPY --from=builder /app/target/*.jar app.jar

# Create non-root user
RUN addgroup -S spring && adduser -S spring -G spring
USER spring:spring

# Expose port
EXPOSE 8080

# Set JVM options
ENV JAVA_OPTS="-Xmx512m -Xms256m"

# Run application (exec replaces the shell so the JVM receives stop signals)
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar app.jar"]

5. React Application with Nginx

# Build stage
FROM node:18 AS builder
WORKDIR /app

# Copy package files
COPY package*.json ./

# Install dependencies
RUN npm ci

# Copy source code
COPY . .

# Build production application
RUN npm run build

# Production stage
FROM nginx:alpine
WORKDIR /usr/share/nginx/html

# Remove default nginx static assets
RUN rm -rf ./*

# Copy built application from builder
COPY --from=builder /app/build .

# Copy custom nginx configuration
COPY nginx.conf /etc/nginx/conf.d/default.conf

# Expose port
EXPOSE 80

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget --quiet --tries=1 --spider http://localhost/health || exit 1

# Start nginx
CMD ["nginx", "-g", "daemon off;"]
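
The health check above only passes if nginx answers on /health, and the post does not show the nginx.conf it copies. The following is a minimal sketch of what such a configuration could look like, written inline with a heredoc (supported from the dockerfile:1.4 syntax) instead of a separate file. The server block, the /health location, and the single-page-app fallback to index.html are assumptions for illustration, not the author's actual configuration.

```dockerfile
# syntax=docker/dockerfile:1.4
FROM nginx:alpine

# Hypothetical server config; the quoted delimiter stops the builder from
# expanding nginx variables such as $uri at build time.
COPY <<"EOF" /etc/nginx/conf.d/default.conf
server {
    listen 80;
    root /usr/share/nginx/html;
    index index.html;

    # Lightweight endpoint for the container health check
    location = /health {
        access_log off;
        default_type text/plain;
        return 200 'ok';
    }

    # Assumed single-page app: unknown paths fall back to index.html
    location / {
        try_files $uri $uri/ /index.html;
    }
}
EOF

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget --quiet --tries=1 --spider http://localhost/health || exit 1
```

Keeping the endpoint in nginx itself means the check does not depend on any file produced by the frontend build.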

Advanced Multi-Stage Techniques

1. Named Stages for Flexibility

# Base stage with common dependencies
FROM node:18-alpine AS base
WORKDIR /app
COPY package*.json ./

# Development stage
FROM base AS development
RUN npm install
COPY . .
CMD ["npm", "run", "dev"]

# Test stage
FROM base AS test
RUN npm ci
COPY . .
RUN npm run test

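# Build stage
FROM base AS build
# The build needs devDependencies, so install everything here;
# only the compiled output in /app/dist is copied to the final image
RUN npm ci
COPY . .
RUN npm run build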

# Production stage
FROM nginx:alpine AS production
COPY --from=build /app/dist /usr/share/nginx/html
CMD ["nginx", "-g", "daemon off;"]

Build specific stages:

# Build for development
docker build --target development -t myapp:dev .

# Build for testing
docker build --target test -t myapp:test .

# Build for production (default)
docker build -t myapp:prod .

2. Using Build Arguments

ARG NODE_VERSION=18
FROM node:${NODE_VERSION}-alpine AS builder

ARG BUILD_ENV=production

WORKDIR /app
COPY package*.json ./

# Install all dependencies; the build step needs devDependencies
RUN npm ci

COPY . .
RUN npm run build

# Conditionally strip devDependencies before node_modules is copied on
RUN if [ "$BUILD_ENV" = "production" ]; then \
        npm prune --omit=dev; \
    fi

FROM node:${NODE_VERSION}-alpine
# ARG values do not carry across stages, so declare it again here
ARG BUILD_ENV=production
ENV NODE_ENV=${BUILD_ENV}
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/index.js"]

Build with arguments:

docker build --build-arg NODE_VERSION=20 --build-arg BUILD_ENV=staging -t myapp .

3. Using External Images as Stages

# Use pre-built base image
FROM mycompany/node-base:latest AS base

# Build stage using the base
FROM base AS builder
WORKDIR /app
COPY . .
RUN npm run build

# Production using another pre-built image
FROM mycompany/nginx-base:latest
COPY --from=builder /app/dist /usr/share/nginx/html
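
An external image does not have to be declared with FROM to serve as a source: COPY --from also accepts an image reference directly. A minimal sketch, using the public nginx:alpine image and illustrative paths that are not part of the original example:

```dockerfile
FROM alpine:3.19

# Pull a single file out of a published image without declaring it as a stage.
# The destination path is arbitrary and chosen only for this illustration.
COPY --from=nginx:alpine /etc/nginx/nginx.conf /nginx.conf
```

This is handy for borrowing one binary or config file from another image without inheriting the rest of its filesystem.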

4. Copying from Multiple Stages

# Frontend build
FROM node:18 AS frontend-builder
WORKDIR /app
COPY frontend/package*.json ./
RUN npm ci
COPY frontend .
RUN npm run build

# Backend build
FROM golang:1.21 AS backend-builder
WORKDIR /app
COPY backend/go.* ./
RUN go mod download
COPY backend .
RUN CGO_ENABLED=0 go build -o server

# Combine both in final stage
FROM alpine:latest
WORKDIR /app

# Copy backend binary
COPY --from=backend-builder /app/server .

# Copy frontend static files
COPY --from=frontend-builder /app/dist ./static

# Install ca-certificates
RUN apk --no-cache add ca-certificates

EXPOSE 8080
CMD ["./server"]

Image Optimization Techniques

1. Layer Caching Optimization

# BAD - Invalidates cache on any code change
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build

# GOOD - Optimized layer caching
FROM node:18-alpine
WORKDIR /app

# Cache dependencies (changes less frequently)
COPY package*.json ./
RUN npm ci

# Copy and build code (changes more frequently)
COPY . .
RUN npm run build

2. Minimize Layer Count

# BAD - Many layers; the separate rm cannot shrink the layers created before it
FROM alpine:latest
RUN apk add --no-cache curl
RUN apk add --no-cache wget
RUN apk add --no-cache git
RUN rm -rf /var/cache/apk/*

# GOOD - Combined into single layer
# (--no-cache keeps the apk index out of the image, so no cleanup step is needed)
FROM alpine:latest
RUN apk add --no-cache \
    curl \
    wget \
    git

3. Use .dockerignore

# .dockerignore
node_modules
npm-debug.log
.git
.gitignore
.env
.env.local
.DS_Store
*.md
.vscode
.idea
coverage
.nyc_output
dist
build
*.log

4. Use Specific Base Images

# BAD - Large base image
FROM ubuntu:latest

# BETTER - Smaller base image
FROM node:18-alpine

# BEST - Minimal base for compiled binaries
FROM scratch

5. Multi-Architecture Builds

# Dockerfile with multi-arch support
FROM --platform=$BUILDPLATFORM golang:1.21 AS builder
ARG TARGETOS
ARG TARGETARCH
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o app

FROM alpine:latest
COPY --from=builder /app/app .
CMD ["./app"]

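Build for multiple architectures (with the default local image store a multi-platform result cannot be loaded, so such builds are usually pushed to a registry with --push):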

docker buildx build --platform linux/amd64,linux/arm64,linux/arm/v7 -t myapp:latest .

Security Best Practices

1. Run as Non-Root User

FROM node:18-alpine

# Create app directory
WORKDIR /app

# Copy application
COPY --chown=node:node . .

# Install dependencies (--omit=dev skips devDependencies)
RUN npm ci --omit=dev

# Use non-root user
USER node

CMD ["node", "index.js"]

2. Scan for Vulnerabilities

# Using Docker Scout
docker scout cves myapp:latest

# Using Trivy
trivy image myapp:latest

# Using Snyk
snyk container test myapp:latest

3. Use Specific Image Tags

# BAD - Unpredictable
FROM node:latest

# GOOD - Specific version
FROM node:18.17.1-alpine3.18

4. Minimize Attack Surface

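# Debian-based builder, matching the Debian-based distroless runtime below
FROM node:18-slim AS builder
WORKDIR /app
COPY package*.json ./
# Install everything for the build, then drop devDependencies afterwards
RUN npm ci
COPY . .
RUN npm run build && npm prune --omit=dev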

FROM gcr.io/distroless/nodejs18-debian11
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["dist/index.js"]

Build Performance Optimization

1. Use BuildKit

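# Enable BuildKit for the current shell
# (it is already the default builder since Docker Engine 23.0)
export DOCKER_BUILDKIT=1

# With the legacy docker-compose v1 CLI, set it on the host when building.
# Putting DOCKER_BUILDKIT under a service's "environment:" key only sets the
# variable inside the running container and does not affect the build.
DOCKER_BUILDKIT=1 COMPOSE_DOCKER_CLI_BUILD=1 docker-compose build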

2. Parallel Builds with BuildKit

# syntax=docker/dockerfile:1.4

FROM alpine AS stage1
RUN sleep 5 && echo "Stage 1 complete"

FROM alpine AS stage2
RUN sleep 5 && echo "Stage 2 complete"

# Both stages can build in parallel
FROM alpine
COPY --from=stage1 /etc/os-release /stage1-info
COPY --from=stage2 /etc/os-release /stage2-info

3. Cache Mounts

# syntax=docker/dockerfile:1.4

FROM node:18-alpine
WORKDIR /app

# Copy the manifests first so npm ci has something to install
COPY package*.json ./

# Use cache mount for npm
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev

COPY . .
CMD ["node", "index.js"]
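
The same technique applies to other package managers. As a minimal sketch, assuming a Python project with a requirements.txt and an app.py entry point (both hypothetical here), pip's download cache can be kept in a cache mount so that repeated builds reuse previously downloaded wheels:

```dockerfile
# syntax=docker/dockerfile:1.4

FROM python:3.11-slim
WORKDIR /app

COPY requirements.txt .

# /root/.cache/pip is pip's default cache directory for the root user.
# --no-cache-dir is deliberately left out so pip actually uses the cache;
# the mount's contents are not written into the image layer.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

COPY . .
CMD ["python", "app.py"]
```

Because the cache lives outside the layer, the image stays as small as it would with --no-cache-dir while rebuilds avoid most downloads.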

4. Secret Mounts

# syntax=docker/dockerfile:1.4

FROM alpine
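# The secret is mounted only for this RUN step and is not stored in any layer.
# Avoid printing it (for example with cat), because build logs are often retained.
RUN --mount=type=secret,id=aws_credentials \
    test -s /run/secrets/aws_credentials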

Build with secrets:

docker build --secret id=aws_credentials,src=$HOME/.aws/credentials .
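
A common practical use is installing packages from a private registry. The sketch below assumes a hypothetical .npmrc on the build host that holds a registry token; the secret id npmrc is an arbitrary name chosen for this example. The target option places the secret where npm expects its configuration for the duration of one step.

```dockerfile
# syntax=docker/dockerfile:1.4

FROM node:18-alpine
WORKDIR /app

COPY package*.json ./

# Mount the host's .npmrc only while npm ci runs; it never lands in a layer.
# Build with: docker build --secret id=npmrc,src=$HOME/.npmrc .
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci --omit=dev

COPY . .
CMD ["node", "index.js"]
```

Unlike passing the token through ARG or ENV, nothing about the credential shows up in docker history for the resulting image.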

Real-World Example: Full-Stack Application

# syntax=docker/dockerfile:1.4

# Stage 1: Build frontend
FROM node:18-alpine AS frontend-builder
WORKDIR /app/frontend

COPY frontend/package*.json ./
RUN --mount=type=cache,target=/root/.npm npm ci

COPY frontend .
RUN npm run build

# Stage 2: Build backend
FROM golang:1.21-alpine AS backend-builder
WORKDIR /app/backend

# Install dependencies
RUN apk add --no-cache git

# Copy go.mod and download dependencies
COPY backend/go.* ./
RUN --mount=type=cache,target=/go/pkg/mod go mod download

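# Copy source and build (mount the same module cache again, because files
# downloaded into a cache mount are not stored in the image layer)
COPY backend .
RUN --mount=type=cache,target=/go/pkg/mod \
    CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -ldflags="-w -s" -o server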

# Stage 3: Final production image
FROM alpine:latest

# Install ca-certificates for HTTPS
RUN apk --no-cache add ca-certificates tzdata

# Create non-root user
RUN addgroup -g 1000 app && \
    adduser -D -u 1000 -G app app

WORKDIR /home/app

# Copy backend binary
COPY --from=backend-builder /app/backend/server .

# Copy frontend static files
COPY --from=frontend-builder /app/frontend/dist ./static

# Copy configuration
COPY config.yaml .

# Set ownership
RUN chown -R app:app /home/app

# Switch to non-root user
USER app

# Expose port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD wget --quiet --tries=1 --spider http://localhost:8080/health || exit 1

# Run application
CMD ["./server"]

Monitoring Image Size

# Check image size
docker images myapp

# Analyze image layers
docker history myapp:latest

# Use dive for detailed analysis
dive myapp:latest

Best Practices Summary

1. Use Multi-Stage Builds:

  • Separate build and runtime environments
  • Copy only necessary artifacts to final stage
  • Use minimal base images for production

2. Optimize Layers:

  • Order instructions from least to most frequently changed
  • Combine related commands
  • Use .dockerignore to exclude unnecessary files

3. Security:

  • Run as non-root user
  • Use specific image tags
  • Scan for vulnerabilities
  • Minimize installed packages

4. Performance:

  • Enable BuildKit
  • Use cache mounts
  • Leverage layer caching
  • Build for multiple architectures when needed

5. Maintainability:

  • Use clear stage names
  • Document Dockerfile
  • Keep Dockerfiles simple
  • Use build arguments for flexibility

Conclusion

Multi-stage builds are essential for creating optimized, secure, and maintainable Docker images. By following these patterns and best practices, you can:

  • Reduce image sizes, often by 10x or more for compiled languages such as Go
  • Improve build times with better caching
  • Enhance security by minimizing attack surfaces
  • Create more maintainable Dockerfiles
  • Optimize deployment pipelines

Key takeaways:

  • Always use multi-stage builds for production
  • Order layers for optimal caching
  • Use appropriate base images
  • Run containers as non-root users
  • Leverage BuildKit features
  • Monitor and optimize image sizes regularly

This post is licensed under CC BY 4.0 by the author.