# Go Profiling with pprof: Finding and Fixing Performance Bottlenecks


## Introduction

Go ships with a built-in profiler (pprof) that measures CPU usage, memory allocation, goroutine counts, and mutex contention. Unlike external profilers, pprof integrates directly into your binary and can be enabled in production with minimal overhead.

## Enabling pprof in HTTP Servers

```go
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* handlers on DefaultServeMux
)

func main() {
    // Application server
    go func() {
        log.Fatal(http.ListenAndServe(":8080", appRouter()))
    }()

    // Profiling server — bind to localhost only, never expose publicly
    log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```
```shell
# Capture 30-second CPU profile (quote URLs with '?' so the shell doesn't glob)
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Capture heap profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Goroutine dump (find goroutine leaks)
curl -s "http://localhost:6060/debug/pprof/goroutine?debug=2"

# Block profile (where goroutines block on channel/mutex)
# Requires runtime.SetBlockProfileRate to be set in the program
go tool pprof http://localhost:6060/debug/pprof/block

# Mutex contention
# Requires runtime.SetMutexProfileFraction to be set in the program
go tool pprof http://localhost:6060/debug/pprof/mutex
```

## Analyzing CPU Profiles

```shell
# Interactive pprof session
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Inside pprof:
(pprof) top10          # top 10 functions by flat CPU time
(pprof) top10 -cum     # top 10 by cumulative time (includes callees)
(pprof) list ParseJSON # show annotated source for ParseJSON function
(pprof) web            # open call graph as SVG in browser
(pprof) pdf            # export call graph as PDF

# One-liner: open the web UI (includes a flame graph view)
go tool pprof -http=:8081 "http://localhost:6060/debug/pprof/profile?seconds=10"
```

## Benchmarks with pprof

```go
// bench_test.go
package mypackage

import (
    "testing"
)

func BenchmarkParseRequest(b *testing.B) {
    data := []byte(`{"user_id":42,"items":[{"id":1,"qty":2}]}`)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ParseRequest(data)
    }
}
```
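If you want a quick timing and allocation readout without the full `go test` harness, `testing.Benchmark` can drive the same loop from an ordinary program. A sketch, where `ParseRequest` is a hypothetical stand-in implemented with `encoding/json`:

```go
package main

import (
    "encoding/json"
    "fmt"
    "testing"
)

// ParseRequest is a hypothetical stand-in for the function under test.
func ParseRequest(data []byte) (map[string]any, error) {
    var v map[string]any
    err := json.Unmarshal(data, &v)
    return v, err
}

func main() {
    data := []byte(`{"user_id":42,"items":[{"id":1,"qty":2}]}`)
    r := testing.Benchmark(func(b *testing.B) {
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
            ParseRequest(data)
        }
    })
    // Prints ns/op plus B/op and allocs/op (enabled by ReportAllocs above)
    fmt.Println(r.String(), r.MemString())
}
```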
```shell
# Run benchmark and capture CPU + memory profiles
go test -bench=BenchmarkParseRequest -benchmem \
    -cpuprofile=cpu.prof \
    -memprofile=mem.prof \
    -count=3

# Analyze CPU profile
go tool pprof cpu.prof

# Analyze memory allocations
go tool pprof mem.prof
(pprof) sample_index=alloc_space
(pprof) top10 -cum
```

## Memory Profiling

```go
// Force a GC before capturing a heap profile for cleaner data.
// (The /debug/pprof/heap endpoint accepts ?gc=1 to do the same.)
import (
    "os"
    "runtime"
    "runtime/pprof"
)

func captureHeapProfile(path string) error {
    runtime.GC()
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()
    return pprof.WriteHeapProfile(f)
}
```
```shell
# Heap profile: what is currently alive
go tool pprof http://localhost:6060/debug/pprof/heap

# Allocation profile: everything allocated since start (including GC'd)
go tool pprof http://localhost:6060/debug/pprof/allocs

# Inside pprof — find allocation hotspots:
(pprof) sample_index=alloc_space
(pprof) top10
(pprof) list json.Marshal
```

## Detecting Goroutine Leaks

```go
// A goroutine leak: goroutine started but never exits
func leakyHandler(w http.ResponseWriter, r *http.Request) {
    ch := make(chan int)
    go func() {
        v := <-ch // blocks forever if nothing sends to ch
        _ = v
    }()
    w.Write([]byte("ok"))
    // The blocked goroutine keeps ch reachable: neither is ever freed
}

// Fix: use context for cancellation
func fixedHandler(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context()
    ch := make(chan int, 1)
    go func() {
        select {
        case v := <-ch:
            _ = v
        case <-ctx.Done():
            return // exits when the request completes
        }
    }()
    w.Write([]byte("ok"))
}
```
```shell
# Monitor goroutine count over time
watch -n5 'curl -s http://localhost:6060/debug/pprof/goroutine?debug=1 | head -5'
# If the count grows steadily: goroutine leak

# Detailed goroutine dump to find what's blocking
curl -s "http://localhost:6060/debug/pprof/goroutine?debug=2" | head -100
```

## Continuous Profiling in Production

```go
// Send profiles to a profiling backend (e.g., Pyroscope)
import (
    "log"

    "github.com/grafana/pyroscope-go"
)

func main() {
    _, err := pyroscope.Start(pyroscope.Config{
        ApplicationName: "my-service",
        ServerAddress:   "http://pyroscope:4040",
        ProfileTypes: []pyroscope.ProfileType{
            pyroscope.ProfileCPU,
            pyroscope.ProfileAllocObjects,
            pyroscope.ProfileAllocSpace,
            pyroscope.ProfileInuseObjects,
            pyroscope.ProfileInuseSpace,
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    // ... start the rest of the service
}
```

## Common Performance Patterns Found with pprof

```go
// BAD: allocating in hot path
func processItems(items []string) []Result {
    results := []Result{} // grows, causes reallocations
    for _, item := range items {
        results = append(results, process(item))
    }
    return results
}

// GOOD: pre-allocate
func processItems(items []string) []Result {
    results := make([]Result, 0, len(items)) // pre-allocate exact size
    for _, item := range items {
        results = append(results, process(item))
    }
    return results
}

// BAD: string concatenation in loop
func buildQuery(parts []string) string {
    result := ""
    for _, p := range parts {
        result += p + " " // allocates new string each iteration
    }
    return result
}

// GOOD: strings.Builder
func buildQuery(parts []string) string {
    var b strings.Builder
    b.Grow(len(parts) * 10) // pre-estimate capacity
    for _, p := range parts {
        b.WriteString(p)
        b.WriteByte(' ')
    }
    return b.String()
}
```
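pprof tells you where the allocations are; a quick benchmark confirms a rewrite actually pays off. A sketch comparing the two buildQuery variants via `testing.Benchmark` (the function names here are illustrative, suffixed to keep both versions in one file):

```go
package main

import (
    "fmt"
    "strings"
    "testing"
)

// buildQueryConcat is the naive version: a new string per iteration.
func buildQueryConcat(parts []string) string {
    result := ""
    for _, p := range parts {
        result += p + " "
    }
    return result
}

// buildQueryBuilder reuses one growing buffer via strings.Builder.
func buildQueryBuilder(parts []string) string {
    var b strings.Builder
    b.Grow(len(parts) * 10)
    for _, p := range parts {
        b.WriteString(p)
        b.WriteByte(' ')
    }
    return b.String()
}

func main() {
    parts := make([]string, 200)
    for i := range parts {
        parts[i] = "field"
    }
    for name, fn := range map[string]func([]string) string{
        "concat":  buildQueryConcat,
        "builder": buildQueryBuilder,
    } {
        r := testing.Benchmark(func(b *testing.B) {
            b.ReportAllocs()
            for i := 0; i < b.N; i++ {
                fn(parts)
            }
        })
        fmt.Printf("%-8s %s %s\n", name, r, r.MemString())
    }
}
```

On most machines the builder version shows far fewer allocs/op, which is exactly the signal you would see shrink in an alloc_space profile.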

## Conclusion

Enable pprof on a localhost-only port in production and capture profiles during load spikes. CPU profiles show where time is spent; heap profiles show allocation hotspots; goroutine dumps reveal leaks. The `go tool pprof -http` web UI with flame graphs is the most productive interface. For continuous visibility, integrate with Pyroscope or Parca for always-on profiling.
