Introduction
Understanding Linux memory management helps diagnose OOM kills, tune JVM heap settings, interpret top and free output correctly, and set sensible container memory limits. This post covers the key concepts that matter day-to-day.
Virtual Memory
Every process sees a private virtual address space (128TB of user space on 64-bit x86 Linux with 4-level page tables). The kernel maps virtual addresses to physical RAM via per-process page tables.
```shell
# View the virtual memory layout of a process
cat /proc/$(pgrep python | head -1)/maps
# 7f8b3a000000-7f8b3a200000 r--p /usr/lib/python3.12/...
# [stack]
# [heap]

# Summary of memory regions
pmap -x $(pgrep python | head -1)
```
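The same layout can be read programmatically. Here is a minimal Python sketch (Linux only; `parse_maps` is a name invented for illustration) that parses this process's own /proc/self/maps:

```python
# Sketch: parse /proc/self/maps (Linux only) and summarize the regions
# of this process's own virtual address space.
def parse_maps(path="/proc/self/maps"):
    regions = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            start, end = (int(x, 16) for x in fields[0].split("-"))
            perms = fields[1]
            name = fields[5] if len(fields) > 5 else "[anon]"
            regions.append((name, perms, end - start))
    return regions

# Print the well-known named regions
for name, perms, size in parse_maps():
    if name in ("[heap]", "[stack]", "[vdso]"):
        print(f"{name:8s} {perms} {size // 1024} KB")
```

Each line of the maps file is one mapping: address range, permissions, and (for file-backed mappings) the backing path.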
Pages and the Page Cache
Memory is managed in pages (4KB by default). When you read a file, Linux reads it into the page cache — kernel memory that holds recently accessed file data. Subsequent reads are served from cache without disk I/O.
```shell
# Show page cache and memory breakdown
free -h
#        total  used  free  shared  buff/cache  available
# Mem:     15G    4G    2G    200M          9G        10G

# "available" is the better metric than "free":
# available = free + reclaimable page cache
# Drop the page cache (for benchmarking only; don't do this in production)
echo 3 > /proc/sys/vm/drop_caches
```
The available column is what matters: a system showing 2GB free but 9GB of page cache has roughly 11GB effectively available for new processes, because the kernel reclaims cache on demand.
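The numbers free prints come from /proc/meminfo, which is easy to read directly. A small Python sketch (Linux only; field names are the kernel's own):

```python
# Sketch: read the fields behind free(1) straight from /proc/meminfo (Linux only).
def read_meminfo(path="/proc/meminfo"):
    info = {}
    with open(path) as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.split()[0])  # values are in kB
    return info

m = read_meminfo()
cache_kb = m["Buffers"] + m["Cached"]
print(f"free: {m['MemFree']} kB, cache: {cache_kb} kB, "
      f"available: {m['MemAvailable']} kB")
```

MemAvailable is the kernel's own estimate of what a new workload could use without swapping; it is usually close to, but slightly below, free plus cache.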
Memory Overcommit
Linux lets processes allocate more virtual memory than the machine has physical RAM: malloc can succeed with no physical backing at all. Physical pages are allocated lazily, on the first write to each page.
```shell
# Overcommit modes
cat /proc/sys/vm/overcommit_memory
# 0 = heuristic (default): allow reasonable overcommit
# 1 = always allow: malloc never fails for lack of commit space
# 2 = never: fail once total commit exceeds swap + overcommit_ratio% of RAM

# Current committed memory
grep Committed /proc/meminfo
# CommitLimit:   18000000 kB
# Committed_AS:  12000000 kB  <- total virtual memory committed
```
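Lazy allocation is easy to observe: a large anonymous mapping raises VmSize immediately, but VmRSS only grows as pages are first written. A Python sketch (Linux only; `status_kb` is a helper invented here):

```python
import mmap

# Sketch: show that physical pages are only allocated on first write (Linux only).
def status_kb(field):
    """Read a field like VmRSS from /proc/self/status, in kB."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])

SIZE = 256 * 1024 * 1024  # 256 MB
PAGE = 4096

rss_start = status_kb("VmRSS")
buf = mmap.mmap(-1, SIZE)          # anonymous mapping: virtual only, no RAM yet
rss_mapped = status_kb("VmRSS")
for off in range(0, SIZE, PAGE):   # write one byte per page: faults pages in
    buf[off] = 1
rss_touched = status_kb("VmRSS")

print(f"after mmap: +{rss_mapped - rss_start} kB, "
      f"after touching: +{rss_touched - rss_start} kB")
```

The mmap call barely moves RSS; touching every page pulls in roughly the full 256MB.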
OOM Killer
When the kernel cannot reclaim enough memory to satisfy an allocation, the OOM killer selects a victim (the process with the highest OOM score) and kills it.
```shell
# Each process has an OOM score (higher = more likely to be killed)
cat /proc/$(pgrep java | head -1)/oom_score

# Adjust via oom_score_adj (range: -1000 to +1000, lower = protected;
# -1000 exempts the process entirely)
echo -1000 > /proc/$(pgrep sshd | head -1)/oom_score_adj  # protect sshd

# View OOM kill events
dmesg | grep "Out of memory"
journalctl -k | grep "oom_kill"
```
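You can preview the kernel's choice: every process's score is readable from /proc. A Python sketch (Linux only; `oom_scores` is a name used here for illustration) listing the likeliest victims:

```python
import os

# Sketch: rank processes by OOM score, highest (most killable) first (Linux only).
def oom_scores():
    scores = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/oom_score") as f:
                score = int(f.read())
            with open(f"/proc/{pid}/comm") as f:
                name = f.read().strip()
            scores.append((score, int(pid), name))
        except OSError:
            continue  # process exited between listdir and open
    return sorted(scores, reverse=True)

for score, pid, name in oom_scores()[:5]:
    print(f"{score:5d} {pid:7d} {name}")
```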
In Kubernetes, a container exceeding its memory limit causes an OOM kill of processes inside the container (not the whole node).
```shell
# Check whether a pod was OOM killed
kubectl describe pod my-pod | grep -A5 "OOMKilled"
kubectl get pod my-pod -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}'
```
Understanding Memory Metrics
```shell
# Per-process memory columns in top/ps
# VSZ: virtual memory size, the total virtual address space allocated
# RSS (Resident Set Size): physical RAM currently in use
# SHR: shared pages (a shared library occupies RAM once but is counted
#      in the RSS of every process that maps it)
ps aux --sort=-rss | head -10

# More accurate: PSS (Proportional Set Size) divides each shared page
# among the processes sharing it; requires smaps
grep -E "^(Pss|Rss)" /proc/$(pgrep nginx | head -1)/smaps_rollup
```
RSS overstates the combined footprint of processes that share libraries; PSS is more accurate but requires reading /proc/<pid>/smaps (or the cheaper smaps_rollup).
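The difference is easy to check for the current process. A sketch using smaps_rollup (Linux 4.14+ only; `rollup_kb` is a helper invented here):

```python
# Sketch: compare this process's own RSS and PSS via smaps_rollup
# (Linux 4.14+ only).
def rollup_kb(field):
    with open("/proc/self/smaps_rollup") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])  # kB

rss, pss = rollup_kb("Rss"), rollup_kb("Pss")
print(f"RSS={rss} kB, PSS={pss} kB, shared with other processes: {rss - pss} kB")
```

PSS can never exceed RSS: it counts private pages in full and each shared page divided by the number of processes mapping it.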
Swap
Swap is disk space used as overflow for memory. Under memory pressure, the kernel moves cold pages out to swap to free RAM for hotter data.
```shell
# Show swap usage
swapon --show
free -h

# Swappiness: 0 = avoid swap, 100 = swap aggressively
cat /proc/sys/vm/swappiness  # default: 60
# For database servers: set to 1 (avoid swap almost entirely)
sysctl vm.swappiness=1

# Check whether a process is using swap
grep VmSwap /proc/$(pgrep java | head -1)/status
```
High swap activity causes severe latency (disk I/O instead of RAM access). Monitor it with vmstat:
```shell
vmstat 1
# procs -----------memory---------- ---swap-- -----io----
#  r  b    swpd    free  buff    cache  si  so  bi  bo
#  2  0       0 2048000  1024  8192000   0   0   0   0
# si = swap-in (pages/sec), so = swap-out; both should be ~0 normally
```
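To find which processes actually occupy the swap, sum VmSwap across /proc. A Python sketch (Linux only; `swap_users` is a name invented here):

```python
import os

# Sketch: rank processes by swap usage, reading VmSwap from
# /proc/<pid>/status (Linux only; kernel threads report 0).
def swap_users():
    users = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            name, swap_kb = "?", 0
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("Name:"):
                        name = line.split()[1]
                    elif line.startswith("VmSwap:"):
                        swap_kb = int(line.split()[1])  # kB
            users.append((swap_kb, int(pid), name))
        except OSError:
            continue  # process exited mid-scan
    return sorted(users, reverse=True)

for swap_kb, pid, name in swap_users()[:5]:
    print(f"{swap_kb:8d} kB {pid:7d} {name}")
```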
Huge Pages
The default 4KB pages cause TLB pressure for applications with large working sets. Huge pages (2MB on x86-64) significantly reduce TLB misses for databases and JVMs.
```shell
# Transparent Huge Pages (THP): automatic, but can cause latency spikes
cat /sys/kernel/mm/transparent_hugepage/enabled
# [always] madvise never
# For databases: set to madvise or never
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled

# Explicit huge pages for the JVM: pass -XX:+UseLargePages,
# which requires vm.nr_hugepages to be set
sysctl vm.nr_hugepages=512  # 512 * 2MB = 1GB of huge pages
```
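Reserved huge pages show up in /proc/meminfo. A small sketch reading them (Linux only; `hugepage_info` is a name invented here):

```python
# Sketch: read huge-page configuration from /proc/meminfo (Linux only).
def hugepage_info(path="/proc/meminfo"):
    info = {}
    with open(path) as f:
        for line in f:
            if line.startswith(("HugePages_", "Hugepagesize")):
                key, value = line.split(":")
                info[key] = int(value.split()[0])  # counts, or kB for sizes
    return info

hp = hugepage_info()
size_kb = hp["Hugepagesize"]
total = hp.get("HugePages_Total", 0)
print(f"{total} huge pages x {size_kb} kB = {total * size_kb // 1024} MB reserved")
```

HugePages_Free and HugePages_Rsvd in the same file tell you how much of the reservation is actually in use.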
Container Memory Limits
Container memory limits map directly to cgroup memory limits. The JVM and some other runtimes historically ignored container limits and sized themselves against total host RAM.
```shell
# Java 8u191+ respects container limits automatically. Verify:
java -XX:+PrintFlagsFinal -version 2>&1 | grep MaxHeapSize

# Explicitly size the heap relative to the container limit:
# use 75% of the limit for heap, leaving headroom for metaspace,
# thread stacks, and off-heap buffers
docker run --memory=512m openjdk:17 java \
    -XX:MaxRAMPercentage=75 \
    -jar app.jar
```
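Inside a container, the limit itself is readable from the cgroup filesystem, which is what container-aware runtimes consult. A sketch covering both cgroup v2 and v1 layouts (the paths below are the standard mount points, but may differ on some systems):

```python
# Sketch: read the cgroup memory limit, trying the cgroup v2 path first,
# then the v1 path. Returns bytes, or None if no limit is visible.
def cgroup_memory_limit():
    for path in ("/sys/fs/cgroup/memory.max",                     # cgroup v2
                 "/sys/fs/cgroup/memory/memory.limit_in_bytes"):  # cgroup v1
        try:
            with open(path) as f:
                raw = f.read().strip()
            return None if raw == "max" else int(raw)  # v2 writes "max" when unlimited
        except OSError:
            continue
    return None  # neither hierarchy exposes a limit here

limit = cgroup_memory_limit()
print("no limit" if limit is None else f"{limit // (1 << 20)} MB")
```

Note that on cgroup v1, "unlimited" appears as a very large byte count rather than the literal string "max".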
Conclusion
For most engineers, the practical takeaways are: use available, not free, to assess memory pressure; an OOM kill means the workload needs more memory or a lower limit; set swappiness=1 on database servers; and set explicit JVM heap flags in containers. vmstat, free, and /proc/meminfo give you the full picture.