Kubernetes Resource Requests and Limits: Getting Them Right


Introduction

Resource requests and limits are among the most misunderstood Kubernetes settings. Wrong values lead to OOM kills, CPU throttling, poor scheduling decisions, and wasted money. This post explains the semantics and provides a practical approach to setting values correctly.

Requests vs Limits

Request: the amount of a resource the scheduler sets aside for the container when placing the pod on a node. Requests drive scheduling decisions and are the denominator for HPA utilization calculations.
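
Because HPA utilization is measured against the request, the scaling arithmetic can be sketched as follows (the numbers are illustrative, not from any real cluster):

```python
import math

def hpa_desired_replicas(current_replicas: int, usage_millicores: float,
                         request_millicores: float, target_utilization: float) -> int:
    """Sketch of the HPA scaling formula:
    desired = ceil(current * currentUtilization / targetUtilization),
    where utilization = usage / request."""
    current_utilization = usage_millicores / request_millicores  # e.g. 0.8 == 80%
    return math.ceil(current_replicas * current_utilization / target_utilization)

# 4 replicas, each using 200m against a 250m request (80%), target 50%:
print(hpa_desired_replicas(4, 200, 250, 0.50))  # → 7
```

Note that lowering the request (without any change in actual usage) raises measured utilization and makes the HPA scale out sooner.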

Limit: the maximum the container is allowed to consume.

resources:
  requests:
    cpu: 250m       # scheduler sets aside 0.25 CPU on the node
    memory: 256Mi   # scheduler sets aside 256 MiB
  limits:
    cpu: 1000m      # container throttled above 1 CPU (never killed for CPU)
    memory: 512Mi   # container killed (OOMKilled) above 512 MiB

CPU is compressible: exceeding the CPU limit causes throttling, not termination. The container continues running but gets less CPU time.

Memory is not compressible: exceeding the memory limit causes the kernel OOM killer to terminate a process in the container (the pod reports OOMKilled; if the killed process is PID 1, the container restarts).
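
The quantity suffixes used above (m for millicores, Mi/Gi for binary units) follow the Kubernetes quantity format; here is a minimal Python converter sketch covering only the common suffixes, not the full grammar:

```python
def parse_cpu(q: str) -> float:
    """Convert a CPU quantity ('250m' or '1') to cores."""
    return int(q[:-1]) / 1000 if q.endswith("m") else float(q)

def parse_memory(q: str) -> int:
    """Convert a memory quantity ('256Mi', '1Gi', '512Ki') to bytes."""
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if q.endswith(suffix):
            return int(q[:-2]) * factor
    return int(q)  # plain bytes, no suffix

print(parse_cpu("250m"))      # → 0.25
print(parse_memory("256Mi"))  # → 268435456
```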

Quality of Service Classes

Kubernetes assigns a QoS class based on requests/limits, which determines eviction priority.

| QoS Class  | Condition                                              | Eviction Priority |
|------------|--------------------------------------------------------|-------------------|
| Guaranteed | requests == limits for every container                 | Last evicted      |
| Burstable  | at least one request or limit set, but not Guaranteed  | Middle            |
| BestEffort | no requests or limits on any container                 | First evicted     |
kubectl get pod my-pod -o jsonpath='{.status.qosClass}'

For critical services: use Guaranteed QoS (set requests == limits). For batch jobs: Burstable is fine.
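
For reference, a Guaranteed-class container looks like this (example values; requests and limits must match for both CPU and memory):

```yaml
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m       # equal to the request → Guaranteed
    memory: 512Mi
```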

CPU Throttling Problem

Setting CPU limits too low causes throttling even when the node has spare CPU capacity. This is a common source of latency spikes.

# Check CPU throttling rate (cgroup v1 path; on cgroup v2 the file is /sys/fs/cgroup/cpu.stat)
kubectl exec -it my-pod -- cat /sys/fs/cgroup/cpu/cpu.stat
# nr_throttled: 5234        ← number of scheduling periods in which the container was throttled
# throttled_time: 12345678  ← nanoseconds spent throttled (throttled_usec on cgroup v2)

# Prometheus metric (via cAdvisor)
rate(container_cpu_cfs_throttled_periods_total[5m]) /
rate(container_cpu_cfs_periods_total[5m])
# > 25% throttling indicates the limit is too low

For latency-sensitive services: consider removing CPU limits entirely (only set requests) on clusters with node-level CPU isolation. Accept the risk of noisy neighbors rather than guaranteed throttling.
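
The same throttling ratio can be computed directly from raw cpu.stat counters; a small sketch, using cgroup v1 field names and made-up sample values:

```python
def throttle_ratio(cpu_stat: str) -> float:
    """Fraction of CFS scheduling periods in which the container was
    throttled, parsed from the contents of a cgroup cpu.stat file."""
    fields = dict(line.split() for line in cpu_stat.strip().splitlines())
    return int(fields["nr_throttled"]) / int(fields["nr_periods"])

sample = """nr_periods 20000
nr_throttled 5234
throttled_time 12345678"""
print(f"{throttle_ratio(sample):.1%}")  # → 26.2%, above the 25% rule of thumb
```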

Right-Sizing with VPA Recommendations

# Install VPA and wait 24-48 hours for usage data
kubectl describe vpa my-app-vpa
# Lower Bound: cpu: 80m   memory: 120Mi
# Target:      cpu: 200m  memory: 290Mi
# Upper Bound: cpu: 450m  memory: 680Mi

# Set requests to the VPA target; give limits headroom (here 4x CPU, ~2x memory)
# requests:
#   cpu: 200m
#   memory: 300Mi
# limits:
#   cpu: 800m
#   memory: 600Mi
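
This sizing rule can be mechanized; a sketch where the headroom multipliers and the round-up-to-64Mi step are our own choices, not anything VPA prescribes:

```python
import math

def size_from_vpa(target_cpu_m: int, target_mem_mi: int,
                  cpu_headroom: float = 4.0, mem_headroom: float = 2.0) -> dict:
    """Requests at the VPA target; limits at a headroom multiple.
    Memory values are rounded up to the next 64Mi step."""
    def round_up_64(mi: float) -> int:
        return 64 * math.ceil(mi / 64)
    return {
        "requests": {"cpu": f"{target_cpu_m}m",
                     "memory": f"{round_up_64(target_mem_mi)}Mi"},
        "limits":   {"cpu": f"{int(target_cpu_m * cpu_headroom)}m",
                     "memory": f"{round_up_64(target_mem_mi * mem_headroom)}Mi"},
    }

print(size_from_vpa(200, 290))
```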

LimitRange: Namespace Defaults

Apply default requests/limits to all containers that don’t specify them.

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 256Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    max:
      cpu: 4
      memory: 4Gi
    min:
      cpu: 50m
      memory: 64Mi

Without a LimitRange, pods that set neither requests nor limits get BestEffort QoS and are evicted first under node pressure.
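
To audit which containers would fall back to these defaults (or to BestEffort without a LimitRange), you can scan the output of `kubectl get pods -o json`; a sketch over the parsed JSON, with a made-up sample pod list:

```python
def containers_missing_requests(pod_list: dict):
    """Yield (pod, container) pairs whose spec omits resource requests,
    given the parsed output of `kubectl get pods -o json`."""
    for pod in pod_list.get("items", []):
        for c in pod["spec"]["containers"]:
            if not c.get("resources", {}).get("requests"):
                yield pod["metadata"]["name"], c["name"]

sample = {"items": [
    {"metadata": {"name": "web-1"},
     "spec": {"containers": [
         {"name": "app", "resources": {"requests": {"cpu": "100m"}}},
         {"name": "sidecar", "resources": {}}]}},
]}
print(list(containers_missing_requests(sample)))  # → [('web-1', 'sidecar')]
```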

ResourceQuota: Namespace Total Limits

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"       # total CPU requests across all pods
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
    services: "20"
# Check quota usage
kubectl describe resourcequota -n production

Practical Starting Values

Without VPA data, start with these and tune based on observed usage:

| Service Type           | CPU Request | CPU Limit | Memory Request | Memory Limit |
|------------------------|-------------|-----------|----------------|--------------|
| API server (stateless) | 100m        | 500m      | 128Mi          | 512Mi        |
| Background worker      | 50m         | 500m      | 128Mi          | 512Mi        |
| Cache (Redis)          | 100m        | 1000m     | 256Mi          | 1Gi          |
| Database sidecar       | 25m         | 100m      | 64Mi           | 128Mi        |

Monitor actual usage with:

kubectl top pods -n production --sort-by=memory
kubectl top nodes
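
To catch over-provisioning, compare `kubectl top` usage against the configured requests; a sketch with hypothetical numbers (millicores):

```python
def utilization_report(requests: dict, usage: dict) -> dict:
    """usage/request ratio per pod; a ratio well under 1.0 sustained
    over time suggests the request can be lowered."""
    return {pod: round(usage[pod] / req, 2) for pod, req in requests.items()}

cpu_requests_m = {"api": 250, "worker": 500}   # from the pod specs
cpu_usage_m    = {"api": 60,  "worker": 480}   # from `kubectl top pods`
print(utilization_report(cpu_requests_m, cpu_usage_m))  # → {'api': 0.24, 'worker': 0.96}
```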

Conclusion

Set requests based on actual measured usage (use VPA recommendations). Set memory limits conservatively — OOM kills are better than unbounded memory growth. Be cautious with CPU limits; throttling causes latency and is hard to detect. Use LimitRange to set safe defaults for your namespace and ResourceQuota to prevent runaway resource consumption.
