## Introduction
Kubernetes uses three probe types to manage pod lifecycle: liveness (is the container healthy?), readiness (is the container ready to serve traffic?), and startup (has the container finished initializing?). Misconfigured probes cause unnecessary restarts, traffic to unready pods, and slow rolling deployments.
## Probe Types

### Liveness Probe
Determines if the container is running correctly. Failure triggers a container restart.
Use case: detect deadlocks, memory leaks that make the app unresponsive, or infinite loops that cannot be detected from outside.
```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10  # wait before first probe
  periodSeconds: 15        # probe every 15s
  timeoutSeconds: 5
  failureThreshold: 3      # restart after 3 consecutive failures
  successThreshold: 1
```
### Readiness Probe
Determines if the container is ready to serve traffic. Failure removes the pod from Service endpoints (no traffic routed to it).
Use case: warm-up time after restart, temporarily overloaded pods, maintenance mode, dependency unavailability.
```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
  successThreshold: 2  # require 2 consecutive successes to mark ready again
```
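The interplay between `failureThreshold` and `successThreshold` can be sketched as a small state machine. This is a simplified model for intuition, not the kubelet's actual implementation:

```python
class ProbeState:
    """Simplified model of how consecutive probe results flip readiness:
    failure_threshold consecutive failures mark not-ready,
    success_threshold consecutive successes mark ready again."""

    def __init__(self, failure_threshold=3, success_threshold=2):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.ready = True
        self._fails = 0
        self._oks = 0

    def record(self, success: bool) -> bool:
        if success:
            self._fails = 0
            self._oks += 1
            if self._oks >= self.success_threshold:
                self.ready = True
        else:
            self._oks = 0
            self._fails += 1
            if self._fails >= self.failure_threshold:
                self.ready = False
        return self.ready

probe = ProbeState(failure_threshold=3, success_threshold=2)
for _ in range(3):
    probe.record(False)   # 3 consecutive failures -> not ready
probe.record(True)        # 1 success: still not ready
probe.record(True)        # 2nd consecutive success: ready again
```

With `successThreshold: 2`, a single lucky success after an outage does not re-admit the pod to the Service; it must pass twice in a row.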
### Startup Probe
Handles slow-starting containers. Disables liveness and readiness probes until it succeeds. Prevents premature liveness failures during long initialization.
```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30  # allow up to 30 * 10s = 5 minutes to start
```
## Implementing Health Endpoints
```python
# FastAPI: health endpoints that do meaningful checks
from fastapi import FastAPI, Response
import asyncpg
import redis.asyncio as aioredis

app = FastAPI()
db_pool = None       # asyncpg pool, created at application startup
redis_client = None  # redis client, created at application startup

@app.get("/healthz")
async def liveness():
    # Liveness: check that the app is not deadlocked.
    # Keep this SIMPLE; avoid external dependency checks.
    return {"status": "alive"}

@app.get("/ready")
async def readiness(response: Response):
    checks = {}
    # Check database connectivity
    try:
        await db_pool.fetchval("SELECT 1")
        checks["database"] = "ok"
    except Exception as e:
        checks["database"] = f"error: {e}"
        response.status_code = 503
    # Check Redis connectivity
    try:
        await redis_client.ping()
        checks["redis"] = "ok"
    except Exception as e:
        checks["redis"] = f"error: {e}"
        response.status_code = 503
    return checks
```
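If the readiness handler grows more dependencies, the checks can run concurrently with a per-check timeout so one slow dependency doesn't stall the whole probe past the kubelet's `timeoutSeconds`. A stdlib-only sketch; the check names and fake coroutines are illustrative stand-ins, not part of the FastAPI app above:

```python
import asyncio

async def run_checks(checks: dict, timeout: float = 2.0):
    """Run named async checks concurrently, each bounded by `timeout`.
    Returns (all_ok, per-check results) suitable for a /ready response."""
    async def guarded(name, coro):
        try:
            await asyncio.wait_for(coro, timeout)
            return name, "ok"
        except Exception as e:
            return name, f"error: {e!r}"

    results = dict(await asyncio.gather(
        *(guarded(name, coro) for name, coro in checks.items())
    ))
    return all(v == "ok" for v in results.values()), results

# Illustrative fakes standing in for real DB/Redis pings
async def fake_db_ping():
    await asyncio.sleep(0.01)

async def fake_hung_ping():
    await asyncio.sleep(10)  # will trip the timeout

ok, results = asyncio.run(run_checks(
    {"database": fake_db_ping(), "cache": fake_hung_ping()}, timeout=0.1
))
# ok is False; results["cache"] reports a timeout, results["database"] is "ok"
```

Because `asyncio.wait_for` cancels the slow check, the whole probe completes in roughly one timeout rather than the sum of all dependency latencies.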
### Go: Health Endpoint
```go
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"net/http"
	"time"
)

var db *sql.DB

func livenessHandler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
	json.NewEncoder(w).Encode(map[string]string{"status": "alive"})
}

func readinessHandler(w http.ResponseWriter, r *http.Request) {
	ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
	defer cancel()
	if err := db.PingContext(ctx); err != nil {
		w.WriteHeader(http.StatusServiceUnavailable)
		json.NewEncoder(w).Encode(map[string]string{
			"status": "not ready",
			"error":  err.Error(),
		})
		return
	}
	w.WriteHeader(http.StatusOK)
	json.NewEncoder(w).Encode(map[string]string{"status": "ready"})
}
```
## Common Mistakes

### Liveness Probe Checking External Dependencies
```yaml
# BAD: if the database is slow, the pod restarts unnecessarily
# A database outage would cause all pods to restart in a loop
livenessProbe:
  httpGet:
    path: /health-with-db-check
    port: 8080

# GOOD: liveness checks only that the app process is alive
livenessProbe:
  httpGet:
    path: /healthz  # returns 200 always unless the app is hung
    port: 8080
```
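For `/healthz` to actually catch a hang rather than unconditionally returning 200, one common pattern is a heartbeat: the app's worker loop refreshes a timestamp, and liveness fails only when the heartbeat goes stale. A framework-agnostic sketch; the 30-second staleness budget is an arbitrary choice:

```python
import threading
import time

_last_heartbeat = time.monotonic()
_lock = threading.Lock()

def beat():
    """Called periodically from the app's main loop or worker thread."""
    global _last_heartbeat
    with _lock:
        _last_heartbeat = time.monotonic()

def is_alive(max_staleness: float = 30.0) -> bool:
    """Liveness check: True unless the heartbeat has gone stale,
    which suggests the worker is hung or deadlocked."""
    with _lock:
        return (time.monotonic() - _last_heartbeat) < max_staleness

beat()
assert is_alive()  # fresh heartbeat: the /healthz handler would return 200
```

The `/healthz` handler then returns 200 when `is_alive()` is true and 500 otherwise, without ever touching an external dependency.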
### Too-Aggressive Liveness Settings
```yaml
# BAD: will restart pods during brief GC pauses or CPU spikes
livenessProbe:
  periodSeconds: 5
  timeoutSeconds: 1
  failureThreshold: 1

# BETTER: tolerant of transient slowness
livenessProbe:
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3  # allows 30 seconds of failure before restart
```
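The tolerated-failure window follows directly from the settings: roughly `periodSeconds * failureThreshold` seconds of continuous failure before a restart (each probe's `timeoutSeconds` adds a little on top). A tiny helper for sanity-checking a config:

```python
def restart_delay_seconds(period: int, failure_threshold: int) -> int:
    """Approximate worst-case seconds of continuous failure before the
    kubelet restarts the container (timeoutSeconds adds slightly more)."""
    return failure_threshold * period

assert restart_delay_seconds(period=10, failure_threshold=3) == 30  # "BETTER" config above
assert restart_delay_seconds(period=5, failure_threshold=1) == 5    # "BAD" config above
```

The same arithmetic gives a startup probe's budget, e.g. `periodSeconds: 5` with `failureThreshold: 60` allows 300 seconds (5 minutes) of initialization.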
### Missing Startup Probe for Slow Services
```yaml
# BAD: liveness probe fires before the JVM finishes starting
# initialDelaySeconds is a fixed wait, wrong for variable startup times
livenessProbe:
  initialDelaySeconds: 60  # hardcoded guess
  httpGet:
    path: /healthz
    port: 8080

# GOOD: startup probe handles variable initialization time
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 60  # allow up to 5 minutes
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
  # No initialDelaySeconds needed; the startup probe gates liveness
```
## Full Configuration Example
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: my-api:latest
        ports:
        - containerPort: 8080
        startupProbe:
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 5
          failureThreshold: 24  # 2 minutes max startup
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 15
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
          successThreshold: 1
```
## Conclusion
Keep liveness probes simple and internal-only: they should detect the app being hung, not dependency failures. Use readiness probes to check real dependencies and gate traffic. Use startup probes for slow-starting services instead of initialDelaySeconds. The most common production issues come from liveness probes that are too aggressive or that check external services, causing restart loops during dependency outages.