Golden Signals Explained (With Real Metrics)
Introduction
The golden signals are a compact, battle-tested set of metrics that describe user experience and system health. They are especially effective because they are outcome-focused and map cleanly to service-level objectives. Advanced teams use them as the first layer of telemetry, then pivot into detailed traces and logs when an alert fires.
The Four Golden Signals
Each signal captures a different failure mode. Together they form a balanced view of availability and capacity.
- Latency: Distribution of request duration, including tail latency (p95/p99). Median-only metrics hide queuing and downstream degradation.
- Traffic: The demand on the service, usually requests per second or messages per second. Traffic is what turns a raw error count into an error rate (see the sketch after this list).
- Errors: Both explicit failures (5xx, exceptions) and implicit failures (timeouts, missing data).
- Saturation: How close the service is to its limits, such as CPU, memory, thread pool depth, or queue backlog.
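To make the normalization point concrete, the short sketch below takes two counter snapshots sixty seconds apart and derives a request rate, an error rate, and the error ratio an SLO actually cares about. The snapshot values are made up for illustration.

```python
# Hypothetical counter snapshots taken 60 seconds apart (values are illustrative).
requests_t0, requests_t1 = 1_200_000, 1_212_000   # http_requests_total
errors_t0, errors_t1 = 150, 390                   # http_request_errors_total

window_seconds = 60
request_rate = (requests_t1 - requests_t0) / window_seconds   # ~200 req/s
error_rate = (errors_t1 - errors_t0) / window_seconds         # ~4 err/s

# The error *ratio* is what an SLO cares about: errors per request, not per second.
error_ratio = (errors_t1 - errors_t0) / (requests_t1 - requests_t0)
print(f"{request_rate:.0f} req/s, {error_rate:.1f} err/s, error ratio {error_ratio:.2%}")
```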
Mapping Signals to Real Metrics
Use concrete metrics that already flow through your production pipeline. Avoid alerting on heavily derived or pre-aggregated views, which obscure what actually changed when an alert fires.
| Signal | Example Metrics | Why It Matters |
|---|---|---|
| Latency | http_request_duration_seconds{quantile="0.99"} | Captures user-visible delays and queueing. |
| Traffic | http_requests_total | Provides demand baseline and error normalization. |
| Errors | http_request_errors_total, grpc_server_handled_total{code!="OK"} | Tracks correctness and dependency health. |
| Saturation | process_cpu_seconds_total, work_queue_depth | Indicates approaching bottlenecks. |
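If these metrics are scraped by Prometheus, they translate into signal values with standard queries. The sketch below pulls a tail-latency estimate through the Prometheus HTTP API; the server address is an assumption for your environment, and it presumes the latency metric is exposed as a histogram so that histogram_quantile can be applied to its buckets.

```python
import requests  # third-party HTTP client

# Assumed Prometheus server address; adjust for your environment.
PROMETHEUS_URL = "http://localhost:9090/api/v1/query"

# p99 latency over the last 5 minutes, computed from histogram buckets.
P99_QUERY = (
    "histogram_quantile(0.99, "
    "sum(rate(http_request_duration_seconds_bucket[5m])) by (le))"
)

resp = requests.get(PROMETHEUS_URL, params={"query": P99_QUERY}, timeout=5)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    _timestamp, value = series["value"]
    print(f"p99 latency: {float(value):.3f}s")
```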
Instrumenting a Python Service
The Python example below records latency and errors and defines saturation-ready gauges using prometheus_client. The key is to label by route and status, not by user identifiers.
```python
from prometheus_client import Histogram, Counter, Gauge

# Latency distribution, labeled by route and status code.
REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "Latency by route and status",
    ["route", "status"],
)

# Explicit failures (5xx responses), labeled the same way so they can be
# normalized against traffic.
REQUEST_ERRORS = Counter(
    "http_request_errors_total",
    "Error responses",
    ["route", "status"],
)

# Saturation-oriented gauges: concurrency and queue backlog.
IN_FLIGHT = Gauge("http_requests_in_flight", "Concurrent requests")
QUEUE_DEPTH = Gauge("work_queue_depth", "Work queue backlog")


def record_request(route: str, status: int, duration: float) -> None:
    REQUEST_LATENCY.labels(route=route, status=str(status)).observe(duration)
    if status >= 500:
        REQUEST_ERRORS.labels(route=route, status=str(status)).inc()
```
For saturation, pair QUEUE_DEPTH with infrastructure-level metrics (CPU, memory, and thread pool utilization) to detect capacity exhaustion before errors spike.
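One way to keep those gauges current, continuing the example above (it reuses IN_FLIGHT, QUEUE_DEPTH, and record_request), is to track in-flight requests around each handler and sample the queue backlog on a timer. The handler wiring and queue object here are placeholders, not part of any specific framework.

```python
import time
from contextlib import contextmanager


@contextmanager
def track_request(route: str):
    """Wrap a request handler: maintains the in-flight gauge and records latency."""
    IN_FLIGHT.inc()
    start = time.monotonic()
    status = 500  # assume failure unless the handler completes
    try:
        yield
        status = 200
    finally:
        IN_FLIGHT.dec()
        record_request(route, status, time.monotonic() - start)


def sample_queue_depth(queue) -> None:
    """Call periodically (e.g., every few seconds) with your work queue object."""
    QUEUE_DEPTH.set(queue.qsize())  # assumes a queue.Queue-like interface
```

A handler body would then run inside `with track_request("/checkout"):`, and a background thread or scheduler would call `sample_queue_depth` against the service's work queue.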
Alerting with Error Budgets
Instead of alerting on raw error rate, align alerts to your SLO. For a 99.9% availability target, the error budget is 0.1% of requests over a 30-day window. Multi-window burn-rate alerts can detect both fast and slow budget consumption while avoiding noise.
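To make the burn-rate idea concrete, the sketch below uses commonly cited fast/slow window pairs (roughly 14.4x over 1 hour and 6x over 6 hours) and shows how much of a 30-day, 99.9% budget each would consume if sustained. The exact thresholds are a tuning choice, not a fixed rule.

```python
SLO = 0.999                      # 99.9% availability target
ERROR_BUDGET = 1 - SLO           # 0.1% of requests over the SLO window
PERIOD_HOURS = 30 * 24           # 30-day window = 720 hours


def budget_consumed(burn_rate: float, window_hours: float) -> float:
    """Fraction of the total error budget spent if this burn rate lasts the whole window."""
    return burn_rate * window_hours / PERIOD_HOURS


# Commonly cited multi-window pairs: a fast burn for paging, a slow burn for tickets.
for name, burn_rate, window_hours in [("fast", 14.4, 1), ("slow", 6.0, 6)]:
    error_ratio_threshold = burn_rate * ERROR_BUDGET
    spent = budget_consumed(burn_rate, window_hours)
    print(f"{name}: alert when error ratio > {error_ratio_threshold:.2%} "
          f"for {window_hours}h (~{spent:.0%} of budget)")
```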
Common Pitfalls
- Alerting on average latency instead of percentiles.
- Using high-cardinality labels such as user IDs or request IDs.
- Ignoring saturation metrics until error rates explode.
- Alerting on traffic spikes without correlating latency and errors.
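On the cardinality point, one mitigation is to collapse identifiers out of the route label before it reaches the metrics pipeline. The sketch below uses a hypothetical normalize_route helper; the regexes are illustrative, not exhaustive.

```python
import re

# Illustrative patterns: numeric IDs and UUIDs become placeholders so that
# /users/123 and /users/456 share one time series instead of creating two.
_ID_PATTERN = re.compile(r"/\d+(?=/|$)")
_UUID_PATTERN = re.compile(
    r"/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?=/|$)"
)


def normalize_route(path: str) -> str:
    """Reduce a raw request path to a low-cardinality route label."""
    path = _UUID_PATTERN.sub("/{uuid}", path)
    return _ID_PATTERN.sub("/{id}", path)


assert normalize_route("/users/123/orders/456") == "/users/{id}/orders/{id}"
```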
Conclusion
Golden signals are small but powerful. When they are instrumented with careful labels and aligned to SLOs, they provide fast detection, predictable alerting, and a consistent path to root-cause analysis.