
Golden Signals Explained (With Real Metrics)

Introduction

The golden signals are a compact, battle-tested set of metrics that describe user experience and system health. They are especially effective because they are outcome-focused and map cleanly to service-level objectives. Advanced teams use them as the first layer of telemetry, then pivot into detailed traces and logs when an alert fires.

The Four Golden Signals

Each signal captures a different failure mode. Together they form a balanced view of availability and capacity.

  • Latency: Distribution of request duration, including tail latency (p95/p99). Median-only metrics hide queuing and downstream degradation.
  • Traffic: The demand on the service, usually requests per second or messages per second. This is needed for error-rate normalization.
  • Errors: Both explicit failures (5xx, exceptions) and implicit failures (timeouts, missing data).
  • Saturation: How close the service is to its limits, such as CPU, memory, thread pool depth, or queue backlog.

Mapping Signals to Real Metrics

Use concrete metrics that are already flowing through your production pipeline. Avoid alerting on heavily derived or pre-aggregated views that obscure what actually changed.

Signal     | Example Metrics                                                   | Why It Matters
Latency    | http_request_duration_seconds{quantile="0.99"}                    | Captures user-visible delays and queueing.
Traffic    | http_requests_total                                               | Provides demand baseline and error normalization.
Errors     | http_request_errors_total, grpc_server_handled_total{code!="OK"} | Tracks correctness and dependency health.
Saturation | process_cpu_seconds_total, work_queue_depth                       | Indicates approaching bottlenecks.

Instrumenting a Python Service

The Python example below emits latency, errors, and saturation-ready gauges using prometheus_client. The key is to label by route and status, not by user identifiers.

from prometheus_client import Histogram, Counter, Gauge

# Latency histogram labelled by route and status; avoid per-user labels,
# which would explode cardinality.
REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds",
    "Latency by route and status",
    ["route", "status"]
)
# Counter for explicit failures (HTTP 5xx responses).
REQUEST_ERRORS = Counter(
    "http_request_errors_total",
    "Error responses",
    ["route", "status"]
)
# Saturation-oriented gauges: concurrency and queued work.
IN_FLIGHT = Gauge("http_requests_in_flight", "Concurrent requests")
QUEUE_DEPTH = Gauge("work_queue_depth", "Work queue backlog")


def record_request(route: str, status: int, duration: float) -> None:
    """Record one completed request: observe latency, count 5xx responses as errors."""
    REQUEST_LATENCY.labels(route=route, status=str(status)).observe(duration)
    if status >= 500:
        REQUEST_ERRORS.labels(route=route, status=str(status)).inc()
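
As a usage sketch, the IN_FLIGHT gauge doubles as a saturation signal when it wraps the request body, while record_request captures latency and errors on the way out. The handler name and the /checkout route below are hypothetical, not part of the snippet above:

import time


def handle_checkout(request):  # hypothetical handler for a /checkout route
    start = time.monotonic()
    status = 200
    try:
        with IN_FLIGHT.track_inprogress():  # concurrency as a saturation signal
            ...  # real request handling goes here
    except Exception:
        status = 500
        raise
    finally:
        record_request("/checkout", status, time.monotonic() - start)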

For saturation, pair QUEUE_DEPTH with infrastructure-level metrics (CPU, memory, and thread pool utilization) to detect capacity exhaustion before errors spike.
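
A minimal sketch of such a sampler, continuing the definitions above. The psutil dependency, the 15-second interval, and the process_cpu_percent metric name are assumptions; any CPU source works:

import time

import psutil  # assumption: psutil is installed
from prometheus_client import Gauge

CPU_PERCENT = Gauge("process_cpu_percent", "Process CPU utilization, 0-100")


def sample_saturation(work_queue, interval: float = 15.0) -> None:
    """Periodically publish saturation gauges.

    Reuses QUEUE_DEPTH from the snippet above; work_queue is assumed to
    expose qsize(), e.g. queue.Queue. Run in a background thread, e.g.
    threading.Thread(target=sample_saturation, args=(q,), daemon=True).start().
    """
    proc = psutil.Process()
    while True:
        CPU_PERCENT.set(proc.cpu_percent(interval=None))  # percent since last call
        QUEUE_DEPTH.set(work_queue.qsize())
        time.sleep(interval)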

Alerting with Error Budgets

Instead of alerting on raw error rate, align alerts to your SLO. For a 99.9% availability target, the error budget is 0.1% of requests over the 30-day window. Multi-window burn-rate alerts catch both fast and slow budget consumption while avoiding noise.
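
A rough sketch of that policy in plain Python, assuming the per-window error ratios are computed upstream from http_request_errors_total and http_requests_total. The 1-hour/6-hour windows and the 14.4x/6x thresholds are commonly cited values for a monthly 99.9% SLO, not a prescription:

SLO_TARGET = 0.999               # 99.9% availability over 30 days
ERROR_BUDGET = 1 - SLO_TARGET    # 0.1% of requests may fail


def burn_rate(error_ratio: float) -> float:
    """How many times faster than 'exactly on budget' errors are accruing."""
    return error_ratio / ERROR_BUDGET


def should_page(ratio_1h: float, ratio_5m: float) -> bool:
    """Fast burn: ~2% of the 30-day budget consumed in one hour (14.4x).

    The short 5-minute window stops the alert from firing long after the
    incident has already recovered.
    """
    return burn_rate(ratio_1h) > 14.4 and burn_rate(ratio_5m) > 14.4


def should_ticket(ratio_6h: float, ratio_30m: float) -> bool:
    """Slow burn: ~5% of the budget consumed over six hours (6x)."""
    return burn_rate(ratio_6h) > 6.0 and burn_rate(ratio_30m) > 6.0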

Common Pitfalls

  • Alerting on average latency instead of percentiles (see the sketch after this list).
  • Using high-cardinality labels such as user IDs or request IDs.
  • Ignoring saturation metrics until error rates explode.
  • Alerting on traffic spikes without correlating latency and errors.
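
To make the first pitfall concrete, here is a toy comparison with invented numbers: a handful of slow requests barely move the mean but set the p99.

import statistics

# 95 fast requests and 5 slow outliers, durations in seconds (made up).
durations = [0.05] * 95 + [2.0] * 5

mean_latency = statistics.mean(durations)                  # ~0.15 s: looks healthy
p99_latency = statistics.quantiles(durations, n=100)[98]   # 2.0 s: what the tail sees

print(f"mean={mean_latency:.2f}s p99={p99_latency:.2f}s")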

Conclusion

Golden signals are small but powerful. When they are instrumented with careful labels and aligned to SLOs, they provide fast detection, predictable alerting, and a consistent path to root-cause analysis.

This post is licensed under CC BY 4.0 by the author.