OpenTelemetry Architecture Deep Dive
Introduction
OpenTelemetry (OTel) provides a unified architecture for metrics, traces, and logs. Understanding its internal layers helps advanced teams design scalable observability pipelines and avoid hidden costs.
Core Components
OpenTelemetry is composed of a few major building blocks.
- API: Stable interfaces for instrumentation.
- SDK: Implementation that handles batching, sampling, and processing.
- Collector: Vendor-neutral pipeline for receiving, processing, and exporting telemetry.
- Exporters: Output adapters, available in both the SDK and the Collector, that send telemetry to backends such as Prometheus, Tempo, or Elastic.
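Separating the API from the SDK means instrumented code depends only on stable interfaces, while the implementation behind them can be swapped. The stdlib-only Java sketch below illustrates the pattern. `MiniTracer`, `NoopMiniTracer`, `RecordingMiniTracer`, and `BillingService` are hypothetical names, not OpenTelemetry types. In spirit, this mirrors how the OpenTelemetry API behaves as a no-op until an SDK is installed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the stable API: instrumentation code depends only on this.
interface MiniTracer {
    void recordSpan(String name);
}

// Default implementation when no SDK is installed: calls are accepted and discarded.
final class NoopMiniTracer implements MiniTracer {
    @Override
    public void recordSpan(String name) {
        // intentionally empty
    }
}

// Hypothetical stand-in for an SDK: it actually keeps the finished spans.
final class RecordingMiniTracer implements MiniTracer {
    final List<String> finished = new ArrayList<>();

    @Override
    public void recordSpan(String name) {
        finished.add(name);
    }
}

// Instrumented business code: it never references a concrete implementation.
final class BillingService {
    private final MiniTracer tracer;

    BillingService(MiniTracer tracer) {
        this.tracer = tracer;
    }

    int charge(int cents) {
        tracer.recordSpan("charge");
        return cents;
    }
}

class ApiSdkDemo {
    public static void main(String[] args) {
        new BillingService(new NoopMiniTracer()).charge(100); // instrumented, nothing recorded
        RecordingMiniTracer sdk = new RecordingMiniTracer();
        new BillingService(sdk).charge(250);
        System.out.println(sdk.finished); // prints [charge]
    }
}
```

The same `BillingService` code runs unchanged in both cases. Only the object passed in decides whether telemetry is kept.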
Data Flow
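The instrumentation API emits signals into the SDK, which attaches resource attributes (detected or configured) and runs the data through sampling, processing such as attribute filtering, and batching. From there, data is exported either directly to a backend or through the collector. The collector is generally preferred in production because it centralizes backend authentication, buffering and retries, and load shedding (for example, via its memory_limiter processor), instead of repeating that logic in every service.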
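Batching is what keeps export cheap: spans are buffered and sent in groups instead of one network call per span. The following is a simplified, single-threaded Java sketch of that idea. `MiniBatchProcessor` and its exporter callback are hypothetical. The SDK's real `BatchSpanProcessor` also exports on a timer from a background thread and drops spans when its queue is full, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified batching processor (not the SDK's BatchSpanProcessor).
final class MiniBatchProcessor {
    private final int maxBatchSize;
    private final Consumer<List<String>> exporter; // hypothetical export callback
    private final List<String> buffer = new ArrayList<>();

    MiniBatchProcessor(int maxBatchSize, Consumer<List<String>> exporter) {
        if (maxBatchSize <= 0) throw new IllegalArgumentException("maxBatchSize must be positive");
        this.maxBatchSize = maxBatchSize;
        this.exporter = exporter;
    }

    // Called when a span ends: buffer it and export once a full batch is ready.
    void onEnd(String spanName) {
        buffer.add(spanName);
        if (buffer.size() >= maxBatchSize) flush();
    }

    // Export whatever is buffered as one batch (e.g. on shutdown).
    void flush() {
        if (buffer.isEmpty()) return;
        exporter.accept(List.copyOf(buffer));
        buffer.clear();
    }
}

class BatchDemo {
    public static void main(String[] args) {
        List<List<String>> exported = new ArrayList<>();
        MiniBatchProcessor processor = new MiniBatchProcessor(3, exported::add);
        for (int i = 1; i <= 7; i++) processor.onEnd("span-" + i);
        processor.flush(); // exports the partial last batch
        System.out.println(exported); // prints [[span-1, span-2, span-3], [span-4, span-5, span-6], [span-7]]
    }
}
```

Seven spans turn into three export calls rather than seven. This per-call saving is why batching sits at the end of the SDK pipeline.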
Signal Correlation
Metrics and traces can be linked with exemplars (sample measurements that carry the trace ID of a request that produced them), while logs can carry trace and span identifiers. This correlation is critical when you need to jump from a high-level SLO breach to a concrete trace that contributed to it.
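To show how an exemplar bridges a metric and a trace, here is a Java sketch of a latency histogram that remembers the most recent trace ID in each bucket. `ExemplarHistogram` and the "keep the latest sample" rule are assumptions for illustration. Real exemplar reservoirs in OpenTelemetry are more configurable than this.

```java
// Hypothetical histogram that keeps one exemplar (the latest trace ID) per bucket.
final class ExemplarHistogram {
    private final double[] upperBounds;     // bucket upper bounds in ms, ascending
    private final long[] counts;            // one extra bucket for values above the last bound
    private final String[] exemplarTraceIds;

    ExemplarHistogram(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.counts = new long[upperBounds.length + 1];
        this.exemplarTraceIds = new String[upperBounds.length + 1];
    }

    // Buckets are upper-inclusive: a value equal to a bound lands in that bound's bucket.
    void record(double latencyMs, String traceId) {
        int i = 0;
        while (i < upperBounds.length && latencyMs > upperBounds[i]) i++;
        counts[i]++;
        exemplarTraceIds[i] = traceId; // overwrite: keep the most recent sample
    }

    // Trace ID attached to the slowest non-empty bucket, or null if nothing was recorded.
    String slowestExemplar() {
        for (int i = counts.length - 1; i >= 0; i--) {
            if (counts[i] > 0) return exemplarTraceIds[i];
        }
        return null;
    }

    long count(int bucket) {
        return counts[bucket];
    }
}

class ExemplarDemo {
    public static void main(String[] args) {
        ExemplarHistogram latency = new ExemplarHistogram(100.0, 500.0);
        latency.record(42.0, "4bf92f3577b34da6a3ce929d0e0e4736");
        latency.record(1250.0, "0af7651916cd43dd8448eb211c80319c"); // SLO-breaching request
        System.out.println(latency.slowestExemplar()); // prints 0af7651916cd43dd8448eb211c80319c
    }
}
```

When the slow bucket trips an alert, its exemplar hands you a trace ID to open directly, instead of searching for a matching trace by time range.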
Java Example: Custom Tracer Provider
The following snippet shows a manual SDK setup for a Spring Boot service in which you control sampling and resource attributes. Finished spans are batched and handed to an OTLP span exporter.
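```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.sdk.trace.samplers.Sampler;

// OTLP/gRPC exporter with defaults; assumes a collector listening on localhost:4317
OtlpGrpcSpanExporter otlpSpanExporter = OtlpGrpcSpanExporter.getDefault();
SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
    .setResource(Resource.getDefault().merge( // SDK default attributes + service.name
        Resource.create(Attributes.of(AttributeKey.stringKey("service.name"), "billing-api"))))
    .setSampler(Sampler.traceIdRatioBased(0.2)) // keep about 20% of traces, decided by trace ID
    .addSpanProcessor(BatchSpanProcessor.builder(otlpSpanExporter).build()) // batch before export
    .build();
OpenTelemetry openTelemetry = OpenTelemetrySdk.builder()
    .setTracerProvider(tracerProvider)
    .build();
```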
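`Sampler.traceIdRatioBased(0.2)` derives its decision from the trace ID, so every service using the same ratio reaches the same verdict for a given trace. The stdlib-only sketch below shows the general idea: treat the lower 64 bits of the trace ID as a random number and compare it with a threshold scaled by the ratio. `RatioSampler` is a hypothetical class, and the SDK's exact bit handling may differ.

```java
import java.util.Random;

// Hypothetical sketch of trace-ID ratio sampling; not the SDK's implementation.
final class RatioSampler {
    private final double ratio;
    private final long upperBound;

    RatioSampler(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) throw new IllegalArgumentException("ratio must be in [0, 1]");
        this.ratio = ratio;
        this.upperBound = (long) (ratio * Long.MAX_VALUE);
    }

    // traceIdHex: 32 hex characters (W3C trace-id format).
    boolean shouldSample(String traceIdHex) {
        if (ratio >= 1.0) return true; // sidestep the edge case at the top of the range
        // Lower 64 bits of the trace ID (last 16 hex chars), parsed as an unsigned value.
        long low = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        // Clear the sign bit so the value lies in [0, Long.MAX_VALUE], then compare.
        return (low & Long.MAX_VALUE) < upperBound;
    }
}

class RatioSamplerDemo {
    public static void main(String[] args) {
        RatioSampler sampler = new RatioSampler(0.2);
        Random rnd = new Random(42);
        int kept = 0;
        for (int i = 0; i < 100_000; i++) {
            String traceId = String.format("%016x%016x", rnd.nextLong(), rnd.nextLong());
            if (sampler.shouldSample(traceId)) kept++;
        }
        System.out.println(kept); // close to 20,000 (about 20%)
    }
}
```

Because the decision is a pure function of the trace ID, no coordination between services is needed. Head sampling of this kind cannot, however, know whether a trace will turn out slow or erroneous, which is what tail-based sampling in the collector addresses.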
Collector Pipelines for Production
Use the collector to apply tail-based sampling, attribute scrubbing, and rate limiting. This keeps SDKs lightweight and moves heavy processing into a central component that can be scaled independently of the services that feed it.
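Tail-based sampling differs from the SDK's head sampling because it decides after seeing the whole trace. The Java sketch below buffers spans per trace and keeps a trace only if some span errored or exceeded a latency threshold. This loosely mirrors the kinds of policies (status code, latency) offered by the collector-contrib `tail_sampling` processor. `MiniSpan` and `MiniTailSampler` are hypothetical, and the sketch simplifies "trace latency" to "any span at or above the threshold".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified span record for this sketch.
record MiniSpan(String traceId, String name, long durationMs, boolean error) {}

// Hypothetical tail sampler: buffers spans per trace and decides once the trace is complete.
final class MiniTailSampler {
    private final long latencyThresholdMs;
    private final Map<String, List<MiniSpan>> pending = new HashMap<>();

    MiniTailSampler(long latencyThresholdMs) {
        this.latencyThresholdMs = latencyThresholdMs;
    }

    void add(MiniSpan span) {
        pending.computeIfAbsent(span.traceId(), id -> new ArrayList<>()).add(span);
    }

    // Called after a decision wait: returns the trace's spans if kept, or an empty list if dropped.
    List<MiniSpan> decide(String traceId) {
        List<MiniSpan> spans = pending.remove(traceId);
        if (spans == null) return List.of();
        boolean keep = spans.stream()
            .anyMatch(s -> s.error() || s.durationMs() >= latencyThresholdMs);
        return keep ? spans : List.of();
    }
}

class TailDemo {
    public static void main(String[] args) {
        MiniTailSampler sampler = new MiniTailSampler(1000);
        sampler.add(new MiniSpan("t1", "GET /invoice", 80, false));
        sampler.add(new MiniSpan("t2", "POST /charge", 1500, false));
        sampler.add(new MiniSpan("t3", "POST /refund", 40, true));
        for (String id : List.of("t1", "t2", "t3")) {
            System.out.println(id + " kept=" + !sampler.decide(id).isEmpty());
        }
        // prints: t1 kept=false, t2 kept=true, t3 kept=true (one per line)
    }
}
```

One design consequence: every span of a trace must reach the same collector instance before the decision is made, which is why tail sampling at scale is usually paired with trace-ID-aware routing (for example, the contrib `loadbalancing` exporter) in front of the sampling tier.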
Conclusion
OpenTelemetry is more than a library. It is an architecture for telemetry pipelines. Mastering its components allows you to tune reliability, cost, and data fidelity with precision.