Capacity Planning in Modern Systems
Introduction
Capacity planning is the discipline of matching infrastructure to workload while preserving latency and availability targets. In modern systems, demand shifts faster than static provisioning can respond, so planning focuses on predictive models and guardrails for autoscaling.
Core Inputs for Capacity Models
Effective plans combine historical telemetry with business forecasts. The core inputs, gathered into a simple data structure in the sketch after this list, are:
- Demand: Requests per second, message volume, or batch size.
- Resource usage: CPU, memory, disk I/O, and network throughput.
- Latency goals: p95 and p99 targets from SLOs.
- Headroom: The buffer required for failover and traffic spikes.
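As a minimal sketch, the inputs above can be collected into one structure that a planning script consumes. The CapacityInputs name, the field choices, and the sample values are illustrative assumptions, not a standard schema.
from dataclasses import dataclass

@dataclass
class CapacityInputs:
    """Illustrative container for the planning inputs listed above (field names are assumptions)."""
    peak_rps: float                # demand: observed or forecast peak requests per second
    cpu_ms_per_request: float      # resource usage: average CPU time per request
    p99_latency_target_ms: float   # latency goal taken from the SLO
    headroom: float                # fraction of capacity reserved for failover and spikes

inputs = CapacityInputs(
    peak_rps=8200,
    cpu_ms_per_request=12.0,
    p99_latency_target_ms=250.0,
    headroom=0.25,
)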
Applying Queueing Theory
Little’s Law gives a reliable baseline: L = λ * W, where L is concurrency, λ is throughput, and W is latency. Use it to translate throughput targets into concurrency and connection pool sizing.
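The short snippet below applies Little's Law to turn a throughput target and a latency budget into an expected concurrency level and a per-instance connection pool size. The throughput figure matches the example later in this article, while the 120 ms latency and the 20-instance fleet are assumptions made for illustration.
# Little's Law: L = lambda * W
throughput_rps = 8200         # lambda: target requests per second
avg_latency_s = 0.120         # W: average time a request spends in the system (assumed)
concurrency = throughput_rps * avg_latency_s  # L: requests in flight, about 984

# If one database connection is held for the life of a request, size pools from L.
instance_count = 20           # assumed fleet size for illustration
pool_size_per_instance = concurrency / instance_count  # roughly 49 connections per instance
print(round(concurrency), round(pool_size_per_instance))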
Python Example: Instance Forecast
This example estimates required instances from the current request rate, per-instance capacity measured in load tests, and a target headroom fraction.
import math

def required_instances(rps: float, rps_per_instance: float, headroom: float) -> int:
    # Reserve a fraction of each instance's capacity for failover and traffic spikes.
    effective_capacity = rps_per_instance * (1.0 - headroom)
    return math.ceil(rps / effective_capacity)

current_rps = 8200            # observed peak requests per second
capacity_per_instance = 600   # sustained RPS per instance from load tests
headroom = 0.25               # keep 25% of capacity in reserve
instances = required_instances(current_rps, capacity_per_instance, headroom)
print(instances)              # 19
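With the sample numbers, each instance effectively serves 450 RPS after the 25% headroom is reserved, so 8200 RPS requires 18.2 instances, rounded up to 19. Rounding up with math.ceil avoids the off-by-one of adding 1 before truncating, which over-provisions whenever the ratio is already a whole number.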
Validating with Load Tests
Load tests should reflect the production traffic mix. Use replayed traffic or production-derived request distributions, and validate that error rates remain below SLO thresholds at the target capacity, as in the sketch below.
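A minimal sketch of that validation check, assuming the load-test harness reports one pass/fail outcome per request; the result format and the 0.1% error budget are assumptions for illustration, not values from a specific tool.
def passes_error_slo(results: list[bool], max_error_rate: float = 0.001) -> bool:
    """Return True if the fraction of failed requests stays within the error budget.

    `results` holds one boolean per request (True = success); the 0.1% default
    budget is an assumed figure for this example.
    """
    errors = results.count(False)
    return (errors / len(results)) <= max_error_rate

# Example: a replayed-traffic run with 100,000 requests and 73 failures.
outcomes = [True] * 99_927 + [False] * 73
print(passes_error_slo(outcomes))  # True: a 0.073% error rate is within budget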
Conclusion
Capacity planning is not a one-time exercise. It is an ongoing process that combines telemetry, predictive modeling, and validation to ensure reliability at scale.