Capacity Planning in Modern Systems
Introduction
Capacity planning is the discipline of matching infrastructure to workload while preserving latency and availability targets. In modern systems, demand shifts faster than static provisioning can respond, so planning focuses on predictive models and guardrails for autoscaling.
Core Inputs for Capacity Models
Effective plans combine historical telemetry with business forecasts. The core inputs, gathered into a simple data structure in the sketch after this list, are:
- Demand: Requests per second, message volume, or batch size.
- Resource usage: CPU, memory, disk I/O, and network throughput.
- Latency goals: p95 and p99 targets from SLOs.
- Headroom: The buffer required for failover and traffic spikes.
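As a minimal sketch, the inputs above can be collected into one structure that a planning script consumes. The CapacityInputs name, the field choices, and the sample values are illustrative assumptions, not a standard schema.
from dataclasses import dataclass

@dataclass
class CapacityInputs:
    """Illustrative container for the planning inputs listed above (field names are assumptions)."""
    peak_rps: float                # demand: observed or forecast peak requests per second
    cpu_ms_per_request: float      # resource usage: average CPU time per request
    p99_latency_target_ms: float   # latency goal taken from the SLO
    headroom: float                # fraction of capacity reserved for failover and spikes

inputs = CapacityInputs(
    peak_rps=8200,
    cpu_ms_per_request=12.0,
    p99_latency_target_ms=250.0,
    headroom=0.25,
)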
Applying Queueing Theory
Little’s Law gives a reliable baseline: L = λ * W, where L is concurrency, λ is throughput, and W is latency. Use it to translate throughput targets into concurrency and connection pool sizing.
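The short snippet below applies Little's Law to turn a throughput target and a latency budget into an expected concurrency level and a per-instance connection pool size. The throughput figure matches the example later in this article, while the 120 ms latency and the 20-instance fleet are assumptions made for illustration.
# Little's Law: L = lambda * W
throughput_rps = 8200         # lambda: target requests per second
avg_latency_s = 0.120         # W: average time a request spends in the system (assumed)
concurrency = throughput_rps * avg_latency_s  # L: requests in flight, about 984

# If one database connection is held for the life of a request, size pools from L.
instance_count = 20           # assumed fleet size for illustration
pool_size_per_instance = concurrency / instance_count  # roughly 49 connections per instance
print(round(concurrency), round(pool_size_per_instance))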
Python Example: Instance Forecast
This example estimates required instances from the current request rate, per-instance capacity measured in load tests, and a target headroom fraction.
import math

def required_instances(rps: float, rps_per_instance: float, headroom: float) -> int:
    # Reserve a fraction of each instance's capacity for failover and traffic spikes.
    effective_capacity = rps_per_instance * (1.0 - headroom)
    return math.ceil(rps / effective_capacity)

current_rps = 8200            # observed peak requests per second
capacity_per_instance = 600   # sustained RPS per instance from load tests
headroom = 0.25               # keep 25% of capacity in reserve
instances = required_instances(current_rps, capacity_per_instance, headroom)
print(instances)              # 19
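With the sample numbers, each instance effectively serves 450 RPS after the 25% headroom is reserved, so 8200 RPS requires 18.2 instances, rounded up to 19. Rounding up with math.ceil avoids the off-by-one of adding 1 before truncating, which over-provisions whenever the ratio is already a whole number.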
Validating with Load Tests
Load tests should reflect the production traffic mix. Use replayed traffic or production-derived request distributions, and validate that error rates remain below SLO thresholds at the target capacity, as in the sketch below.
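A minimal sketch of that validation check, assuming the load-test harness reports one pass/fail outcome per request; the result format and the 0.1% error budget are assumptions for illustration, not values from a specific tool.
def passes_error_slo(results: list[bool], max_error_rate: float = 0.001) -> bool:
    """Return True if the fraction of failed requests stays within the error budget.

    `results` holds one boolean per request (True = success); the 0.1% default
    budget is an assumed figure for this example.
    """
    errors = results.count(False)
    return (errors / len(results)) <= max_error_rate

# Example: a replayed-traffic run with 100,000 requests and 73 failures.
outcomes = [True] * 99_927 + [False] * 73
print(passes_error_slo(outcomes))  # True: a 0.073% error rate is within budget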
Conclusion
Capacity planning is not a one-time exercise. It is an ongoing process that combines telemetry, predictive modeling, and validation to ensure reliability at scale.