Circuit Breaker Deep Dive (with Failure Modes)

Circuit breakers protect services from cascading failures by stopping calls to an unhealthy dependency. They convert slow failures into fast failures and give the system time to recover.

Circuit Breaker States

  • Closed: calls flow normally while metrics are collected.
  • Open: calls are short-circuited immediately.
  • Half-open: a limited number of trial calls probe recovery.
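
The three states can be sketched as a small state machine. This is a minimal, single-threaded illustration and not how any particular library implements it: the class name `SimpleBreaker`, its fields, and the "consecutive failures" trip rule are assumptions chosen to keep the example short (production breakers usually trip on a failure rate over a sliding window).

```java
import java.time.Duration;

/** Single-threaded sketch of the closed / open / half-open cycle (illustrative only). */
class SimpleBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureLimit;   // assumed policy: consecutive failures that open the breaker
    private final long openMillis;    // time to stay open before probing
    private final int trialCalls;     // trial calls permitted (and required to succeed) when half-open

    private State state = State.CLOSED;
    private int failures;
    private int trialsIssued;
    private int trialSuccesses;
    private long openedAt;

    SimpleBreaker(int failureLimit, Duration openFor, int trialCalls) {
        this.failureLimit = failureLimit;
        this.openMillis = openFor.toMillis();
        this.trialCalls = trialCalls;
    }

    /** Returns true if a call may proceed at the given time. */
    boolean allow(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= openMillis) {
            state = State.HALF_OPEN;          // wait elapsed: start probing
            trialsIssued = 0;
            trialSuccesses = 0;
        }
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.HALF_OPEN && trialsIssued < trialCalls) {
            trialsIssued++;
            return true;
        }
        return false;                         // open, or trial budget used up
    }

    void onSuccess() {
        if (state == State.HALF_OPEN) {
            if (++trialSuccesses >= trialCalls) {
                state = State.CLOSED;         // all probes succeeded: resume normal traffic
                failures = 0;
            }
        } else {
            failures = 0;                     // a success resets the consecutive-failure count
        }
    }

    void onFailure(long nowMillis) {
        // Any failed probe reopens the breaker; in closed state the limit must be reached.
        if (state == State.HALF_OPEN || ++failures >= failureLimit) {
            state = State.OPEN;
            openedAt = nowMillis;
            failures = 0;
        }
    }

    State state() { return state; }
}
```

Callers would check `allow(...)` before each call and report the result through `onSuccess()` or `onFailure(...)`. Passing the clock in as a parameter keeps the sketch deterministic and easy to test.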

Failure Modes a Circuit Breaker Handles

  • Latency spikes: slow downstream responses exhaust thread pools.
  • Error storms: repeated 5xx responses trigger retries that amplify load on an already failing dependency.
  • Connection failures: timeouts and connection resets.
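
One way to picture how these failure modes reach the breaker is a classifier that maps each finished call to an outcome the breaker can count. The sketch below is hypothetical: `OutcomeClassifier`, the `Outcome` names, and the rule that any `IOException` or `TimeoutException` counts as a connection failure are assumptions for illustration, not a library API.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.TimeoutException;

/** Hypothetical mapping from a finished call to the outcome a breaker would record. */
class OutcomeClassifier {
    enum Outcome { SUCCESS, SLOW_CALL, ERROR, CONNECTION_FAILURE }

    static Outcome classify(int httpStatus, Duration elapsed, Throwable error, Duration slowThreshold) {
        // Timeouts and connection resets surface as I/O or timeout exceptions.
        if (error instanceof IOException || error instanceof TimeoutException) {
            return Outcome.CONNECTION_FAILURE;
        }
        // Any other exception, or a 5xx status, counts as an error.
        if (error != null || httpStatus >= 500) {
            return Outcome.ERROR;
        }
        // A successful but slow response is the latency-spike case.
        if (elapsed.compareTo(slowThreshold) > 0) {
            return Outcome.SLOW_CALL;
        }
        return Outcome.SUCCESS;
    }
}
```

In this sketch a call gets exactly one label. A real breaker may track slowness and failure independently, so a call could count as both.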

Configuration Tradeoffs

Key parameters:

  • Failure rate threshold.
  • Slow call threshold.
  • Wait duration in open state.
  • Number of permitted calls in half-open state.

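Overly aggressive settings (a low failure rate threshold or a short wait duration) can cause flapping, where the breaker repeatedly opens and closes on transient errors. Overly conservative settings trip too late and can allow overload to spread to callers.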
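
To make the threshold tradeoff concrete, here is a minimal sketch of a count-based sliding window that computes a failure rate and compares it to a threshold. The class `FailureRateWindow` and its methods are hypothetical, and the window size of 10 used below is an assumed value, not one from the configuration in this post.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Count-based sliding window over the last `size` call outcomes (illustrative sketch). */
class FailureRateWindow {
    private final int size;
    private final Deque<Boolean> outcomes = new ArrayDeque<>();
    private int failures;

    FailureRateWindow(int size) {
        this.size = size;
    }

    /** Records one call; the oldest outcome is evicted once the window is full. */
    void record(boolean failed) {
        if (outcomes.size() == size) {
            boolean evictedFailed = outcomes.removeFirst();
            if (evictedFailed) {
                failures--;
            }
        }
        outcomes.addLast(failed);
        if (failed) {
            failures++;
        }
    }

    /** Failure rate in percent, or -1 while the window is not yet full. */
    float failureRatePercent() {
        return outcomes.size() < size ? -1f : 100f * failures / size;
    }

    /** True when the window is full and the rate is at or above the threshold. */
    boolean shouldOpen(float thresholdPercent) {
        float rate = failureRatePercent();
        return rate >= 0 && rate >= thresholdPercent;
    }
}
```

With a window of 10 calls, a single failure gives a 10% rate: a threshold of 10 would already open the breaker (prone to flapping), while a threshold of 50 waits until half the recent calls have failed.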

Spring Boot + Resilience4j Example

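import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import java.time.Duration;
import org.springframework.context.annotation.Bean;

// Declared inside a @Configuration class. Depending on the setup, this config may still
// need to be bound to the "inventory" instance (for example via application properties).
@Bean
public CircuitBreakerConfig inventoryBreakerConfig() {
    return CircuitBreakerConfig.custom()
            .failureRateThreshold(50)                           // open at >= 50% failed calls
            .slowCallRateThreshold(50)                          // open at >= 50% slow calls
            .slowCallDurationThreshold(Duration.ofMillis(200))  // calls over 200 ms count as slow
            .waitDurationInOpenState(Duration.ofSeconds(10))    // stay open 10 s before probing
            .permittedNumberOfCallsInHalfOpenState(5)           // allow 5 trial calls when half-open
            .build();
}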

The breaker named "inventory" is then applied to the client call:

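// Failed or short-circuited calls are routed to fallback(...) below.
@CircuitBreaker(name = "inventory", fallbackMethod = "fallback")
public Mono<InventoryResponse> getInventory(String sku) {
    return webClient.get()
            .uri("/items/{sku}", sku)
            .retrieve()
            .bodyToMono(InventoryResponse.class);
}

// Assumed fallback (not in the original): same parameters plus the Throwable, same return type.
private Mono<InventoryResponse> fallback(String sku, Throwable cause) {
    return Mono.empty();
}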

Observability Practices

  • Emit breaker state transitions to metrics.
  • Correlate open states with dependency latency spikes.
  • Alert on sustained open states or excessive flapping.
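
The alerting rules above can be sketched as a small monitor that is fed every state transition. Everything here is hypothetical (the class `BreakerTransitionMonitor`, its limits, and the use of plain state-name strings), and a real setup would more likely export these signals to a metrics system than evaluate them in-process.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

/** Tracks breaker transitions to detect flapping and sustained non-closed periods (sketch). */
class BreakerTransitionMonitor {
    private final long windowMillis;     // look-back window for counting transitions
    private final int maxTransitions;    // more transitions than this inside the window = flapping
    private final long maxOpenMillis;    // longest tolerated time without returning to CLOSED
    private final Deque<Long> transitionTimes = new ArrayDeque<>();
    private Long unhealthySince;         // time the breaker last left CLOSED, or null if closed

    BreakerTransitionMonitor(Duration window, int maxTransitions, Duration maxOpen) {
        this.windowMillis = window.toMillis();
        this.maxTransitions = maxTransitions;
        this.maxOpenMillis = maxOpen.toMillis();
    }

    /** Call on every state change, e.g. with "OPEN", "HALF_OPEN" or "CLOSED". */
    void onTransition(String toState, long nowMillis) {
        transitionTimes.addLast(nowMillis);
        if ("CLOSED".equals(toState)) {
            unhealthySince = null;
        } else if (unhealthySince == null) {
            unhealthySince = nowMillis;
        }
    }

    /** True when more than maxTransitions happened inside the look-back window. */
    boolean isFlapping(long nowMillis) {
        while (!transitionTimes.isEmpty() && nowMillis - transitionTimes.peekFirst() > windowMillis) {
            transitionTimes.removeFirst();   // drop transitions older than the window
        }
        return transitionTimes.size() > maxTransitions;
    }

    /** True when the breaker has not been CLOSED for at least maxOpen. */
    boolean isSustainedOpen(long nowMillis) {
        return unhealthySince != null && nowMillis - unhealthySince >= maxOpenMillis;
    }
}
```

With Resilience4j, the transitions could plausibly be fed from the breaker's event publisher (`getEventPublisher().onStateTransition(...)`). Open and half-open are treated as one unhealthy period here so that failed recovery probes do not reset the sustained-open timer.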

Summary

Circuit breakers are essential for protecting systems from cascading failures. Properly tuned thresholds and strong telemetry are required to avoid false positives or unmitigated overload.

This post is licensed under CC BY 4.0 by the author.