Request Hedging and Retry Storms

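Request hedging reduces tail latency by sending a duplicate request to another replica when the first request has not returned within a latency threshold. The caller uses whichever response arrives first and cancels the other. Hedging can improve p99 latency, but every hedge is extra load, so uncontrolled hedging risks overloading the system.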

When Request Hedging Helps

Hedging is useful when:

  • You have multiple replicas with independent latency distributions.
  • Tail latency is the dominant contributor to user experience.
  • Requests are read-only and idempotent.

How Retry Storms Happen

Retry storms occur when:

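  • Timeouts are too aggressive, so requests that would have succeeded are abandoned and retried.
  • Retries are unbounded, or synchronized so that many clients retry at the same moment.
  • The system is already overloaded, and each retry adds load that causes further timeouts.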
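
To make the amplification concrete, the sketch below estimates the worst case for a call chain in which every layer retries independently. It assumes every attempt fails during the overload, so all retries are used; the class and method names are hypothetical and chosen only for illustration.

```java
/**
 * Illustrative only: worst-case load amplification when each layer of a
 * call chain retries on its own. Assumes every attempt fails, so every
 * layer uses all of its attempts.
 */
final class RetryAmplification {
    private RetryAmplification() {}

    /**
     * @param attemptsPerLayer total attempts per layer (1 original + retries)
     * @param layers           number of layers in the chain that retry
     * @return requests reaching the bottom layer for one user request
     */
    static long worstCaseRequests(int attemptsPerLayer, int layers) {
        if (attemptsPerLayer < 1 || layers < 0) {
            throw new IllegalArgumentException("attemptsPerLayer >= 1 and layers >= 0 required");
        }
        long total = 1;
        for (int i = 0; i < layers; i++) {
            // Each layer multiplies the traffic it passes down.
            total = Math.multiplyExact(total, attemptsPerLayer);
        }
        return total;
    }
}
```

With three attempts per layer across three layers, `worstCaseRequests(3, 3)` returns 27: a single user request can reach the deepest service 27 times at exactly the moment it is least able to absorb the load.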

Safe Hedging Practices

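  • Hedge only after the primary request has been outstanding longer than a latency threshold (for example, the observed p95).
  • Cap the number of hedged requests per call.
  • Enforce a retry budget per client that limits hedges and retries to a fixed fraction of its traffic.
  • Add jitter to retry and hedge delays so clients do not retry in lockstep.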
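
The budget and jitter practices above can be sketched in plain Java. This is one possible design, not a prescribed one: each primary request earns a small amount of credit, and each hedge or retry spends a full unit, so extra traffic stays near a configured percentage. `HedgeBudget`, its parameters, and `jitteredDelay` are hypothetical names introduced for this example.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical per-client budget that keeps hedges and retries near a fixed percentage of traffic. */
final class HedgeBudget {
    private static final int COST = 100; // credit units spent by one hedge or retry

    private final int percent;   // units earned per primary request; 10 means about 10% extra load
    private final long maxUnits; // cap, so a quiet period cannot bank an unbounded burst
    private long units;          // current credit; starts empty

    HedgeBudget(int percent, int maxBurst) {
        if (percent < 1 || maxBurst < 1) {
            throw new IllegalArgumentException("percent and maxBurst must be >= 1");
        }
        this.percent = percent;
        this.maxUnits = (long) maxBurst * COST;
    }

    /** Call once for every primary (non-hedged, non-retried) request. */
    synchronized void recordRequest() {
        units = Math.min(maxUnits, units + percent);
    }

    /** Spends one hedge worth of credit if available; returns false when the budget is exhausted. */
    synchronized boolean tryAcquire() {
        if (units < COST) {
            return false;
        }
        units -= COST;
        return true;
    }

    /** Returns a delay drawn uniformly from [base, base + maxJitter) so clients do not fire together. */
    static Duration jitteredDelay(Duration base, Duration maxJitter) {
        long jitterNanos = maxJitter.toNanos();
        long extra = jitterNanos > 0 ? ThreadLocalRandom.current().nextLong(jitterNanos) : 0L;
        return base.plusNanos(extra);
    }
}
```

A caller would invoke `recordRequest()` on every primary request and send a hedge only when `tryAcquire()` returns true. Integer credit units are used instead of fractional tokens to avoid floating-point drift when many small deposits are summed.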

Spring Boot Example with Hedged Reads

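import java.time.Duration;

import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@Service
public class CatalogClient {
    private final WebClient webClient;

    public CatalogClient(WebClient.Builder builder) {
        // Hedging only helps if "catalog" resolves to more than one replica.
        this.webClient = builder.baseUrl("http://catalog").build();
    }

    public Mono<CatalogItem> getItem(String id) {
        Mono<CatalogItem> primary = fetch(id);
        // The hedge is sent only if the primary is still pending after 80 ms.
        // If the primary wins earlier, the delay is cancelled and no request goes out.
        Mono<CatalogItem> hedge = Mono.delay(Duration.ofMillis(80))
                .flatMap(ignore -> fetch(id));

        // firstWithValue (Reactor 3.4+) takes the first successful response and
        // cancels the loser, so a failed hedge cannot fail a still-pending primary.
        // If both attempts fail, it signals a NoSuchElementException.
        return Mono.firstWithValue(primary, hedge)
                .timeout(Duration.ofMillis(200)); // overall deadline for the call
    }

    private Mono<CatalogItem> fetch(String id) {
        // Each subscription to this Mono issues a new HTTP GET.
        return webClient.get()
                .uri("/items/{id}", id)
                .retrieve()
                .bodyToMono(CatalogItem.class);
    }
}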
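
The 80 ms hedge delay in the example is a fixed value. One way to follow the p95 guideline instead is to derive the delay from recently observed latencies. The helper below is a minimal sketch under that assumption: `LatencyWindow` is a hypothetical class, it keeps only the last N samples, and it uses the nearest-rank method for the percentile.

```java
import java.util.Arrays;

/** Hypothetical helper: remembers the most recent latencies and reports a percentile. */
final class LatencyWindow {
    private final long[] samples; // ring buffer of latencies in milliseconds
    private int count;            // number of valid samples, at most samples.length
    private int next;             // index the next sample will overwrite

    LatencyWindow(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.samples = new long[capacity];
    }

    /** Records one completed request, overwriting the oldest sample once the window is full. */
    synchronized void record(long latencyMillis) {
        samples[next] = latencyMillis;
        next = (next + 1) % samples.length;
        if (count < samples.length) {
            count++;
        }
    }

    /** Nearest-rank percentile for 0 < p <= 100; returns fallbackMillis until a sample exists. */
    synchronized long percentile(double p, long fallbackMillis) {
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        if (count == 0) {
            return fallbackMillis;
        }
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * count / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

In `getItem`, `Duration.ofMillis(80)` could then become `Duration.ofMillis(window.percentile(95, 80))`, with `record` called whenever a fetch completes. Sorting on every read is acceptable for small windows; a histogram-based estimator would be a better fit at high request rates.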

Observability for Hedging

Track:

  • Hedged request rate.
  • Extra load induced by hedging.
  • Success rate of primary vs hedged requests.
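
These three signals can be derived from four counters. The sketch below uses only the JDK so that it stays self-contained; in a Spring Boot service the same counts would more likely be published through a metrics library. `HedgeMetrics` and its method names are hypothetical.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical in-process counters for hedging behaviour. */
final class HedgeMetrics {
    private final LongAdder primaries = new LongAdder();
    private final LongAdder primarySuccesses = new LongAdder();
    private final LongAdder hedges = new LongAdder();
    private final LongAdder hedgeSuccesses = new LongAdder();

    /** Call when a primary request finishes. */
    void recordPrimary(boolean success) {
        primaries.increment();
        if (success) {
            primarySuccesses.increment();
        }
    }

    /** Call when a hedged request that was actually sent finishes. */
    void recordHedge(boolean success) {
        hedges.increment();
        if (success) {
            hedgeSuccesses.increment();
        }
    }

    /** Hedges sent per primary request; this is also the fractional extra load on the backend. */
    double hedgeRate() {
        return ratio(hedges.sum(), primaries.sum());
    }

    double primarySuccessRate() {
        return ratio(primarySuccesses.sum(), primaries.sum());
    }

    double hedgeSuccessRate() {
        return ratio(hedgeSuccesses.sum(), hedges.sum());
    }

    private static double ratio(long numerator, long denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }
}
```

A hedge rate well above the share implied by the threshold (roughly 5% when hedging at p95) suggests the backend is slowing down and hedging is adding load rather than hiding outliers.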

Summary

Request hedging can reduce tail latency, but it must be combined with retry budgets, jitter, and load-shedding safeguards to avoid self-inflicted retry storms.
