Request Hedging and Retry Storms
Request hedging reduces tail latency by sending a duplicate request to another replica when the first request has not responded within a set delay, then using whichever response arrives first. It can improve p99 latency, but every hedge adds load, so uncontrolled hedging risks overloading the system.
When Request Hedging Helps
Hedging is useful when:
- You have multiple replicas with independent latency distributions.
- Tail latency, rather than median latency, dominates the user experience.
- Requests are read-only and idempotent.
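The independence condition can be made concrete with a rough model. The symbols here are assumptions introduced for illustration: L_1 and L_2 are the latencies of the primary and the hedged request, d is the hedge delay, and t is the deadline. If the two latencies are independent, a hedged call misses the deadline only when both requests are late:

```latex
P(\text{hedged call} > t) = P(L_1 > t)\, P(L_2 > t - d), \qquad 0 \le d < t
```

For example, if P(L_1 > t) = 0.01 and P(L_2 > t - d) = 0.05, the hedged call misses the deadline with probability 0.01 × 0.05 = 0.0005. When replicas slow down together, for instance because they share an overloaded dependency, the two events are correlated and the improvement is smaller than this product suggests.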
How Retry Storms Happen
Retry storms occur when:
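- Timeouts are shorter than normal response times, so healthy requests are abandoned and retried.
- Retries are unbounded, or many clients retry at the same moment.
- The system is already overloaded, and each retry adds load that causes further timeouts.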
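The amplification effect can be estimated with simple arithmetic. The sketch below is illustrative only and assumes each attempt fails independently with the same probability; `RetryAmplification` and its method names are hypothetical, not part of any library.

```java
/** Illustrative arithmetic only: how retries multiply load on a struggling backend. */
final class RetryAmplification {

    /**
     * Expected number of attempts per logical request.
     * failureProb: assumed chance that any single attempt fails.
     * maxRetries: retries allowed after the first attempt.
     */
    static double expectedAttempts(double failureProb, int maxRetries) {
        double attempts = 0.0;
        double reachProb = 1.0; // probability that attempt k is made at all
        for (int k = 0; k <= maxRetries; k++) {
            attempts += reachProb;
            reachProb *= failureProb; // attempt k+1 happens only if attempt k failed
        }
        return attempts;
    }

    /** Worst-case multiplier when every layer of a call chain retries on its own. */
    static long worstCaseAcrossLayers(int maxRetries, int layers) {
        long factor = 1;
        for (int i = 0; i < layers; i++) {
            factor *= (maxRetries + 1);
        }
        return factor;
    }
}
```

Under these assumptions, a 1% failure rate with 3 retries costs about 1.01 attempts per request, which is negligible. Once the backend is overloaded and nearly every attempt times out, the same policy approaches 4 attempts per request, and three layers that each retry 3 times can multiply load by up to 64.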
Safe Hedging Practices
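- Hedge only after the primary request exceeds a latency threshold (for example, the observed p95), so that only the slowest calls send a duplicate.
- Cap the number of hedged requests per call.
- Enforce a retry budget per client that limits hedges and retries to a fixed fraction of normal traffic.
- Add jitter to hedge and retry delays to avoid synchronized retries.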
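One possible shape for a per-client budget with jitter is sketched below. It is a minimal sketch using only the Java standard library; `HedgeBudget`, its methods, and its parameters are hypothetical names chosen for illustration. It assumes each primary request earns a fraction of a hedge and each hedge spends a whole one, so hedges stay bounded as a percentage of normal traffic.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: permits hedges only up to a fixed percentage of primary traffic. */
final class HedgeBudget {
    private static final int COST_PER_HEDGE = 100; // one hedge costs 100 units

    private final int unitsPerPrimary; // e.g. 10 allows roughly 10% extra requests
    private final int maxUnits;        // bounds how many hedges can be saved up
    private int units;                 // starts empty: no hedges until traffic is seen

    HedgeBudget(int unitsPerPrimary, int maxUnits) {
        this.unitsPerPrimary = unitsPerPrimary;
        this.maxUnits = maxUnits;
    }

    /** Call once for every primary request sent. */
    synchronized void recordPrimary() {
        units = Math.min(maxUnits, units + unitsPerPrimary);
    }

    /** Returns true and spends budget if a hedge is currently affordable. */
    synchronized boolean tryAcquireHedge() {
        if (units < COST_PER_HEDGE) {
            return false;
        }
        units -= COST_PER_HEDGE;
        return true;
    }

    /** Randomizes a delay by +/- jitterFraction so clients do not hedge in lockstep. */
    static long jitteredDelayMillis(long baseMillis, double jitterFraction) {
        if (jitterFraction <= 0.0) {
            return baseMillis;
        }
        double factor = 1.0
                + ThreadLocalRandom.current().nextDouble(-jitterFraction, jitterFraction);
        return Math.round(baseMillis * factor);
    }
}
```

A caller would invoke `recordPrimary()` on every request and send a hedge only when `tryAcquireHedge()` returns true. Under overload, when most calls become slow at once, the budget runs dry and hedging stops adding load, which is the behavior that prevents a storm.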
Spring Boot Example with Hedged Reads
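```java
import java.time.Duration;

import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@Service
public class CatalogClient {

    private final WebClient webClient;

    public CatalogClient(WebClient.Builder builder) {
        this.webClient = builder.baseUrl("http://catalog").build();
    }

    public Mono<CatalogItem> getItem(String id) {
        Mono<CatalogItem> primary = fetch(id);

        // The hedge request is issued only if the primary has not signalled
        // within 80 ms; if the primary wins first, the delay is cancelled.
        Mono<CatalogItem> hedge = Mono.delay(Duration.ofMillis(80))
                .flatMap(ignore -> fetch(id));

        // Mirror whichever source signals first (value or error) and cancel
        // the other. The whole call is bounded by a 200 ms deadline.
        return Mono.firstWithSignal(primary, hedge)
                .timeout(Duration.ofMillis(200));
    }

    // Each subscription to the returned Mono issues one GET request.
    private Mono<CatalogItem> fetch(String id) {
        return webClient.get()
                .uri("/items/{id}", id)
                .retrieve()
                .bodyToMono(CatalogItem.class);
    }
}
```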
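The fixed 80 ms hedge delay in the example could instead be derived from recently observed latencies, in line with the p95 threshold suggested earlier. The following is a minimal sketch using only the Java standard library; `LatencyWindow` is a hypothetical helper, and a production service would more likely read percentiles from its metrics library.

```java
import java.util.Arrays;

/** Hypothetical helper: ring buffer of recent latencies with a percentile query. */
final class LatencyWindow {
    private final long[] samples;
    private int count; // number of valid samples, at most samples.length
    private int next;  // index that the next sample overwrites

    LatencyWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.samples = new long[capacity];
    }

    /** Records one observed latency, overwriting the oldest sample when full. */
    synchronized void record(long latencyMillis) {
        samples[next] = latencyMillis;
        next = (next + 1) % samples.length;
        if (count < samples.length) {
            count++;
        }
    }

    /** Nearest-rank percentile for 0 < p <= 100; returns the fallback when empty. */
    synchronized long percentile(double p, long fallbackMillis) {
        if (count == 0) {
            return fallbackMillis;
        }
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * count / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With this in place, the hedge delay could be computed as `window.percentile(95, 80)`, falling back to 80 ms until samples exist. Sorting on every query is acceptable for a small window but would be too costly on a hot path with a large one.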
Observability for Hedging
Track:
- Hedged request rate.
- Extra load induced by hedging.
- Success rate of primary vs hedged requests.
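These signals can be collected with a few counters. The sketch below uses only the Java standard library; `HedgeMetrics` and its method names are hypothetical, and in a Spring Boot service the same counts would more likely be published through the application's metrics library.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical counters for judging whether hedging pays for its extra load. */
final class HedgeMetrics {
    private final LongAdder calls = new LongAdder();      // logical calls made
    private final LongAdder hedgesSent = new LongAdder(); // duplicate requests issued
    private final LongAdder hedgeWins = new LongAdder();  // calls answered by the hedge

    void recordCall() { calls.increment(); }
    void recordHedgeSent() { hedgesSent.increment(); }
    void recordHedgeWin() { hedgeWins.increment(); }

    /** Hedges per call; with at most one hedge per call this is the extra load. */
    double hedgeRate() {
        long total = calls.sum();
        return total == 0 ? 0.0 : (double) hedgesSent.sum() / total;
    }

    /** Fraction of hedges whose response was actually used. */
    double hedgeWinRate() {
        long sent = hedgesSent.sum();
        return sent == 0 ? 0.0 : (double) hedgeWins.sum() / sent;
    }
}
```

A hedge rate well above the intended threshold (for example, far more than 5% when hedging at p95) suggests the delay is too short or the backend is degrading. A win rate near zero suggests the hedges add load without improving latency.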
Summary
Request hedging can reduce tail latency, but it must be combined with retry budgets, jitter, and load-shedding safeguards to avoid self-inflicted retry storms.