Request Hedging and Retry Storms
Request hedging is a technique for reducing tail latency: if the first request is slow, a duplicate is sent to another replica and the first response to arrive wins. It can improve p99 latency, but every hedge is extra load, so an uncontrolled policy can overload the very system it is trying to speed up.
When Request Hedging Helps
Hedging is useful when:
- You have multiple replicas with independent latency distributions.
- Tail latency is the dominant contributor to user experience.
- Requests are read-only and idempotent.
How Retry Storms Happen
Retry storms occur when:
- Timeouts are too aggressive.
- Retries are unbounded or synchronized.
- The system is already overloaded, and retries amplify the load (the sketch after this list shows how).
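As a concrete sketch of the failure mode, the method below pairs an aggressive timeout with unbounded, immediate retries. It is an illustrative addition that reuses the webClient field and CatalogItem type from the Spring Boot example further down; the 50 ms timeout is an assumed value, not a recommendation.

// Anti-pattern: an aggressive timeout combined with unbounded, immediate retries.
// When the backend slows down, every timed-out call is resubmitted right away,
// so load on the already struggling service multiplies instead of draining.
public Mono<CatalogItem> getItemNaively(String id) {
    return webClient.get()
            .uri("/items/{id}", id)
            .retrieve()
            .bodyToMono(CatalogItem.class)
            .timeout(Duration.ofMillis(50)) // far tighter than the backend's tail latency
            .retry();                       // unbounded retries, no backoff, no jitter
}

When many clients share the same fixed timeout and retry policy, their retries also fire in lockstep, which is the synchronization problem that jitter addresses in the next section.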
Safe Hedging Practices
- Use a latency threshold (for example, p95) before hedging.
- Cap the number of hedged requests per call.
- Implement global retry budgets per client.
- Add jitter to avoid synchronized retries (the last two practices are sketched after this list).
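Here is a minimal sketch of a retry budget combined with bounded, jittered retries, built on Reactor's retry support. The RetryBudget class, the token capacity, and the specific delays are illustrative assumptions rather than a prescribed implementation.

import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import reactor.util.retry.Retry;

public class RetryPolicies {

    // Crude retry budget: a shared pool of tokens that retries draw from.
    // How tokens are replenished (timer, fraction of successful calls) is left out here.
    static final class RetryBudget {
        private final AtomicInteger tokens;
        RetryBudget(int capacity) { this.tokens = new AtomicInteger(capacity); }
        boolean tryAcquire() { return tokens.getAndUpdate(n -> n > 0 ? n - 1 : 0) > 0; }
        void refill() { tokens.incrementAndGet(); }
    }

    private final RetryBudget budget = new RetryBudget(100);

    // At most two retries per call, exponential backoff capped at 500 ms,
    // jitter to de-synchronize clients, and a shared budget as a global cap.
    public Retry boundedJitteredRetry() {
        return Retry.backoff(2, Duration.ofMillis(50))
                .maxBackoff(Duration.ofMillis(500))
                .jitter(0.5)                           // randomize delays by up to 50%
                .filter(error -> budget.tryAcquire()); // no token, no retry: the error propagates
    }
}

The policy is applied with .retryWhen(boundedJitteredRetry()) on the WebClient call; once the shared budget is exhausted, the filter rejects further retries and the error propagates instead of adding load.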
Spring Boot Example with Hedged Reads
import java.time.Duration;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@Service
public class CatalogClient {

    private final WebClient webClient;

    public CatalogClient(WebClient.Builder builder) {
        this.webClient = builder.baseUrl("http://catalog").build();
    }

    public Mono<CatalogItem> getItem(String id) {
        // Primary request fires immediately.
        Mono<CatalogItem> primary = fetch(id);

        // The hedge only subscribes after 80 ms; if the primary has already
        // signalled by then, the hedge is cancelled before its HTTP call is sent.
        Mono<CatalogItem> hedge = Mono.delay(Duration.ofMillis(80))
                .flatMap(ignore -> fetch(id));

        // The first source to signal (value or error) wins; the other is cancelled.
        // The outer timeout bounds the total latency budget across both attempts.
        return Mono.firstWithSignal(primary, hedge)
                .timeout(Duration.ofMillis(200));
    }

    private Mono<CatalogItem> fetch(String id) {
        return webClient.get()
                .uri("/items/{id}", id)
                .retrieve()
                .bodyToMono(CatalogItem.class);
    }
}
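In this sketch the 80 ms hedge delay stands in for the primary path's observed p95 latency; in practice that threshold should come from live latency metrics rather than a hard-coded constant, and the 200 ms timeout caps the combined latency budget across both attempts.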
Observability for Hedging
Track:
- Hedged request rate.
- Extra load induced by hedging.
- Success rate of primary vs hedged requests.
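A minimal sketch of these signals as Micrometer counters follows; the meter names and the HedgingMetrics class are illustrative assumptions, not part of the client above.

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;

@Component
public class HedgingMetrics {

    private final Counter hedgesFired;   // how many hedge requests actually went out
    private final Counter primaryWins;   // primary answered first
    private final Counter hedgeWins;     // hedge answered first

    public HedgingMetrics(MeterRegistry registry) {
        this.hedgesFired = registry.counter("catalog.hedge.fired");
        this.primaryWins = registry.counter("catalog.hedge.primary_win");
        this.hedgeWins = registry.counter("catalog.hedge.hedge_win");
    }

    public void recordHedgeFired() { hedgesFired.increment(); }
    public void recordPrimaryWin() { primaryWins.increment(); }
    public void recordHedgeWin() { hedgeWins.increment(); }
}

Dividing the hedged request rate by the overall request rate gives the extra load induced by hedging, and comparing primary wins with hedge wins shows whether the hedge delay is set at a sensible threshold.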
Summary
Request hedging can reduce tail latency, but it must be combined with retry budgets, jitter, and load-shedding safeguards to avoid self-inflicted retry storms.