TCP Congestion Control: How Slow Start Works

Introduction#

TCP congestion control prevents senders from overwhelming the network. It is the reason a fresh TCP connection starts slowly and why a connection over a lossy link performs poorly. Understanding it helps you tune network timeouts, choose the right congestion control algorithm, and diagnose slow transfers.

The Congestion Window#

TCP limits how much data can be in-flight (sent but not yet acknowledged) using two windows:

  • rwnd (receiver window): how much the receiver can buffer (flow control)
  • cwnd (congestion window): how much the sender estimates the network can handle

The effective in-flight limit is min(rwnd, cwnd).

Throughput ≈ cwnd / RTT

For 100ms RTT and cwnd = 1MB:
Throughput ≈ 1MB / 0.1s = 10MB/s

To achieve 100MB/s over a 100ms link:
Required cwnd = 100MB/s × 0.1s = 10MB
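This back-of-the-envelope arithmetic is easy to script. A minimal Python sketch (units are bytes and seconds; the function names are mine, not from any library):

```python
def throughput(cwnd_bytes: int, rtt_sec: float) -> float:
    """Steady-state throughput bound: one cwnd of data per round-trip."""
    return cwnd_bytes / rtt_sec

def required_cwnd(target_bytes_per_sec: float, rtt_sec: float) -> float:
    """cwnd needed to sustain a target rate: the bandwidth-delay product."""
    return target_bytes_per_sec * rtt_sec

print(throughput(1_000_000, 0.1))       # 10000000.0 -> 10 MB/s for a 1 MB cwnd
print(required_cwnd(100_000_000, 0.1))  # 10000000.0 -> a 10 MB cwnd for 100 MB/s
```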

Slow Start#

When a connection begins, cwnd starts small: historically 1 segment, and 10 segments in modern implementations (RFC 6928). For each ACK received, cwnd increases by 1 MSS (maximum segment size), which doubles cwnd every round-trip.

RTT 1: send 1 segment   → receive 1 ACK  → cwnd = 2
RTT 2: send 2 segments  → receive 2 ACKs → cwnd = 4
RTT 3: send 4 segments  → receive 4 ACKs → cwnd = 8
...until cwnd reaches ssthresh (slow start threshold)

Despite its name, slow start is exponential growth; it is only "slow" compared to sending an entire window at once. A connection exits slow start when:

  1. cwnd reaches ssthresh (transitions to congestion avoidance)
  2. A packet loss is detected
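The doubling makes slow start fast in practice: even large windows are reached in a handful of round-trips. A quick sketch (MSS units; the ssthresh values below are illustrative):

```python
def slow_start_rtts(initial_cwnd: int, ssthresh: int) -> int:
    """Round-trips until cwnd reaches ssthresh, doubling each RTT."""
    cwnd, rtts = initial_cwnd, 0
    while cwnd < ssthresh:
        cwnd *= 2
        rtts += 1
    return rtts

print(slow_start_rtts(1, 8))     # 3  (matches the trace above)
print(slow_start_rtts(10, 640))  # 6  (a modern 10-segment start ramps up quickly)
```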

Congestion Avoidance#

Once cwnd reaches ssthresh, growth becomes linear: roughly +1 MSS per round-trip.

cwnd = 20 MSS (at ssthresh)
RTT 1: cwnd = 21
RTT 2: cwnd = 22
...

Packet Loss and Reaction#

When loss is detected, via a retransmission timeout or three duplicate ACKs, TCP reacts in one of two ways:

Timeout (severe): ssthresh = cwnd/2, reset cwnd = 1. Slow start again.

3 duplicate ACKs (fast retransmit): ssthresh = cwnd/2, cwnd = ssthresh. No slow start, continue with congestion avoidance. Less disruptive.

cwnd = 32 MSS, loss detected:
ssthresh = 16 MSS
cwnd = 16 MSS (fast recovery) or 1 MSS (timeout)
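The reaction rules fit in a few lines. This is a simplified Reno-style sketch in MSS units; it applies the max(cwnd/2, 2) floor from RFC 5681 and omits fast recovery's temporary cwnd inflation:

```python
def react_to_loss(cwnd: int, timeout: bool) -> tuple[int, int]:
    """Return (ssthresh, new_cwnd) after a loss event."""
    ssthresh = max(cwnd // 2, 2)           # halve, with a 2 MSS floor (RFC 5681)
    new_cwnd = 1 if timeout else ssthresh  # a timeout restarts slow start
    return ssthresh, new_cwnd

print(react_to_loss(32, timeout=False))  # (16, 16) -> fast recovery
print(react_to_loss(32, timeout=True))   # (16, 1)  -> back to slow start
```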

Modern Congestion Control Algorithms#

CUBIC (default on Linux)#

# Check current algorithm
sysctl net.ipv4.tcp_congestion_control
# net.ipv4.tcp_congestion_control = cubic

# CUBIC grows as a cubic function of time since last loss
# Better than Reno/Tahoe for high-bandwidth, high-latency links

BBR (Bottleneck Bandwidth and Round-trip propagation time)#

BBR (developed by Google) models the network rather than reacting to loss. It maintains a model of the bottleneck bandwidth and minimum RTT, and keeps the network pipe full without filling buffers.

# Enable BBR
sysctl -w net.ipv4.tcp_congestion_control=bbr
sysctl -w net.core.default_qdisc=fq  # pacing qdisc; required for BBR on older kernels

# Persist
echo "net.ipv4.tcp_congestion_control=bbr" >> /etc/sysctl.conf
echo "net.core.default_qdisc=fq" >> /etc/sysctl.conf

BBR can provide significantly better throughput on high-latency links and in the presence of shallow buffers (e.g., datacenter switches), where loss-based algorithms misinterpret tail drops as congestion.
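BBR's core idea reduces to two measured quantities: bottleneck bandwidth and minimum RTT, whose product is the bandwidth-delay product (BDP). A simplified sketch of that model (the gain value is illustrative; the real algorithm cycles through several gains and states):

```python
def bbr_model(btl_bw_bytes_per_sec: float, min_rtt_sec: float,
              cwnd_gain: float = 2.0) -> dict[str, float]:
    """Simplified BBR quantities: keep ~1 BDP in flight, pace at btl_bw."""
    bdp = btl_bw_bytes_per_sec * min_rtt_sec  # bandwidth-delay product
    return {
        "bdp_bytes": bdp,
        "cwnd_bytes": cwnd_gain * bdp,        # headroom for delayed/aggregated ACKs
        "pacing_bytes_per_sec": btl_bw_bytes_per_sec,
    }

m = bbr_model(12_500_000, 0.1)  # 100 Mbit/s bottleneck, 100 ms min RTT
print(m["bdp_bytes"])           # 1250000.0 -> ~1.25 MB in flight keeps the pipe full
```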

Diagnosing Connection Issues#

# View per-socket TCP statistics including cwnd
ss -tin dst :443
# State    Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
# ESTAB    0       0       10.0.0.1:54321      93.184.216.34:443
# cubic rto:204 rtt:22.234/11.117 cwnd:10 ssthresh:14 bytes_sent:1234
#   cwnd     → congestion window
#   ssthresh → slow start threshold

# High rto (retransmission timeout): network is lossy
# cwnd stuck low: repeated loss events limiting throughput
# ssthresh very low: recent severe congestion event

# Measure connection throughput
iperf3 -c server -t 30 -P 4  # 4 parallel streams, 30 second test
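If you need these fields programmatically, the `ss` output can be scraped. This sketch assumes the iproute2 `key:value` field layout shown above; `rtt` reports avg/var, so only the average is kept:

```python
import re

def parse_tcp_info(line: str) -> dict[str, float]:
    """Extract a few key:value fields from one `ss -tin` info line."""
    fields = {}
    for key in ("rto", "rtt", "cwnd", "ssthresh"):
        m = re.search(rf"\b{key}:([\d.]+)", line)
        if m:
            fields[key] = float(m.group(1))
    return fields

sample = "cubic rto:204 rtt:22.234/11.117 cwnd:10 ssthresh:14 bytes_sent:1234"
print(parse_tcp_info(sample))
# {'rto': 204.0, 'rtt': 22.234, 'cwnd': 10.0, 'ssthresh': 14.0}
```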

Impact on Application Design#

# HTTP connection reuse avoids slow start overhead
# Each new connection starts at cwnd=10 segments
# An established connection has already ramped up

import httpx

# BAD: new connection per request (slow start every time)
async def fetch_price(product_id: int) -> float:
    async with httpx.AsyncClient() as client:  # new connection each call
        resp = await client.get(f"https://api.example.com/price/{product_id}")
        return resp.json()["price"]

# GOOD: reuse connection (one slow start, then full speed)
_client = httpx.AsyncClient(
    limits=httpx.Limits(max_connections=20, max_keepalive_connections=10)
)

async def fetch_price(product_id: int) -> float:
    resp = await _client.get(f"https://api.example.com/price/{product_id}")
    return resp.json()["price"]

TCP Initial Congestion Window#

Google’s research showed that increasing the initial cwnd from 3 to 10 segments improved average page load time by roughly 10%, because short connections spend fewer round-trips in slow start; this became the standard in RFC 6928.

# Check initial cwnd on a route
ip route show | grep default
# default via 10.0.0.1 dev eth0 proto dhcp initcwnd 10

# Set initial cwnd on the default route
ip route change default via 10.0.0.1 initcwnd 10

Modern Linux kernels default to initcwnd=10.
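The effect is easy to quantify: with a typical 1460-byte MSS, initcwnd determines how much payload fits in the very first round-trip (a back-of-the-envelope sketch):

```python
MSS = 1460  # typical Ethernet payload: 1500-byte MTU minus 40 bytes of headers

def first_rtt_payload(initcwnd: int, mss: int = MSS) -> int:
    """Bytes a sender can push before the first ACK arrives."""
    return initcwnd * mss

print(first_rtt_payload(3))   # 4380  bytes: old default
print(first_rtt_payload(10))  # 14600 bytes: fits many small HTTP responses whole
```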

Conclusion#

Slow start protects the network but causes poor initial throughput for short-lived connections. HTTP keep-alive and connection pooling amortize slow start cost across requests. BBR outperforms CUBIC on high-latency or lossy links and is worth enabling on servers that make cross-region connections. Use ss -tin to inspect per-connection congestion state during performance investigations.
