Python GIL: What It Is and When It Actually Matters

Introduction#

The Global Interpreter Lock (GIL) is a mutex in CPython that prevents multiple threads from executing Python bytecode simultaneously. It is frequently blamed for Python’s performance limitations, but its impact is often misunderstood. This post explains what the GIL is, when it matters, and how to work around it.

What the GIL Does#

The GIL ensures only one thread executes Python bytecode at a time, even on multi-core machines. It exists to protect CPython’s memory management (reference counting) from race conditions.
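The reference counts the GIL guards are visible from Python. A quick sketch using sys.getrefcount (note that getrefcount's own argument adds one temporary reference to the count it reports):

```python
import sys

x = []   # one reference: the name x
y = x    # a second reference: the name y

# sys.getrefcount counts one extra for its own argument
print(sys.getrefcount(x))  # 3: x, y, and the temporary argument
```

Without the GIL (or per-object locking, as in the free-threaded build), two threads incrementing and decrementing these counts concurrently could corrupt them and cause premature frees or leaks.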

import threading

counter = 0

def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1

# With the GIL, this is safe from data corruption
# (though the result is still undefined due to context switches)
t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()
print(counter)  # ~1,000,000-2,000,000, not consistent

The GIL is released periodically (every sys.getswitchinterval() seconds, 5 ms by default; the old "every 100 bytecode instructions" rule was dropped in Python 3.2) to allow other threads to run. It is also released during blocking I/O operations and by C extensions that explicitly release it.
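The switch interval can be inspected and tuned at runtime; a small sketch:

```python
import sys

# CPython's default switch interval is 0.005 s (5 ms)
print(sys.getswitchinterval())

# Raising it reduces switching overhead but increases the latency
# with which other threads get a turn; lowering it does the opposite
sys.setswitchinterval(0.01)
print(sys.getswitchinterval())  # 0.01

sys.setswitchinterval(0.005)    # restore the default
```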

When the GIL Matters (and When It Doesn’t)#

CPU-bound: GIL is a problem#

import threading
import time

def cpu_work(n):
    return sum(i * i for i in range(n))

# 2 threads — barely faster than 1 thread on CPU-bound work
# The GIL prevents true parallelism
start = time.time()
t1 = threading.Thread(target=cpu_work, args=(10_000_000,))
t2 = threading.Thread(target=cpu_work, args=(10_000_000,))
t1.start(); t2.start()
t1.join(); t2.join()
print(f"2 threads: {time.time() - start:.2f}s")

# Same time as sequential execution
start = time.time()
cpu_work(10_000_000)
cpu_work(10_000_000)
print(f"Sequential: {time.time() - start:.2f}s")

I/O-bound: GIL is NOT a problem#

import threading
import urllib.request
import time

urls = ["https://httpbin.org/delay/1"] * 5

def fetch(url):
    urllib.request.urlopen(url)

# Sequential: ~5 seconds
start = time.time()
for url in urls:
    fetch(url)
print(f"Sequential: {time.time() - start:.2f}s")

# Threaded: ~1 second — GIL is released during I/O
start = time.time()
threads = [threading.Thread(target=fetch, args=(url,)) for url in urls]
for t in threads: t.start()
for t in threads: t.join()
print(f"Threaded: {time.time() - start:.2f}s")

During network I/O, file I/O, and time.sleep, the GIL is released. Multiple threads can overlap their I/O waits, providing real concurrency.
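asyncio gets the same overlap on a single thread with an event loop instead of OS threads. A minimal sketch, where fake_fetch is a hypothetical stand-in that uses asyncio.sleep in place of a real async HTTP call:

```python
import asyncio
import time

async def fake_fetch(url):
    # Stand-in for an async network call: awaiting yields control to
    # the event loop, just as a real async HTTP request would
    await asyncio.sleep(1)
    return url

async def main():
    urls = ["https://httpbin.org/delay/1"] * 5
    start = time.time()
    results = await asyncio.gather(*(fake_fetch(u) for u in urls))
    print(f"asyncio: {time.time() - start:.2f}s")  # ~1s: the waits overlap

asyncio.run(main())
```

The trade-off versus threads: asyncio requires async-aware libraries, but it avoids per-thread stacks and lock contention, so it scales to many more concurrent waits.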

Solutions for CPU-Bound Work#

multiprocessing: True Parallelism#

Each process has its own GIL. multiprocessing bypasses the GIL entirely.

from multiprocessing import Pool
import time

def cpu_work(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":  # required with the spawn start method
    with Pool(processes=4) as pool:
        start = time.time()
        results = pool.map(cpu_work, [10_000_000] * 4)
        print(f"4 processes: {time.time() - start:.2f}s")
        # ~4x faster than single-threaded on 4+ cores

concurrent.futures: Unified Interface#

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import urllib.request

urls = ["https://httpbin.org/delay/1"] * 5

def fetch_url(url):
    return urllib.request.urlopen(url).read()

def work(n):
    return sum(i * i for i in range(n))

# I/O-bound → ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(fetch_url, url) for url in urls]
    results = [f.result() for f in futures]

# CPU-bound → ProcessPoolExecutor
with ProcessPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(work, 10_000_000) for _ in range(4)]
    results = [f.result() for f in futures]

NumPy/SciPy: C Extensions Release the GIL#

Most heavy NumPy operations (e.g. BLAS-backed matrix products) release the GIL while their C code runs, so multiple threads can execute them in parallel.

import numpy as np
import threading

def matrix_multiply():
    a = np.random.rand(1000, 1000)
    b = np.random.rand(1000, 1000)
    np.dot(a, b)  # releases GIL, runs in C, can parallelize

# This does run in parallel
threads = [threading.Thread(target=matrix_multiply) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

Python 3.13: Free-Threaded (No-GIL) Build#

Python 3.13 introduced an experimental free-threaded build without the GIL (PEP 703; not to be confused with per-interpreter GILs, PEP 684). It enables true thread-level parallelism for CPU-bound code.

# Run the free-threaded build (installed as python3.13t);
# PYTHON_GIL=0 forces the GIL off even if an extension requests it
PYTHON_GIL=0 python3.13t script.py

# Check whether the GIL is disabled at runtime
python3.13t -c "import sys; print(sys._is_gil_enabled())"

Production adoption will take time as the ecosystem (C extensions, libraries) adapts.
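Code can also detect at runtime which build it is on. A sketch: sysconfig's Py_GIL_DISABLED config variable is 1 on free-threaded builds, and sys._is_gil_enabled() (3.13+) reports the live state, since a free-threaded interpreter may re-enable the GIL for incompatible C extensions:

```python
import sys
import sysconfig

# 1 on free-threaded builds; 0 (or None on older Pythons) otherwise
free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
print("free-threaded build:", free_threaded)

# Runtime state, available on 3.13+ only
if hasattr(sys, "_is_gil_enabled"):
    print("GIL currently enabled:", sys._is_gil_enabled())
```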

Practical Decision Tree#

Is your workload I/O-bound?
├── Yes → Use asyncio (single thread, event loop) or ThreadPoolExecutor
└── No (CPU-bound)?
    ├── Pure Python computation → ProcessPoolExecutor or multiprocessing
    ├── NumPy/SciPy operations → Threading is fine (GIL released)
    └── Need true Python thread parallelism → Cython, C extension, or no-GIL Python 3.13

Conclusion#

The GIL matters for CPU-bound Python code. For I/O-bound workloads (most web services), threading and asyncio provide real concurrency because the GIL is released during I/O. For CPU-bound work, use multiprocessing or C extensions that release the GIL. Do not default to threads for CPU-bound tasks and blame the GIL when it does not help.
