Introduction#
Query optimization is a feedback loop between schema design, statistics, and query formulation. Advanced teams treat SQL as code: measured, profiled, and tuned based on real workloads. This guide focuses on practical techniques that produce measurable gains.
Start With Real Workload Metrics#
Collect slow query logs and build a baseline from production-like traffic. Tuning ad-hoc queries without real workload data often leads to regressions elsewhere.
Use EXPLAIN ANALYZE to Validate Assumptions#
PostgreSQL reveals the true cost of an execution plan with EXPLAIN (ANALYZE, BUFFERS).
1
2
3
4
5
6
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total_amount
FROM orders o
WHERE o.customer_id = 'CUST-123'
ORDER BY o.created_at DESC
LIMIT 20;
Look for sequential scans, high buffer reads, or misestimated rows.
Design Indexes for Access Paths#
Indexes should match the query predicate and ordering.
1
2
CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at DESC);
If the query includes additional filters, consider composite indexes that match the most selective prefix.
Avoid Offset Pagination for Large Tables#
Offset scans grow linearly. Use keyset pagination instead.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
from datetime import datetime
import sqlalchemy as sa
orders = sa.table(
"orders",
sa.column("id"),
sa.column("customer_id"),
sa.column("created_at"),
sa.column("total_amount")
)
last_seen = datetime.utcnow()
stmt = (
sa.select(orders.c.id, orders.c.total_amount)
.where(orders.c.customer_id == "CUST-123")
.where(orders.c.created_at < last_seen)
.order_by(orders.c.created_at.desc())
.limit(20)
)
Keyset pagination keeps the query index-friendly.
Push Filters Down Early#
Avoid filtering in the application layer. SQL engines can use statistics and indexes to make better decisions.
Eliminate N+1 Query Patterns#
Batch or join instead of issuing per-row queries.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
line_items = sa.table(
"line_items",
sa.column("id"),
sa.column("order_id")
)
stmt = (
sa.select(
orders.c.id,
sa.func.count(line_items.c.id).label("line_count")
)
.select_from(orders.join(line_items, line_items.c.order_id == orders.c.id))
.where(orders.c.customer_id == "CUST-123")
.group_by(orders.c.id)
)
Use Prepared Statements and Parameterization#
Parameterization improves plan caching and reduces parsing overhead. Avoid string concatenation with dynamic values.
Watch for Misleading Statistics#
If statistics are outdated, the planner will choose inefficient plans. Schedule ANALYZE after large data changes, or configure autovacuum more aggressively for hot tables.
Conclusion#
Optimization is not a one-time task. Instrument queries, verify with EXPLAIN ANALYZE, and apply targeted index or query changes. The goal is predictable performance under realistic load, not just faster execution on a developer laptop.