PostgreSQL Indexes Deep Dive: B-Tree, GIN, GiST, and BRIN

Introduction#

Choosing the wrong index type or missing an index entirely is among the most common causes of database performance problems. PostgreSQL provides multiple index types, each optimized for different query patterns. This post covers the four most important types and when to use each.

B-Tree: The Default#

B-Tree is the default index type. It handles equality, range, and ordering queries efficiently.

-- Default: creates a B-Tree index
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Supports: =, <, <=, >, >=, BETWEEN, IN, IS NULL, LIKE 'prefix%'
-- Does NOT efficiently support: LIKE '%suffix', full-text search, arrays

-- Compound B-Tree index
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Efficient for: WHERE user_id = ? AND status = ?
-- Also efficient for: WHERE user_id = ?  (leading column)
-- NOT efficient for: WHERE status = ?    (non-leading column)

-- Partial index: index only a subset of rows
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
-- Much smaller index, faster for queries that only look at pending orders

Explaining Index Usage#

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders
WHERE user_id = 42 AND created_at > NOW() - INTERVAL '30 days';

-- Index Scan using idx_orders_user_status on orders
--   Index Cond: ((user_id = 42) AND (created_at > ...))
--   Buffers: shared hit=15 read=3
-- Planning Time: 0.5ms
-- Execution Time: 2.1ms

-- Watch for:
-- "Seq Scan" on large tables — usually needs an index
-- "Bitmap Heap Scan" — index used but many rows fetched
-- High "Buffers: read" — pages not in cache (I/O bound)

GIN: Generalized Inverted Index#

GIN is optimized for multi-valued data types: arrays, JSONB, full-text search (tsvector).

-- Full-text search
ALTER TABLE articles ADD COLUMN search_vector tsvector;
UPDATE articles SET search_vector = to_tsvector('english', title || ' ' || body);
CREATE INDEX idx_articles_search ON articles USING gin(search_vector);

-- Query
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'kubernetes & deployment');

-- JSONB: index for containment queries
CREATE INDEX idx_events_metadata ON events USING gin(metadata);
-- Efficient for: WHERE metadata @> '{"event_type": "login"}'
-- NOT efficient for: WHERE metadata->>'event_type' = 'login'

-- Array containment
CREATE INDEX idx_posts_tags ON posts USING gin(tags);
-- Efficient for: WHERE tags @> ARRAY['postgresql']
--               WHERE tags && ARRAY['postgresql', 'redis']

GIN indexes are large and slow to build. Use gin_pending_list_limit to control how often the pending list is flushed.

-- Faster inserts with delayed index updates
-- (trades slightly slower queries for faster writes)
CREATE INDEX idx_articles_search ON articles USING gin(search_vector)
WITH (fastupdate = on, gin_pending_list_limit = 4096);

GiST: Generalized Search Tree#

GiST supports geometric data, range types, and nearest-neighbor searches.

-- Range type: efficiently query overlapping time ranges
CREATE INDEX idx_bookings_period ON bookings USING gist(period);
-- period is tstzrange (timestamp range)

SELECT * FROM bookings
WHERE period && tstzrange('2025-03-01', '2025-03-07');  -- overlaps

-- PostGIS: geospatial queries
CREATE INDEX idx_locations_geom ON locations USING gist(coordinates);
SELECT * FROM locations
WHERE ST_DWithin(coordinates, ST_MakePoint(-74.0, 40.7), 1000);  -- within 1km

-- Nearest-neighbor (KNN) search
SELECT * FROM locations
ORDER BY coordinates <-> ST_MakePoint(-74.0, 40.7)
LIMIT 10;  -- 10 closest points

BRIN: Block Range Index#

BRIN (Block Range INdex) stores min/max values per block range. Tiny size but only efficient when data is naturally sorted (time-series, sequential IDs).

-- Excellent for: append-only time-series tables sorted by timestamp
CREATE INDEX idx_metrics_recorded_at ON metrics USING brin(recorded_at)
WITH (pages_per_range = 128);

-- BRIN is tiny: ~1000x smaller than B-Tree for large tables
-- Efficient only when: physical order correlates with query filter
-- NOT efficient for: random user_id queries on a table inserted in random order

-- Check correlation (1.0 = perfectly sorted, 0 = random)
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'metrics' AND attname = 'recorded_at';
-- If correlation > 0.9: BRIN is appropriate

Index Selection Guide#

Data Type / Query Pattern	Index Type
Equality, range, ORDER BY	B-Tree
Full-text search	GIN on tsvector
JSONB containment (`@>`)	GIN
Array operations (`@>`, `&&`)	GIN
Geometric, spatial data	GiST (PostGIS)
Range type overlap	GiST
Nearest-neighbor (KNN)	GiST
Time-series (sequential data)	BRIN
Partial data (WHERE clause)	Partial B-Tree

Index Bloat#

Over time, indexes accumulate dead tuples from UPDATEs and DELETEs.

-- Check index bloat
SELECT
    schemaname,
    tablename,
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild bloated index (blocks reads/writes)
REINDEX INDEX CONCURRENTLY idx_orders_created_at;

-- Check for unused indexes (costly to maintain, never used)
SELECT indexrelid::regclass AS index
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND schemaname = 'public';

Conclusion#

Use B-Tree for almost everything. Add GIN for full-text search and JSONB. Use GiST for geospatial queries and range overlaps. Use BRIN only for large append-only tables where data is physically ordered. Remove unused indexes — they slow down writes without benefiting reads. Always verify index usage with EXPLAIN ANALYZE before and after adding an index.