Post

Data Partitioning vs Sharding: When to Split Your Database

Introduction

Partitioning and sharding both split data, but they solve different problems. Partitioning keeps data inside one database engine, while sharding distributes data across independent databases. Understanding the operational and query impact is critical before adopting either strategy.

Partitioning Inside a Single Database

Partitioning divides a large table into smaller physical segments managed by the same database instance. It improves manageability and query performance by pruning partitions.

Example: PostgreSQL Range Partitioning

1
2
3
4
5
6
7
8
9
10
11
12
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    customer_id TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL,
    total_amount NUMERIC(12, 2) NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2026_01 PARTITION OF orders
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE orders_2026_02 PARTITION OF orders
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

Partition pruning reduces IO when queries are constrained by created_at.

Sharding Across Independent Databases

Sharding distributes data across multiple database instances to scale writes and storage. It requires application-level routing or a proxy layer.

Example: Java Routing by Customer ID

1
2
3
4
5
6
7
8
9
10
11
12
13
public class ShardRouter {
    private final List<String> shardUrls;

    public ShardRouter(List<String> shardUrls) {
        this.shardUrls = shardUrls;
    }

    public String shardForCustomer(String customerId) {
        int hash = customerId.hashCode();
        int index = Math.floorMod(hash, shardUrls.size());
        return shardUrls.get(index);
    }
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
public class OrderService {
    private final ShardRouter shardRouter;
    private final Map<String, DataSource> shardDataSources = new ConcurrentHashMap<>();
    private final String username;
    private final String password;

    public OrderService(ShardRouter shardRouter) {
        this.shardRouter = shardRouter;
        this.username = System.getenv("APP_DB_USERNAME");
        this.password = System.getenv("APP_DB_PASSWORD");
    }

    public DataSource dataSourceForCustomer(String customerId) {
        String url = shardRouter.shardForCustomer(customerId);
        return shardDataSources.computeIfAbsent(url, this::createDataSource);
    }

    private DataSource createDataSource(String url) {
        return DataSourceBuilder.create()
            .url(url)
            .username(username)
            .password(password)
            .build();
    }
}

Tradeoffs at Scale

Partitioning Benefits

  • Keeps transactions and joins local
  • Easier backups and indexing
  • No cross-node coordination

Sharding Benefits

  • Horizontal write scaling
  • Independent failure domains
  • Enables geo-distribution

Sharding Costs

  • Cross-shard joins require application-level aggregation
  • Operational complexity (routing, rebalancing, migrations)
  • Consistency across shards is harder to guarantee

Choosing Between Them

  • Use partitioning when a single database can handle throughput but large tables cause maintenance pain.
  • Use sharding when you need more write throughput or storage than a single database can provide.
  • Combine both when shards themselves need internal partitioning.

Conclusion

Partitioning is a database-level optimization. Sharding is an architectural decision with long-term operational consequences. Exhaust partitioning and vertical scaling before sharding unless your throughput requirements demand it.

This post is licensed under CC BY 4.0 by the author.