## Introduction
Redis Cluster provides automatic data sharding across multiple Redis nodes with built-in failover. It uses a hash slot model — 16384 slots distributed across primary nodes — and supports read replicas for each primary. Understanding slot distribution, client behavior, and failure modes is essential for running Redis Cluster reliably in production.
## Hash Slots and Sharding
```text
Redis Cluster uses 16,384 hash slots.
Each key maps to a slot: HASH_SLOT = CRC16(key) % 16384

3 primary nodes (typical minimal cluster):
  Node A: slots 0–5460
  Node B: slots 5461–10922
  Node C: slots 10923–16383

Adding a 4th node:  redistribute some slots from each node to the new node
Removing a node:    migrate its slots to remaining nodes before shutdown

Hash tags force related keys to the same slot:
  {user:123}:profile → hash of "user:123"
  {user:123}:cart    → hash of "user:123" (same slot, same node)
This enables multi-key operations on related data.
```
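The slot formula above can be sketched in plain Python. Redis uses the CRC16-CCITT (XMODEM) variant for key hashing, and real clients compute this internally, so this is purely illustrative; the function names `crc16_xmodem` and `hash_slot` are mine, not part of any library:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to its cluster slot, honoring the {hash-tag} rule."""
    # If the key contains a non-empty {...} section, only the content
    # between the first '{' and the next '}' is hashed.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

Because of the hash-tag rule, `hash_slot("{user:123}:profile")` and `hash_slot("{user:123}:cart")` land on the same slot, which is exactly why hash tags co-locate related keys.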
## Cluster Setup (Docker Compose)
```yaml
# docker-compose.yml for local Redis Cluster (6 nodes: 3 primary + 3 replica)
version: "3.8"

services:
  redis-1:
    image: redis:7-alpine
    command: redis-server --port 7001 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes
    ports: ["7001:7001"]
    networks: [redis-cluster]

  redis-2:
    image: redis:7-alpine
    command: redis-server --port 7002 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes
    ports: ["7002:7002"]
    networks: [redis-cluster]

  redis-3:
    image: redis:7-alpine
    command: redis-server --port 7003 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes
    ports: ["7003:7003"]
    networks: [redis-cluster]

  redis-4:
    image: redis:7-alpine
    command: redis-server --port 7004 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes
    networks: [redis-cluster]

  redis-5:
    image: redis:7-alpine
    command: redis-server --port 7005 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes
    networks: [redis-cluster]

  redis-6:
    image: redis:7-alpine
    command: redis-server --port 7006 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes
    networks: [redis-cluster]

  cluster-init:
    image: redis:7-alpine
    depends_on: [redis-1, redis-2, redis-3, redis-4, redis-5, redis-6]
    command: >
      sh -c "sleep 3 &&
      redis-cli --cluster create
      redis-1:7001 redis-2:7002 redis-3:7003
      redis-4:7004 redis-5:7005 redis-6:7006
      --cluster-replicas 1 --cluster-yes"
    networks: [redis-cluster]

networks:
  redis-cluster:
    driver: bridge
```
## Python Client
```python
from redis.cluster import RedisCluster, ClusterNode

# Connect to the cluster (the client discovers all nodes automatically)
startup_nodes = [
    ClusterNode("localhost", 7001),
    ClusterNode("localhost", 7002),
    ClusterNode("localhost", 7003),
]
rc = RedisCluster(
    startup_nodes=startup_nodes,
    decode_responses=True,
    skip_full_coverage_check=True,
    retry_on_timeout=True,
    socket_timeout=1.0,
    socket_connect_timeout=1.0,
)

# Basic operations — work identically to single-node Redis
rc.set("user:123:name", "Alice", ex=3600)
name = rc.get("user:123:name")
print(name)  # "Alice"

# Hash tags: keep related keys on the same node
rc.hset("{user:123}:profile", mapping={"name": "Alice", "tier": "gold"})
rc.sadd("{user:123}:tags", "premium", "early-adopter")
rc.expire("{user:123}:profile", 3600)
rc.expire("{user:123}:tags", 3600)

# Multi-key operations require hash tags to work in cluster mode
keys = ["{user:123}:profile", "{user:123}:tags"]
rc.delete(*keys)  # works because both keys hash to the same slot
```
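When a key's slot does not live on the node you asked, the node answers with an error of the form `MOVED <slot> <host>:<port>` (per the Redis Cluster spec), and the client retries against that address; redis-py does this transparently. A toy parser, just to illustrate the redirect payload (`parse_moved` is a hypothetical helper, not part of redis-py):

```python
def parse_moved(err: str) -> tuple[int, str, int]:
    """Parse a 'MOVED <slot> <host>:<port>' redirect into its parts."""
    _verb, slot, addr = err.split()
    host, port = addr.rsplit(":", 1)
    return int(slot), host, int(port)

# parse_moved("MOVED 3999 127.0.0.1:6381") -> (3999, "127.0.0.1", 6381)
```

Clients cache the slot-to-node mapping and refresh it on every MOVED, so steady-state traffic goes straight to the right node.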
## Pipeline in Cluster Mode
```python
# In cluster mode, pipelines are per-slot (not global).
# redis-py handles this automatically, but performance differs from single-node.
def cache_user_batch(users: list[dict]) -> None:
    """Batch-set user profiles with cluster-aware pipelining."""
    pipeline = rc.pipeline(transaction=False)
    for user in users:
        # Use hash tags to ensure all of a user's keys land on the same node
        key = f"{{user:{user['id']}}}:profile"
        pipeline.hset(key, mapping={
            "name": user["name"],
            "email": user["email"],
            "tier": user["tier"],
        })
        pipeline.expire(key, 3600)
    pipeline.execute()
```
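One practical caveat: a pipeline buffers every queued command and its reply in memory, so for very large batches it can help to bound the batch size. A small generic helper (plain Python, not part of redis-py; `chunked` is my name for it):

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

# Usage sketch: cache users 500 at a time instead of in one giant pipeline
# for batch in chunked(users, 500):
#     cache_user_batch(batch)
```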
## Lua Scripts (Atomic Multi-Key Operations)
```python
import time

# Lua scripts run atomically on a single node.
# All keys a script touches must hash to the same slot (use hash tags).
RATE_LIMIT_SCRIPT = """
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
local count = redis.call('ZCARD', key)
if count >= limit then
    return 0
end
-- The timestamp doubles as the member, so two calls with an identical
-- timestamp collapse into one entry (slightly permissive under load)
redis.call('ZADD', key, now, now)
redis.call('EXPIRE', key, window)
return 1
"""

rate_limit_fn = rc.register_script(RATE_LIMIT_SCRIPT)

def is_rate_limited(user_id: str, limit: int = 100, window: int = 60) -> bool:
    key = f"{{ratelimit:{user_id}}}:sliding"  # hash tag for cluster routing
    result = rate_limit_fn(keys=[key], args=[limit, window, time.time()])
    return result == 0
```
## Failover Behavior
```python
import logging
import time

from redis.exceptions import ClusterDownError, ConnectionError

logger = logging.getLogger(__name__)

def resilient_get(key: str, default=None, max_retries: int = 3):
    """Get with retry on transient cluster failures."""
    for attempt in range(max_retries):
        try:
            return rc.get(key)
        except (ClusterDownError, ConnectionError) as e:
            if attempt == max_retries - 1:
                # Log and return default rather than failing the request
                logger.error("Redis cluster unavailable: %s", e)
                return default
            time.sleep(0.1 * (2 ** attempt))  # exponential backoff
    return default

# Cluster failover timeline:
# T=0:                          primary node fails
# T=0 .. cluster-node-timeout:  other nodes detect the failure (default 15000 ms)
# T=cluster-node-timeout:       replica election begins
# + a few seconds:              a replica is promoted to primary
# then:                         cluster operational, reads/writes resume
#
# During failover, writes to the failed shard return errors.
# Set cluster-node-timeout=5000 (ms) for faster failover.
```
## Monitoring Cluster Health
```python
def cluster_info(rc: RedisCluster) -> dict:
    """Get a cluster health summary."""
    info = rc.cluster_info()
    return {
        "state": info["cluster_state"],                    # "ok" or "fail"
        "slots_assigned": info["cluster_slots_assigned"],  # should be 16384
        "known_nodes": info["cluster_known_nodes"],
        "size": info["cluster_size"],
        "stats_messages_sent": info["cluster_stats_messages_sent"],
    }

def node_health(rc: RedisCluster) -> list[dict]:
    """Get per-node memory and replication info."""
    nodes = []
    for node in rc.get_nodes():
        info = rc.info(target_nodes=node)
        nodes.append({
            "host": f"{node.host}:{node.port}",
            "role": info.get("role"),
            "used_memory": info.get("used_memory_human"),
            "connected_clients": info.get("connected_clients"),
            "ops_per_sec": info.get("instantaneous_ops_per_sec"),
        })
    return nodes
```
## Slot Migration (Adding a Node)
```shell
# Add a new node to the cluster
redis-cli --cluster add-node new-node:7007 existing-node:7001

# Rebalance slots across all nodes
redis-cli --cluster rebalance existing-node:7001 --cluster-use-empty-masters

# Or manually migrate specific slots
redis-cli --cluster reshard existing-node:7001 \
  --cluster-from <source-node-id> \
  --cluster-to <target-node-id> \
  --cluster-slots 1000 \
  --cluster-yes

# Verify cluster state
redis-cli --cluster check existing-node:7001
```
## Conclusion
Redis Cluster distributes data across 16,384 hash slots with automatic failover. Hash tags are the critical tool for co-locating related keys on the same node, which is necessary for multi-key operations and Lua scripts. Client libraries handle slot discovery and MOVED/ASK redirections transparently. The main operational difference from standalone Redis is that cross-slot operations are not supported, which requires design discipline in key naming. For most production deployments, prefer Redis Sentinel when you need high availability without sharding; reach for Redis Cluster only when a single node's memory or throughput is genuinely insufficient.