Key Question
How does a Redis Cluster node know which node owns a given key, and what happens when the cluster topology changes?
Deep Dive
The redis-cluster.ts file implements a simplified Redis Cluster. The core data structure is straightforward: an array of 16384 entries, each pointing to the master node responsible for that slot.
The Routing Table
class RedisCluster {
slotTable: (RedisClusterNode | null)[] = new Array(16384).fill(null)
getNodeForKey(key: string): RedisClusterNode | null {
const slot = hashSlot(key)
return this.slotTable[slot]
}
}
When a client sends a GET or SET, the cluster computes hashSlot(key) and looks up the slot table. If the client is connected to the wrong node (because it has a stale routing table), that node returns a MOVED error with the correct node’s address.
Assigning Slots to Nodes
Slots are assigned in contiguous ranges. In the simulation, we divide 16384 by 3:
const slotsPerNode = Math.floor(16384 / 3)
cluster.assignSlots(m1, Array.from({ length: slotsPerNode }, (_, i) => i))
cluster.assignSlots(m2, Array.from({ length: slotsPerNode }, (_, i) => i + slotsPerNode))
In production, slot assignment is managed by CLUSTER ADDSLOTS or CLUSTER SETSLOT, and there is no automatic rebalancing — an operator must run redis-cli --cluster rebalance.
The Replication Gap
The simulation reveals Redis’s most consequential trade-off:
// After M1 fails and R1 is promoted:
// Keys that were on M1 but not yet replicated to R1 are LOST
The simulation output shows KEY MISSING for keys that were on the failed master. This demonstrates the cost of asynchronous replication: durability is sacrificed for single-digit millisecond write latency.
The Cluster Bus
In production, Redis Cluster nodes communicate over a separate cluster bus (port = service port + 10000). They gossip about:
- Node health (ping/pong)
- Slot ownership changes
- Configuration epochs (a Lamport-style counter)
- Failover votes
The cluster bus uses a binary protocol, not Redis’s text protocol, for efficiency.
Key Takeaways
- The routing table is a simple array —
hash(key) mod 16384→ direct lookup. MOVEDredirects let clients eventually converge on the correct topology.- Replication is asynchronous; failover loses un-replicated writes.
- Slot migration is manual and operator-driven.
Full Source
View or download the complete implementation: redis-cluster.ts
Exercises
- Run the simulation. Change
slotsPerNodeto distribute slots unevenly (e.g., 8000, 4000, 4384). What happens to key distribution? - Why does the simulation show
KEY MISSINGafter failover? - Add a
REPLICATEcommand to the simulation that synchronizes data from master to replica after write operations. What trade-off does this introduce?
👁️ View Solutions
- The node with 8000 slots stores nearly 50% of all keys, becoming a bottleneck. Uneven slot distribution causes hot spots. This is why Redis Cluster operators monitor slot distribution and rebalance regularly.
- The replica (
R1) was promoted to master but never received the last writes thatM1processed. Since replication is asynchronous (the write is sent to the replica but not awaited), the window between “client gets ACK” and “replica has the data” is where data loss occurs. - Synchronous replication (wait for replica ACK) adds latency equal to one network round trip per write. This is the trade-off Redis makes: fast writes with potential data loss vs. slower writes with guaranteed durability. Redis calls this
WAIT— it exists but is opt-in.
✏️ Exercises
Redis Cluster — Exercises
Exercise 1
You have a 6-node Redis Cluster (3 masters, 3 replicas). A master node handling slots 5461–10922 crashes. Its replica is promoted. What keys are lost?
Exercise 2
Given the key set: user:100, {shard:a}:profile, order:{2024}:001, session:abc, compute the hash slot for each using CRC16 mod 16384. Which keys would land on the same node if you use hash tags?
Exercise 3
Redis Cluster’s gossip protocol requires each node to maintain its own epoch (configuration version). When a new node joins, its epoch is 0. The cluster leader assigns it a higher epoch. Why is this epoch necessary? (Hint: think about stale routing tables.)
Exercise 4
Compare the slot bitmap approach (Redis) with the token ring approach (Cassandra). Which requires more metadata to route a request? Which handles node addition more smoothly?
👁️ View Solutions
-
Only the keys that were written to the crashed master but not yet replicated to the replica. Since replication is asynchronous, the window is typically 1–100 milliseconds of writes. The keys on the surviving masters are unaffected.
-
Hash slot computation:
CRC16("user:100") mod 16384 = slot XCRC16("shard:a") mod 16384 = slot Y(only “shard:a” due to{})CRC16("2024") mod 16384 = slot Z(only “2024” due to{})CRC16("session:abc") mod 16384 = slot W{shard:a}:profileand{shard:a}:namewould share the same slot.order:{2024}:001andorder:{2024}:002would share a different slot.
-
The epoch prevents stale routing information from overwriting fresh data. If a partitioned node wakes up with an old epoch and broadcasts its slot map, nodes with a higher epoch know to ignore it. This is functionally identical to Raft’s term number.
-
Redis uses a fixed-size 16384-entry array — O(1) lookup, 2 KB per node. Cassandra’s token ring uses a sorted list of tokens with binary search — O(log N) lookup. Redis is simpler at the cost of manual rebalancing; Cassandra is more complex but handles additions and removals automatically.