Key Question
Which distributed mutual exclusion algorithm should you use and when?
Deep Dive
Complete Comparison
Algorithm          Messages/Entry   SPOF?     Delay                     Fairness      Complexity
──────────────────────────────────────────────────────────────────────────────────────────────────
Centralized        3                Yes       2 msgs                    Depends       Simple
Token Ring         1                Partial   0 to N hops (token pos)   Round-robin   Moderate
Ricart-Agrawala    2(N-1)           No        2(N-1) msgs               FCFS          Moderate
Maekawa (Voting)   3√N              No        3√N msgs                  FCFS          Complex
Lamport's Bakery   3(N-1)           No        3(N-1) msgs               FCFS          Moderate
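To get a feel for how these formulas scale, here is a minimal sketch that evaluates the Messages/Entry column for a given cluster size. The function name `messages_per_entry` and the dictionary keys are illustrative choices, not part of any library, and the Maekawa figure is left as the raw 3√N approximation.

```python
import math


def messages_per_entry(n: int) -> dict[str, float]:
    """Evaluate the Messages/Entry formulas from the table for n nodes."""
    return {
        "centralized": 3,                # REQUEST, GRANT, RELEASE
        "token_ring": 1,                 # one token pass per entry
        "ricart_agrawala": 2 * (n - 1),  # n-1 REQUESTs plus n-1 REPLYs
        "maekawa": 3 * math.sqrt(n),     # roughly 3 * sqrt(n)
        "lamport": 3 * (n - 1),          # table row "Lamport's Bakery"
    }


for name, count in messages_per_entry(10).items():
    print(f"{name:16s} {count:6.2f}")
```

Running it for larger N makes the gap visible: the broadcast-based algorithms grow linearly with cluster size, while the centralized and token ring costs stay constant.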
Lamport's Bakery Algorithm
Lamport's bakery algorithm is the distributed equivalent of a bakery's ticket dispenser:
1. Each node picks a sequence number (timestamp) by reading
others' numbers and picking max+1
2. Node with smallest number enters CS
3. If tie (same number), lower node ID wins
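It uses 3(N-1) messages per entry (a REQUEST, a REPLY, and a RELEASE exchanged with each of the other N-1 nodes), slightly more than Ricart-Agrawala's 2(N-1), but holds historical significance as one of Lamport's foundational contributions. Strictly speaking, the original bakery algorithm targets shared memory; the message counts here apply to its message-passing counterpart from Lamport's logical-clock work.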
Lamport's Bakery: Entry protocol
P1: "I take number 5"
P2: "I take number 6"
P3: "I take number 4"
Order: P3(4) → P1(5) → P2(6)
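The ordering rule above (smallest number first, lower node ID on a tie) amounts to comparing (number, node ID) pairs. A minimal sketch, assuming nodes are identified by integers; `take_number` and `entry_order` are hypothetical helper names used only for illustration:

```python
def take_number(current: dict[int, int]) -> int:
    """Pick a number one larger than every number seen so far."""
    return max(current.values(), default=0) + 1


def entry_order(tickets: dict[int, int]) -> list[int]:
    """Order node IDs by (number, node ID); the node ID breaks ties."""
    return sorted(tickets, key=lambda node: (tickets[node], node))


tickets = {1: 5, 2: 6, 3: 4}       # P1 took 5, P2 took 6, P3 took 4
print(entry_order(tickets))        # [3, 1, 2]
print(entry_order({1: 5, 2: 5}))   # tie on 5, lower ID wins: [1, 2]
print(take_number(tickets))        # 7
```

Comparing tuples keeps the tie-break in one place, so two nodes that pick the same number still agree on who goes first.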
The Delayed Grant Problem
Token-based algorithms have a subtle issue: if P1 sends the token to P2, but the message is delayed, P1 might send it again thinking P2 didn't get it. Now P2 has two tokens, violating mutual exclusion.
Token duplication scenario:
P1 ──(token)──→ P2 (message delayed)
P1 times out
P1 ──(token)──→ P2 (second copy)
P2 gets TWO tokens!
Solution: Use sequence numbers to detect duplicates.
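A minimal sketch of that duplicate check, assuming each token carries an integer sequence number and a resend reuses the number of the original. The `TokenReceiver` class and its fields are hypothetical names for illustration:

```python
class TokenReceiver:
    """Accepts a token only if its sequence number is newer than any seen."""

    def __init__(self) -> None:
        self.highest_seq = 0   # highest sequence number accepted so far
        self.holding = False   # whether this node currently holds the token

    def receive(self, seq: int) -> bool:
        """Return True if the token is accepted, False if it is a duplicate."""
        if seq <= self.highest_seq:
            return False       # duplicate or stale copy: discard it
        self.highest_seq = seq
        self.holding = True
        return True


p2 = TokenReceiver()
print(p2.receive(1))  # True: first copy accepted
print(p2.receive(1))  # False: resent copy with the same number is dropped
```

The receiver only needs to remember one integer, which is why sequence numbers are a cheap fix for duplicated tokens.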
What You Should Actually Use
For production systems, don't implement these from scratch. Use battle-tested coordination services:
ZooKeeper / etcd Distributed Locks:
1. Create an ephemeral znode/key
2. The node that successfully creates it holds the lock
3. Others watch for deletion to get notified
Plus: Fencing tokens β an incrementing counter that
prevents a delayed grant from causing problems:
Client acquires lock, gets fencing token = 5
Client sends write to storage: "write X, token=5"
(message delayed by GC pause)
Lock expires, another client gets token = 6
Second client writes successfully
First client's delayed write arrives with token=5
Storage rejects it (token 5 < last seen token 6)
Fencing tokens solve the fundamental problem of distributed mutual exclusion: you can't know when a lock holder has actually stopped. A holder might be GC-paused and resume after the lock has been reassigned. Because the protected resource checks the fencing token on every request, it can reject anything that arrives under a stale grant.
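The storage-side check can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the API of ZooKeeper, etcd, or any real storage system; `FencedStorage`, `write`, and `last_token` are hypothetical names:

```python
class FencedStorage:
    """Rejects any write whose fencing token is older than the newest one seen."""

    def __init__(self) -> None:
        self.last_token = 0            # highest fencing token seen so far
        self.data: dict[str, str] = {}

    def write(self, key: str, value: str, token: int) -> bool:
        """Apply the write and return True, or return False for a stale token."""
        if token < self.last_token:
            return False               # grant was superseded: reject
        self.last_token = token
        self.data[key] = value
        return True


storage = FencedStorage()
print(storage.write("X", "from second client", token=6))  # True
print(storage.write("X", "from first client", token=5))   # False: 5 < 6
```

Note that the check lives in the storage system, not in the lock service: the lock service only hands out increasing numbers, and the resource enforces them.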
Algorithm          When to Use
───────────────────────────────────────────────────
Centralized        Prototype, <10 nodes, simple
Token Ring         Predictable throughput, small ring
Ricart-Agrawala    Educational, understanding theory
ZooKeeper/etcd     Everything production (really)
Check Your Understanding
- How many messages does centralized mutual exclusion use per critical section entry?
- In Ricart-Agrawala, why do we compare timestamps when deciding whether to defer a reply?
- What's the "delayed grant" problem in token-based algorithms, and how do fencing tokens solve it?
The "So What?"
Ricart-Agrawala is the canonical example of a decentralized coordination algorithm. Its timestamp-based ordering reflects an idea that recurs in many modern protocols, including the way ZooKeeper orders updates by zxid (a monotonically increasing transaction ID) and how distributed databases sequence transactions with hybrid logical clocks.
Distributed Mutual Exclusion: Exercises
Exercise 1: Counting Messages
Consider a 10-node cluster using three different mutual exclusion algorithms. For each, calculate the number of messages needed for one critical section entry.
a) Centralized algorithm
b) Token ring (best case: the requesting node holds the token)
c) Token ring (worst case: the token is N-1 hops away and not currently held by a node that wants CS)
d) Ricart-Agrawala
e) Maekawa's voting set algorithm
Exercise 2: Ricart-Agrawala Race Condition Analysis
Three processes P1, P2, P3 are running Ricart-Agrawala. P1 and P2 want to enter the critical section.
- P1 sends REQUEST(5) to P2 and P3 at time T=0
- P2 sends REQUEST(3) to P1 and P3 at time T=0 (same time, different logical clock speeds)
- P3 is not interested in the critical section
a) Who enters the critical section first? Why?
b) Trace the sequence of messages. At each step, note whether a REPLY is sent or DEFERRED.
c) When does the second process enter the critical section?
Exercise 3: The Delayed Grant Problem
Suppose a 4-node token ring uses a single token that circulates in the order P1 → P2 → P3 → P4 → P1.
Currently, P1 holds the token. P1 passes the token to P2, but the message is delayed due to network congestion. After a 500ms timeout, P1 assumes the token is lost and generates a new one, sending it to P2 again.
a) What goes wrong when P2 receives both tokens?
b) How would you fix this using sequence numbers?
c) If P1 used fencing tokens, how would a storage system distinguish the "real" token holder from the stale one?
Distributed Mutual Exclusion: Solutions
Solution 1: Counting Messages
For a 10-node cluster:
a) Centralized: 3 messages per entry (REQUEST to the coordinator, GRANT back from the coordinator, RELEASE to the coordinator).
b) Token ring (best case): 1 message. The requesting node already holds the token, so it enters without waiting; the only message attributable to this entry is the single token pass to its successor afterward.
c) Token ring (worst case): the token is N-1 = 9 hops away, and each hop is one message, so 9 token-pass messages occur before the requester can enter. The requesting node itself sends nothing to ask for the token; it simply waits for it to arrive. When every node uses the token as it passes, the cost averages out to 1 message per entry, which is the figure in the comparison table.
d) Ricart-Agrawala: 2(N-1) = 2(9) = 18 messages β 9 REQUESTs sent, 9 REPLYs received.
e) Maekawa's voting sets: 3√N = 3√10 ≈ 3 × 3.16 ≈ 9.48, so about 10 messages per entry in practice.
Solution 2: Ricart-Agrawala Race Condition Analysis
a) P2 enters first. P2's REQUEST has timestamp 3, which is earlier (smaller number = higher priority) than P1's timestamp 5.
b) Message sequence:
T=0: P1 sends REQUEST(5) to P2, P3
P2 sends REQUEST(3) to P1, P3
P3 receives both REQUESTs, is not interested → sends REPLY to both immediately
P1 receives REQUEST(3) from P2:
- P1 is interested, has sent REQUEST(5)
- 3 < 5 (P2's timestamp is older)
- P1 sends REPLY to P2 immediately
P2 receives REQUEST(5) from P1:
- P2 is interested, has sent REQUEST(3)
- 5 > 3 (P1's timestamp is newer)
- P2 DEFERS reply to P1
T=1: P2 has all replies (from P1 and P3)
P2 enters critical section
T=2: P2 exits critical section
P2 sends deferred REPLY to P1
T=3: P1 has all replies (from P2 and P3)
P1 enters critical section
c) P2 enters at T=1. P1 enters at T=3 (after P2 exits and sends the deferred reply).
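The reply-or-defer decision in this trace can be sketched as a comparison of (timestamp, process ID) pairs. This is a simplified illustration rather than a full implementation: it omits networking and reply counting, and it treats "requesting" as covering both waiting for and being inside the critical section. The `Process` class and its method names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    requesting: bool = False   # True from sending REQUEST until leaving the CS
    request_ts: int = 0        # timestamp of this process's own REQUEST
    deferred: list = field(default_factory=list)

    def on_request(self, ts: int, sender: int) -> str:
        """Decide how to answer an incoming REQUEST(ts) from sender."""
        mine = (self.request_ts, self.pid)
        theirs = (ts, sender)
        if self.requesting and mine < theirs:
            self.deferred.append(sender)   # own request has priority
            return "DEFER"
        return "REPLY"

    def on_exit(self) -> list:
        """Leave the CS and return the processes owed a deferred REPLY."""
        self.requesting = False
        released, self.deferred = self.deferred, []
        return released


p1 = Process(pid=1, requesting=True, request_ts=5)
p2 = Process(pid=2, requesting=True, request_ts=3)
p3 = Process(pid=3)

print(p1.on_request(3, sender=2))  # REPLY: 3 < 5, P2 is older
print(p2.on_request(5, sender=1))  # DEFER: P2's own request is older
print(p3.on_request(5, sender=1))  # REPLY: P3 is not interested
print(p2.on_exit())                # [1]: P2 now sends the deferred REPLY to P1
```

Including the process ID in the comparison means two requests with equal timestamps are still totally ordered, so both sides reach the same decision.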
Solution 3: The Delayed Grant Problem
a) Two tokens exist simultaneously: P2 receives one token (the original, delayed one) and then another (the regenerated one). If P2 enters the critical section with one token and passes the other along, another node (e.g., P3) can receive it, and both P2 and P3 could be in the critical section at the same time, violating mutual exclusion.
b) Sequence number fix:
- Each token carries a monotonically increasing sequence number (e.g., token #1, token #2, etc.)
- When P1 regenerates the token, it increments the sequence number to token #2
- Nodes track the highest sequence number they've seen
- When the delayed token #1 reaches P2 after token #2, P2 recognizes #1 as stale and discards it
P1 sends token#1 → P2 (message delayed)
P1 times out → generates token#2 → P2 receives token#2
P2 updates highest_seq = 2
P2 later receives token#1 → discards (seq 1 < 2)
c) Fencing tokens work differently from token ring sequence numbers. A fencing token is issued by a lock service (like ZooKeeper) and monotonically increases each time a lock is granted:
Grant #1: fencing token = 1 for P1
Grant #2: fencing token = 2 for P2 (after P1's lock expired)
P1 sends to storage: "write X, fence_token=1" (message delayed)
P2 writes to storage: "write Y, fence_token=2"
(storage accepts: 2 > last seen token, remembers last fence token = 2)
P1's delayed write arrives: "write X, fence_token=1"
(storage rejects: 1 < 2 → stale grant)
This protects the resource from stale writers even when messages are delayed or a holder is paused. This is why fencing tokens are recommended for correctness-critical operations whenever a distributed lock (for example, one built on etcd or ZooKeeper) guards a shared resource.