Distributed Transactions Coordination · Distributed Mutual Exclusion

Key Question

How does Ricart-Agrawala achieve mutual exclusion without a coordinator or token?

Deep Dive

Ricart-Agrawala is the most influential distributed mutual exclusion algorithm. It’s fully decentralized: every node participates in granting access. No coordinator, no circulating token — just timestamped messages and logical clocks.

The Protocol

When node Pᵢ wants to enter the critical section:

1. Send REQUEST(timestamp, Pᵢ) to ALL other N-1 nodes
2. Wait until all N-1 have replied
3. Enter critical section
4. On exit, send deferred replies

When node Pⱼ receives REQUEST(t, Pᵢ):

  IF Pⱼ is NOT interested in CS:
      ──→ REPLY immediately

  IF Pⱼ IS interested in CS and hasn't sent its own REQUEST yet:
      ──→ REPLY immediately (Pᵢ's request is earlier)

  IF Pⱼ IS interested in CS and HAS sent its own REQUEST(tⱼ, Pⱼ):
      IF t < tⱼ (Pᵢ's timestamp is older/higher priority):
          ──→ REPLY immediately
      ELSE (t > tⱼ, Pᵢ's request is newer/lower priority):
          ──→ Defer reply (queue it)

3-Node Timeline

P1 wants CS at T=10:          P2 wants CS at T=15:      P3 (not interested):
     │                              │                         │
     │──REQUEST(10,P1)────→ P2 ────│─────────────────────────→│ P3
     │                              │                         │
     │                              │ P2 is interested, has   │
     │                              │ newer timestamp (15>10) │
     │                              │ so P2 DEFERS reply      │
     │                              │                         │
     │←─── REPLY ───────────── P2   │ P2 sees t=10 < t=15,    │
     │                              │ so P2 REPLIES           │
     │←─── REPLY ───────────────────── P3 (not interested)    │
     │                              │                         │
     │  *** P1 enters CS ***        │                         │
     │  (has all N-1 replies)       │                         │
     │                              │                         │
     │─── RELEASE ────────────→ P2  │ Deferred replies sent   │
     │                              │ P2 now has all replies  │
     │                              │ *** P2 enters CS ***    │

Key Properties

Safety: Only one node enters CS at a time. Proof: if two nodes both entered, they’d each need replies from all others, including each other — but the later requester defers its reply to the earlier one.

Liveness: The request with the smallest timestamp gets replies from everyone. No deadlock because timestamps provide a total ordering.

Fairness: Requests are granted in timestamp order (FCFS).

Message complexity: 2(N-1) messages per entry — (N-1) REQUESTs + (N-1) REPLYs.

Why timestamps matter:

Without timestamps, concurrent requests could deadlock:
  P1: "I want CS"
  P2: "I want CS"
  Both wait forever for each other's reply.

With timestamps (logical clocks):
  P1: REQUEST(10)   P2: REQUEST(15)
  P2 sees 10 < 15 → P2 replies to P1 (defers its own)
  P1 enters, exits, sends deferred reply to P2
  P2 enters (all replies received)

Optimization: Maekawa’s Voting Sets

Maekawa reduced messages to ~3√N by having nodes request a voting set (subset of nodes) instead of all nodes. Each pair of voting sets must intersect. This reduces message complexity but introduces the possibility of deadlock.

Check Your Understanding

How many messages does Ricart-Agrawala use per critical section entry?
In Ricart-Agrawala, why do we compare timestamps when deciding whether to defer a reply?
If P1 sends REQUEST(5) and P2 sends REQUEST(3), who enters the critical section first? Why?

The “So What?”

Ricart-Agrawala is the canonical example of a decentralized coordination algorithm. Its timestamp-based approach is the ancestor of many modern protocols, including the way ZooKeeper orders requests via ZXID timestamps and how distributed databases sequence transactions with hybrid logical clocks.

✏️ Exercises

Distributed Mutual Exclusion: Exercises

Exercise 1: Counting Messages

Consider a 10-node cluster using three different mutual exclusion algorithms. For each, calculate the number of messages needed for one critical section entry.

a) Centralized algorithm b) Token ring (best case — the requesting node holds the token) c) Token ring (worst case — the token is N-1 hops away and not currently held by a node that wants CS) d) Ricart-Agrawala e) Maekawa’s voting set algorithm

Exercise 2: Ricart-Agrawala Race Condition Analysis

Three processes P1, P2, P3 are running Ricart-Agrawala. P1 and P2 want to enter the critical section.

P1 sends REQUEST(5) to P2 and P3 at time T=0
P2 sends REQUEST(3) to P1 and P3 at time T=0 (same time, different logical clock speeds)
P3 is not interested in the critical section

a) Who enters the critical section first? Why? b) Trace the sequence of messages. At each step, note whether a REPLY is sent or DEFERRED. c) When does the second process enter the critical section?

Exercise 3: The Delayed Grant Problem

Suppose a 4-node token ring uses a single token that circulates in the order P1 → P2 → P3 → P4 → P1.

Currently, P1 holds the token. P1 passes the token to P2, but the message is delayed due to network congestion. After a 500ms timeout, P1 assumes the token is lost and generates a new one, sending it to P2 again.

a) What goes wrong when P2 receives both tokens? b) How would you fix this using sequence numbers? c) If P1 used fencing tokens, how would a storage system distinguish the “real” token holder from the stale one?

👁️ View Solutions

Distributed Mutual Exclusion: Solutions

Solution 1: Counting Messages

For a 10-node cluster:

a) Centralized: 3 messages per entry (REQUEST → coordinator, GRANT ← coordinator, RELEASE → coordinator).

b) Token ring (best case): 1 message — the requesting node already holds the token or receives it as the immediate next in the ring.

c) Token ring (worst case): The token is N-1 hops away = 9 hops. But each hop is 1 message, and the requesting node doesn’t send any messages to request it — the token just arrives after visiting all other nodes. So 0 messages sent, up to 9 message arrivals before the token arrives. In terms of messages sent by the requesting node, it’s 0. In terms of total messages in the system per entry, it’s 1 (the token pass from whoever holds it).

d) Ricart-Agrawala: 2(N-1) = 2(9) = 18 messages — 9 REQUESTs sent, 9 REPLYs received.

e) Maekawa’s voting sets: 3√N ≈ 3√10 ≈ 3(3.16) ≈ 9 or 10 messages per entry. (Actually √N ≈ 3.16, so 3 × 3.16 = 9.48 ≈ 10 messages in practice.)

Solution 2: Ricart-Agrawala Race Condition Analysis

a) P2 enters first. P2’s REQUEST has timestamp 3, which is earlier (smaller number = higher priority) than P1’s timestamp 5.

b) Message sequence:

T=0: P1 sends REQUEST(5) to P2, P3
     P2 sends REQUEST(3) to P1, P3
     
     P3 receives both REQUESTs, is not interested → sends REPLY to both immediately
     
     P1 receives REQUEST(3) from P2:
       - P1 is interested, has sent REQUEST(5)
       - 3 < 5 (P2's timestamp is older)
       - P1 sends REPLY to P2 immediately
     
     P2 receives REQUEST(5) from P1:
       - P2 is interested, has sent REQUEST(3)
       - 5 > 3 (P1's timestamp is newer)
       - P2 DEFERS reply to P1
     
T=1: P2 has all replies (from P1 and P3)
     P2 enters critical section
     
T=2: P2 exits critical section
     P2 sends deferred REPLY to P1
     
T=3: P1 has all replies (from P2 and P3)
     P1 enters critical section

c) P2 enters at T=1. P1 enters at T=3 (after P2 exits and sends the deferred reply).

Solution 3: The Delayed Grant Problem

a) Two tokens exist simultaneously: P2 receives one token (the original delayed one) and then another (the regenerated one). If P2 enters the critical section with one token, and then another node (e.g., P3) receives the second token, both P2 and P3 could be in the critical section simultaneously — violating mutual exclusion.

b) Sequence number fix:

Each token carries a monotonically increasing sequence number (e.g., token #1, token #2, etc.)
When P1 regenerates the token, it increments the sequence number to token #2
Nodes track the highest sequence number they’ve seen
When P2 receives the delayed token #1 followed by token #2, it recognizes #1 as stale and discards it

P1 generates token#1 → P2 receives token#1 (but message delayed)
P1 times out → generates token#2 → P2 receives token#2
P2 updates highest_seq = 2
P2 later receives token#1 → discards (seq 1 < 2)

c) Fencing tokens work differently from token ring sequence numbers. A fencing token is issued by a lock service (like ZooKeeper) and monotonically increases each time a lock is granted:

Grant #1: fencing token = 1 for P1
Grant #2: fencing token = 2 for P2 (after P1's lock expired)

P1 writes to storage: "write X, fence_token=1"
(storage remembers last fence token = 2)

P2 writes to storage: "write Y, fence_token=2"
(storage accepts: 2 >= 2)

P1's delayed write arrives: "write X, fence_token=1"
(storage rejects: 1 < 2 → stale grant)

This provides idempotency and freshness guarantees even when messages are delayed. This is why production distributed locks (etcd, ZooKeeper, Redis Redlock) all recommend fencing tokens for correctness-critical operations.