Distributed Transactions Coordination · Distributed Transactions 2PC

Key Question

How do distributed databases handle concurrent transactions without sacrificing correctness?

Deep Dive

Three main approaches exist. All are extensions of their single-node counterparts with distributed coordination added.

1. Distributed Two-Phase Locking (2PL)

Each node manages locks for the data it owns. A transaction acquires locks as it goes (growing phase) and releases them only at commit (shrinking phase). Deadlocks can span nodes: T1 (on Node A) waits for T2, which waits for T3 (on Node B), which waits for T1.

     Node A              Node B
    T1 holds L1         T3 holds L3
    T2 wants L1         T1 wants L3  <-- global deadlock

Solutions:

Wait-die / Wound-wait: Use transaction timestamps. Older transactions wait for younger; younger transactions abort if they conflict with older. Deadlock prevention, not detection.
Global deadlock detection: Each node builds a local wait-for graph. A coordinator merges them periodically and picks a victim to abort.

2. Timestamp Ordering (TO)

Every transaction gets a global timestamp (e.g., from a sequencer or hybrid logical clock). Nodes execute operations in timestamp order. Thomas Write Rule (TWR) optimizes writes: if a stale write arrives late, it’s ignored instead of aborting the transaction.

3. Optimistic Concurrency Control (OCC)

No locking during execution. At commit time, each node validates whether its reads are still current. If any conflicted, the transaction aborts. High throughput under low contention; bad under high contention.

Approach	Blocking?	Deadlocks?	Best for
Distributed 2PL	Yes	Yes	General purpose
Timestamp Ordering	No	No	Ordered workloads
OCC	No	No	Low contention, many reads

Spanner’s approach: Google Spanner uses strict two-phase locking for writes + TrueTime (GPS + atomic clocks) for globally-ordered timestamps. This gives external consistency: if T1 commits before T2 starts, T1’s timestamp is strictly less than T2’s. No clock skew loophole.

Check Your Understanding

How can a deadlock span multiple nodes?
What does Thomas Write Rule do?
Why does Google Spanner use TrueTime timestamps instead of a centralized sequencer?

The “So What?”

Distributed concurrency control is the bottleneck that separates single-region databases from globally-distributed ones. Spanner’s external consistency was considered impossible before TrueTime. Every approach here is a bet: sacrifice throughput for safety (2PL), throughput for latency (OCC), or simplicity for clock infrastructure (TrueTime).

✏️ Exercises

Exercises: Distributed Transactions & 2PC

2PC Log State: In the Two-Phase Commit protocol, what exact state does a participant write to its local durable log before responding “YES” to the coordinator’s prepare request? Why must this state be written to a log rather than kept in memory?
Blocking Explained: Consider a 2PC execution where the coordinator crashes after receiving all “YES” responses but before any participant has received the commit/abort decision. Participant 1 is running, Participant 2 is running, and Participant 3 is also running. Can any of these participants safely decide on their own? Explain what happens to each.
Distributed 2PL and Phantoms: A distributed database uses Two-Phase Locking (2PL) for concurrency control. Transaction T1 runs a query: SELECT * FROM accounts WHERE balance > 1000 across two shards. Transaction T2 inserts a new row with balance = 2000 on shard 2. How can a phantom read occur, and what mechanism (in addition to 2PL) is needed to prevent it?

👁️ View Solutions

Solutions: Distributed Transactions & 2PC

Exercise 1

The participant writes PREPARED (or READY) to its log, along with the transaction’s writes (enough information to commit later). This must be durable (fsynced to disk) before the YES response is sent.

Why a log instead of memory: If the participant crashes after sending YES but before receiving the coordinator’s decision, memory is lost. On recovery, the participant reads its log. If the log says PREPARED, it knows it made a promise and must wait for the coordinator. If the log says COMMIT or ABORT, it follows that. Without the log, the participant would have no memory of the promise after restarting.

Exercise 2

No participant can safely decide on their own.

Participant 1 received the PREPARE, said YES, and is now waiting. It has not received a COMMIT or ABORT. It holds locks and is blocked until the coordinator recovers.
Participant 2 is in the same state — blocked.
Participant 3 is also in the same state — blocked.

If any participant unilaterally commits, it might violate atomicity: the coordinator might have decided to abort (e.g., if it suspected a timeout from one participant). If any participant unilaterally aborts, it might violate atomicity: the coordinator might have logged COMMIT before crashing.

All three participants are blocked until the coordinator recovers and reveals the decision.

Exercise 3

Phantom read scenario:

T1’s query SELECT * FROM accounts WHERE balance > 1000 on shard 2 returns no rows.
T2 inserts a new row with balance = 2000 on shard 2 and commits.
If T1 re-runs the same query, it now sees the new row — a “phantom” that appeared from nowhere.

Standard 2PL (locking individual rows) does not prevent this because the new row didn’t exist when T1 first read.

Solution: A predicate lock or next-key lock is needed. In a distributed setting, this means locking the range balance > 1000 on each shard. The lock prevents T2 from inserting rows matching that predicate. In practice, many systems use serializable snapshot isolation (SSI) instead, which detects phantoms at commit time by checking for conflicting predicate reads.