Distributed & Decentralized Systems Curriculum
Time Causality Β· Happened Before Relation

Key Question

If clocks are unreliable, how can we talk about β€œbefore” and β€œafter” in a distributed system?

Deep Dive

Physical clocks drift. NTP corrects them imperfectly. Even with perfect synchronization, two events on different machines can happen within the same nanosecond β€” close enough that no clock could distinguish their order. If we cannot rely on physical time to tell us which event came first, we need a different foundation for β€œbefore” and β€œafter.” That foundation is the happened-before relation, introduced by Leslie Lamport in his seminal 1978 paper β€œTime, Clocks, and the Ordering of Events in a Distributed System.”

Lamport observed that we do not need to know the absolute time when something happened. We only need to know whether one event could have influenced another. This is called a partial order: some pairs of events can be ordered, and some cannot. In a single program, every event has a clear order β€” line 10 runs before line 11. In a distributed system, the ordering is only partial because events on different machines may have no causal connection.

The happened-before relation is written as a β†’ b, meaning β€œa happened before b.” It is defined by three simple rules, and the first rule is: If events a and b occur in the same process and a comes before b, then a β†’ b.

Consider a space-time diagram. Time moves downward on the vertical axis. Each process is a vertical line. Local events are dots on the line.

Process P1      Process P2
    |               |
    a1              |
    |               |
    a2              |
    |               b1
    |               |
    a3              |
    |               b2
    |               |

In this diagram:

  • Within P1: a1 β†’ a2 β†’ a3 (each happens before the next)
  • Within P2: b1 β†’ b2
  • There is no relation between any event in P1 and any event in P2 because there is no communication between the processes.

The crucial point: we cannot say whether a1 happened before b1 or vice versa. They occurred on different machines with no shared clock and no communication. Physical time might tell us that b1 was timestamped at 10:00:00.001 and a1 at 10:00:00.002, but if the clocks on P1 and P2 are not synchronized, that comparison is meaningless. The happened-before relation says: we simply do not know, and we should not assume.

This is the fundamental break from single-machine programming. In a single program, total order is a given β€” the CPU executes instructions sequentially (or gives the illusion of doing so). In a distributed system, total order is not a given. You must build it explicitly, and the happened-before relation is the foundation of that construction.

The partial order is not a limitation β€” it is a liberation. It tells us exactly what we can know for certain (events within a process are ordered) and what we cannot (events across uncommunicating processes). A good distributed system design respects the partial order and does not pretend to know more than it can.

Check Your Understanding

  1. Why can’t physical timestamps reliably order events across different machines?
  2. In the space-time diagram above, can we say whether a1 happened before b1? Why or why not?
  3. What does β€œpartial order” mean in the context of distributed events?

The β€œSo What?”

Every distributed protocol you will study β€” from consensus algorithms to CRDTs to database replication β€” starts from the happened-before relation. When you debug a distributed system and ask β€œwhy did this transaction appear before that one?”, the answer is always rooted in causality, not in timestamps. Understanding partial ordering is the first step toward thinking in terms of causality rather than chronology, a shift that distinguishes distributed systems engineers from ordinary programmers.


✏️ Exercises

Topic 3: Happened-Before Relation β€” Exercises

Exercise 1: Identifying Causal Relationships

Given the space-time diagram below, list ALL pairs of events that are causally related (a β†’ b) and ALL pairs that are concurrent (a || b).

Process A       Process B       Process C
    |               |               |
    a1              |               |
    |               b1              |
    a2─────────────→b2              |
    |               |               c1
    a3─────────────→b3              |
    |               |               c2
    |               |               |

Use the notation (a1, b1) to mean β€œa1 and b1.” Include all pairing combinations across processes, not just within them. (You do not need to list same-process pairs β€” they are all ordered by rule 1.)


Exercise 2: Message Chain Causality

Process A sends a message M1 to Process B. After receiving M1, Process B sends a message M2 to Process C. After receiving M2, Process C performs an internal event.

(a) Draw the space-time diagram of this scenario (3 processes, 2 messages). (b) Is A’s send of M1 causally related to C’s internal event? Explain using the happened-before rules. (c) Would the answer change if B sent M2 before receiving M1? Why?


Exercise 3: Network Delay and Causality

Two events occur on different machines:

  • Event a: A sends a message to B.
  • Event b: B receives the message 5 seconds later (due to network congestion).

After receiving the message, B performs event c.

(a) Is b β†’ c? Why? (b) Is a β†’ b? Why? (c) Now suppose after sending the message, A performs event d (a local computation) 2 seconds before the message is delivered. Is d β†’ c? Trace the chain. (d) What if A and B’s clocks were perfectly synchronized, and the timestamps showed d at T=10:00:00.003, c at T=10:00:00.001? Does this contradict the happened-before relation? Why would clock-based ordering give the wrong answer?


Exercise 4: Concurrent Writes in Practice

A distributed key-value store has three replicas: R1, R2, R3. A client writes value β€œX” to key β€œk” and receives acknowledgment from R1. The write propagates to R2 and R3. Meanwhile, another client (unaware of the first write) writes value β€œY” to the same key β€œk” and receives acknowledgment from R2.

(a) Are the two writes concurrent? Why? (b) R3 receives the write of β€œY” after it has already received β€œX.” From R3’s perspective, which value is β€œcorrect”? (c) If the system uses last-writer-wins (LWW) based on physical timestamps, and the clocks on client 1 and client 2 differ by 50 seconds, what could go wrong? (d) What does the happened-before relation tell us that timestamps do not, in this scenario?

πŸ‘οΈ View Solutions

Topic 3: Happened-Before Relation β€” Solutions

Solution 1

Within-process relationships (all by rule 1):

  • A: a1 β†’ a2 β†’ a3
  • B: b1 β†’ b2 β†’ b3
  • C: c1 β†’ c2

Send-receive relationships (rule 2):

  • a2 β†’ b2 (message from A to B)
  • a3 β†’ b3 (another message from A to B)

Transitive relationships (rule 3):

From a1:

  • a1 β†’ a2 β†’ b2, so a1 β†’ b2
  • a1 β†’ a2 β†’ b2 β†’ b3, so a1 β†’ b3
  • a1 β†’ a2 β†’ b2 β†’ b3 β†’ b3 β†’ … actually b3 is followed by nothing in B that reaches C. But wait, a3 β†’ b3, not a2 β†’ b2 β†’ b3. Let me re-trace.

Actually let’s reconsider. The diagram shows a2 β†’ b2 and a3 β†’ b3. There is no message from B to C.

So transitive relationships:

  • a1 β†’ b2 (a1 β†’ a2 β†’ b2)
  • a1 β†’ b3 (a1 β†’ a2 β†’ b2 β†’ b3? No, a2 β†’ b2, and b2 β†’ b3 in B, so a1 β†’ a2 β†’ b2 β†’ b3, yes)
  • a1 β†’ b3 also via a1 β†’ a3 β†’ b3 (a1 β†’ a3, a3 β†’ b3, so a1 β†’ b3)
  • a2 β†’ b3 (a2 β†’ b2 β†’ b3)
  • a2 β†’ b3 also via a2 β†’ a3 β†’ b3
  • a3 β†’ b3 directly (message)
  • No relationships with C β€” there is no message from A or B to C, and no message from C to A or B.

Concurrent pairs:

All pairs involving C are concurrent (no messages to/from C):

  • a1 || c1, a1 || c2
  • a2 || c1, a2 || c2
  • a3 || c1, a3 || c2
  • b1 || c1, b1 || c2
  • b2 || c1, b2 || c2
  • b3 || c1, b3 || c2

Also concurrent across A and B:

  • a1 || b1 (no path either direction)
  • a2 || b1 (a2 β†’ b2, and b1 β†’ b2, but no path a2 β†’ b1 or b1 β†’ a2)
  • a3 || b1 (similar reasoning)
  • a3 || b2 (a3 β†’ b3, b2 β†’ b3, no path between a3 and b2 directly)

Any pair not connected by a β†’ b chain is concurrent.


Solution 2

(a)

Process A       Process B       Process C
    |               |               |
    a1 (send M1)───→b1 (recv M1)   |
    |               |               |
    |               b2 (send M2)───→c1 (recv M2)
    |               |               |
    |               |               c2 (internal)

(b) Yes, A’s send (a1) is causally related to C’s internal event (c2).

  • a1 β†’ b1 (rule 2: send-receive of M1)
  • b1 β†’ b2 (rule 1: same process)
  • b2 β†’ c1 (rule 2: send-receive of M2)
  • c1 β†’ c2 (rule 1: same process)
  • By transitivity (rule 3): a1 β†’ b1 β†’ b2 β†’ c1 β†’ c2, therefore a1 β†’ c2

(c) If B sent M2 before receiving M1, then b2 (send M2) would come before b1 (receive M1) in process B. In that case:

  • a1 β†’ b1 (send-receive of M1)
  • b2 β†’ c1 (send-receive of M2)
  • But a1 and b2? We have b2 β†’ b1 (b2 before b1 in B). And a1 β†’ b1. But do we have a1 β†’ b2? No. a1 β†’ b1 and b2 β†’ b1, but we need a1 β†’ … β†’ b2. b2 happens before b1, so a1 cannot β†’ b2. And b2 β†’ a1? No, b2 β†’ c1, no path to a1. So a1 and b2 are concurrent. And since b2 β†’ c1 β†’ c2, we have a1 || c2.

The answer changes completely: if B sends M2 before receiving M1, then M2 was sent without knowledge of M1, and C’s internal event is not causally dependent on A’s send.


Solution 3

(a) Yes, b β†’ c. Both events are in the same process (B), and b (receive) happens before c (internal event). Rule 1: same-process ordering.

(b) Yes, a β†’ b. a is the send event and b is the receive event of the same message. Rule 2: send happens before receive. The 5-second delay does not affect this β€” the relationship holds regardless of how long the message takes.

(c) Yes, d β†’ c. Here is the chain:

  • a (send) happens before d (local event after send) on process A: a β†’ d (rule 1)
  • a β†’ b (rule 2: send-receive)
  • d and b: a β†’ d and a β†’ b. Is d β†’ b? We don’t know d β†’ b directly. But d happens after a on the same process. b happens after a (message receive after send). But do we have a relation between d and b? a β†’ d and a β†’ b. But do we know d β†’ b? No, d happens after a on A, but we don’t have a message from A to B after d. d and b are either concurrent or… wait.

Actually let me reconsider. a β†’ d (same process). a β†’ b (send-receive). We have a β†’ d and a β†’ b. But no path from d to b. And b happens when it happens β€” we don’t have b β†’ d either (b is in B, d is in A, no message). So d and b are concurrent.

d β†’ c? We need d β†’ … β†’ c. We have b β†’ c (same process). But if d || b, then d β†’ c is not guaranteed.

Hmm, I was too hasty. Let me reconsider the scenario. The key is: did A send another message after d? The scenario says: A sends a message (a), then performs local computation d (2 seconds before delivery). No second message is sent. So d is not connected to B’s events. Therefore d and b are concurrent, and d and c are concurrent.

Wait, but is d β†’ b possible? d occurs on A after the send. The send is a β†’ B. The message is in transit. d happens before the message is received. But d is on A, b is on B. No message from d to b. So no causal link.

Actually I realize the scenario might be interpreted differently. Let me re-read: β€œafter sending the message, A performs event d (a local computation) 2 seconds before the message is delivered.” So timeline: a (send), then d (local), then b (receive on B). d occurs on A before the message is received on B.

Is d β†’ c? We need a chain. d β†’ … β†’ c. d is on A. Nothing connects d to B (no message from d). So d || b and d || c. The answer is no, d β†’ c is NOT guaranteed.

Wait, but can we say a β†’ d β†’ … and a β†’ b β†’ c? We know a is before both d and b. So a β†’ d and a β†’ b. But from d, is there any path to b or c? No. So d || c.

(d) If timestamps showed c at T=10:00:00.001 and d at T=10:00:00.003, clock-based ordering would say d happened after c. But the happened-before relation says they are concurrent β€” they have no causal link. If the clocks are not perfectly synchronized (which they never are), the timestamp ordering is meaningless. Even with perfect clocks, the ordering β€œc at 0.001, d at 0.003” does not mean c β†’ d β€” causality cannot be established by timestamps alone. The events are on different machines with no message between them, so they are concurrent regardless of what the clocks say.


Solution 4

(a) Yes, the two writes are concurrent. Client 1 writes to R1, and Client 2 writes to R2. Neither client knew about the other’s write β€” there is no causal chain connecting the two writes. No message, no shared state, no happened-before relation. They are concurrent (w1 || w2).

(b) From R3’s perspective, both values are β€œcorrect” in the sense that both writes happened without knowledge of the other. R3 cannot automatically determine which value is the β€œlatest” because there is no causal relationship between the writes. R3 has a conflict: two concurrent values for the same key.

(c) With LWW based on physical timestamps: if Client 1’s clock is 50 seconds ahead of Client 2’s clock, then Client 1’s write of X will have a timestamp 50 seconds β€œin the future” compared to Client 2’s write of Y. Even if Client 2’s write actually happened later in real time, the timestamp will make it appear older. LWW would select X (the β€œlater” timestamp), even though the correct behavior depends on application semantics, not clock readings.

(d) The happened-before relation tells us that the writes are concurrent β€” they are causally independent. This means the system cannot automatically resolve the conflict without application help. Timestamps (even with NTP) might assign an arbitrary ordering that could be wrong due to clock drift. The happened-before relation forces the system to acknowledge the conflict and either present both versions to the application or use a deterministic but semantically meaningful tiebreaker (like the application’s own merge logic).