Distributed & Decentralized Systems Curriculum
Time Causality · Physical Clock Drift

Key Question

How can a computer find out the current time from a time server, and what limits the accuracy of the answer?

Deep Dive

Cristian’s Algorithm is one of the simplest approaches to clock synchronization. It was published by Flaviu Cristian in 1989 and is still used in many systems where a moderate accuracy (millisecond-level) is sufficient. The idea is straightforward: a client asks a time server for the current time, measures the round-trip time (RTT), and uses the midpoint of the round trip as its best estimate of the true time.

The protocol works in three steps:

Client                          Server
  |                               |
  |-------- Request (T1) -------->|  T1 = client sends
  |                               |  T_server = server processes request
  |<------- Response (T2) --------|  T2 = response arrives at client
  |                               |

The client records T1 (just before sending the request). The server timestamps its response with its current time, which we call T_server. The client receives the response at its local time T2.

The estimated server time at the moment of receipt is:

estimated_server_time = T_server + (T2 - T1) / 2

The term (T2 - T1) is the round-trip time (RTT). The key assumption is that the network delay is symmetric — the request takes as long to travel to the server as the response takes to travel back. Under this assumption, adding half the RTT to the server’s timestamp gives the current server time.

A concrete example:

  • Client sends at T1 = 1000 (in milliseconds from some reference)
  • Server records its time as T_server = 10,000,050
  • Client receives at T2 = 1100
  • RTT = 1100 - 1000 = 100 ms
  • Estimated server time = 10,000,050 + 100/2 = 10,000,100

The client sets its clock to 10,000,100. If the client’s clock was at 1100 at receipt, it shifts forward by 10,000,100 - 1100 = 9,998,000 ms.

The accuracy problem. The symmetric delay assumption is almost always wrong. Network paths are asymmetric — the request and response may take different routes through routers, suffer different queuing delays at switches, or contend with different background traffic. The actual one-way delay from client to server might be 10 ms while the return path is 90 ms. In that case, the true server time at receipt is:

true_server_time = 10,000,050 + 90 = 10,000,140

But our formula gave 10,000,100. The error is 40 ms. In general, the maximum error is:

error_max = RTT/2 - one_way_min

where one_way_min is the minimum possible one-way delay (limited by the speed of light). If RTT is 100 ms and the minimum possible one-way delay is 5 ms (distance / speed of light), the maximum error is 100/2 - 5 = 45 ms. Cristian observed that the accuracy is roughly bounded by the uncertainty in the one-way delay.

To improve accuracy, the algorithm typically makes multiple requests and uses the response with the smallest RTT. A low RTT suggests the network was uncongested, making the symmetric assumption more plausible. If the fastest response has an RTT of 10 ms, the maximum error is about 5 ms — acceptable for many applications.

Cristian’s algorithm has a failure mode: if the server is faulty or unreachable, the client cannot synchronize. It also requires that the server has access to an accurate time source (like a GPS receiver or atomic clock). When no such source is available, the Berkeley Algorithm offers an alternative.

Check Your Understanding

  1. In Cristian’s algorithm, why does a smaller RTT produce a more accurate result?
  2. If RTT = 40 ms and the true one-way delays are 35 ms (request) and 5 ms (response), what is the error in the estimated time?
  3. Why does the algorithm make multiple requests and pick the one with the smallest RTT?

The “So What?”

Cristian’s algorithm appears wherever a machine needs to synchronize its clock over a network: embedded devices, boot-time synchronization in data centers, and as a component in larger time synchronization stacks. The core insight — the asymmetric network path limits your accuracy — applies to any protocol that estimates remote time. Whenever you see RTT-based time synchronization, the question to ask is: “How asymmetric is the path?”


✏️ Exercises

Topic 2: Physical Clock Drift — Exercises

Exercise 1: Drift Accumulation

A server room experiences a cooling failure. The temperature rises by 15 degrees Celsius, causing the quartz crystals in the servers’ clocks to drift at 20 ppm instead of their normal 5 ppm. The cooling outage lasts 30 days.

How much clock error accumulates during this period? Express your answer in seconds.


Exercise 2: Cristian’s Algorithm Error Bound

A client uses Cristian’s algorithm to synchronize with a time server. The client makes three requests and gets these results:

RequestT1 (client send)T_server (server time)T2 (client receive)
1010,000,000120
250010,000,100540
3100010,000,2001040

(a) Calculate the estimated server time for each request. (b) Which request gives the most accurate estimate? Why? (c) If the actual one-way delay from client to server is 50ms and from server to client is 10ms for the best request, what is the true server time at T2, and what is the error of Cristian’s estimate?


Exercise 3: Berkeley Algorithm Arithmetic

A cluster of 5 machines runs the Berkeley Algorithm. The master polls the slaves and receives these time offsets (in seconds from the master’s perspective at the moment of polling):

MachineOffset (seconds)
Master+1.00
Slave 1+0.50
Slave 2-0.75
Slave 3+0.25
Slave 4+5.00

(a) Identify any outliers. What threshold would you use? (b) Compute the fault-tolerant average. (c) Compute the adjustment for each machine. (d) What would happen to the cluster’s time if the Berkeley Algorithm were NOT used and each machine relied on its own quartz crystal?


Exercise 4: NTP Slewing vs. Stepping

A database server generates timestamps for every write. Its clock is wrong by +800 ms (it is 800 ms ahead of UTC). The NTP daemon has two options:

Option A: Step the clock backward by 800 ms immediately. Option B: Slew the clock at a rate of 50 ppm (50 microseconds per second) until it reaches the correct time.

(a) How long does Option B take to correct the 800 ms error? Show the calculation. (b) During Option A, describe the exact problem that can occur with database timestamps. (c) Which option would you choose for a system that assigns monotonically increasing transaction IDs? Why?

👁️ View Solutions

Topic 2: Physical Clock Drift — Solutions

Solution 1

Drift rate: D = 20 ppm = 20 × 10⁻⁶

Elapsed time: 30 days = 30 × 24 × 60 × 60 = 2,592,000 seconds

Error = D × 10⁻⁶ × elapsed_time = 20 × 10⁻⁶ × 2,592,000 = 20 × 2.592 = 51.84 seconds

Each server’s clock is off by about 52 seconds after 30 days. This is enough to cause authentication failures (Kerberos tickets typically have a 5-minute tolerance), SSL certificate validation failures, and confusing log timestamps.


Solution 2

(a)

Request 1: RTT = 120 - 0 = 120ms. Estimated = 10,000,000 + 120/2 = 10,000,060 Request 2: RTT = 540 - 500 = 40ms. Estimated = 10,000,100 + 40/2 = 10,000,120 Request 3: RTT = 1040 - 1000 = 40ms. Estimated = 10,000,200 + 40/2 = 10,000,220

(b) Request 2 and 3 both have RTT = 40ms — this ties for the smallest RTT. A smaller RTT suggests less network congestion and makes the symmetric delay assumption more plausible. Both Request 2 and 3 are equally “best” by RTT. However, if we note that Request 3’s estimated time is 10,000,220 with the same RTT, while Request 2’s is 10,000,120, this is simply because the server’s clock advanced between the two requests. The accuracy of the estimate (error from true time) is the same for both since the RTT is the same.

To choose between them, one could compare the variances of previous RTT measurements or simply use the first one with the minimum RTT.

(c) For Request 2: RTT = 40ms, but the true asymmetry is 50ms outbound, 10ms inbound.

True server time at T2 = T_server + inbound_delay = 10,000,100 + 10ms = 10,000,110

Cristian’s estimate = 10,000,120

Error = 10,000,120 - 10,000,110 = 10ms

The estimate is 10ms ahead of the true time. The error is: |inbound_delay - outbound_delay| / 2 = |10 - 50| / 2 = 20ms

But the actual error is 10ms (because of additional processing time). The general formula: maximum error = RTT/2 - min_one_way = 20 - 10 = 10ms.


Solution 3

(a) Outlier detection: The offsets are +1.00, +0.50, -0.75, +0.25, and +5.00. Slave 4’s offset of +5.00 seconds is far outside the range of the other four (which span -0.75 to +1.00). A simple threshold would be: discard any value more than, say, 3 seconds from the median. Slave 4 is clearly faulty — maybe its battery has died.

(b) Fault-tolerant average (excluding Slave 4):

Average = (+1.00 + 0.50 + (-0.75) + 0.25) / 4 = 1.00 / 4 = +0.25 seconds

(c) Adjustments:

Master: +1.00 → +0.25, adjust by -0.75 seconds (slew backward) Slave 1: +0.50 → +0.25, adjust by -0.25 seconds (slew backward) Slave 2: -0.75 → +0.25, adjust by +1.00 seconds (slew forward) Slave 3: +0.25 → +0.25, adjust by 0 seconds (already at average) Slave 4: excluded, not adjusted (but should be diagnosed)

(d) Without the Berkeley Algorithm, each machine’s clock would drift independently. After 30 days:

Master at 5 ppm: off by 5 × 10⁻⁶ × 2,592,000 = 13.0 seconds Slave 1 at 5 ppm: off by 13.0 seconds (in the same direction, perhaps) Slave 2 at 5 ppm: off by 13.0 seconds But if one machine has a crystal with a different drift rate (say 20 ppm due to manufacturing variance), it could be off by 52 seconds — producing a 39-second disagreement between machines. Database timestamps would disagree, logs would be impossible to correlate, and any ordering based on timestamps would be wrong.


Solution 4

(a) Option B (slew at 50 ppm):

50 ppm = 50 × 10⁻⁶ = 0.00005 seconds per second = 0.05 ms per second

To correct 800 ms:

time = 800 ms / (0.05 ms/s) = 16,000 seconds = 4 hours, 26 minutes, 40 seconds

(b) During Option A (step backward by 800ms):

At 10:00:00.000 (server time), the server writes transaction T1 with timestamp 10:00:00.000. NTP steps the clock back to 09:59:59.200 (correct UTC time). The next transaction T2 gets timestamp 09:59:59.200. Now T2 appears to “happen before” T1, even though it came after. Any system that relies on timestamp monotonicity (write-ahead logs, replication slots, MVCC visibility) may fail or corrupt data because the “before” and “after” ordering is reversed.

(c) For a system with monotonically increasing transaction IDs, Option B (slew) is strongly preferred. The 4.5-hour convergence time is acceptable for most applications — it prevents any discontinuity in timestamps. Option A risks data corruption or system crashes. If the error were extremely large (say, hours), the database might require a maintenance window for a stepped correction, but for 800ms, slewing is safe.