Consensus Fault Tolerance · FLP Impossibility

Key Question

What exactly does FLP prove, and how does the proof work?

Deep Dive

The FLP theorem (Fischer, Lynch, Paterson, 1985) states: In an asynchronous system, no deterministic consensus algorithm can guarantee reaching consensus if even one process can crash.

This is not a statement about any particular algorithm. It is a universal impossibility result — like the halting problem for distributed systems. Let’s understand the proof at a conceptual level.

Valency: At any point in the execution, the system is in some configuration. A configuration is univalent if all possible futures lead to the same decision value. It is 1-valent if it must eventually decide 1, 0-valent if it must decide 0. A configuration is bivalent if both outcomes are still possible — the system could still decide either 0 or 1.

Initial config: bivalent (could go either way)

         |-- decide 1 path
         |
  ...---[BIVALENT] 
         |
         |-- decide 0 path

Proof sketch (high level):

Initial bivalence: Start from an initial configuration where all processes propose 0 or 1. At least one initial configuration must be bivalent (otherwise we could predict the outcome, which would require solving consensus in a way that’s impossible even in trivial cases).
The critical step: From a bivalent configuration, there is always some sequence of message deliveries that produces another bivalent configuration. Crucially, we can always delay one message — and since the system is asynchronous, a delayed message is indistinguishable from a lost one.
Indefinite delay: By repeatedly applying step 2, we can construct an infinite execution where the system stays bivalent forever. Each step delays a message to a different process, and since one process might be faulty, we can keep the system bivalent indefinitely.

Time -->
B    B    B    B    B    B    B    B    ...
|    |    |    |    |    |    |    |
All configurations are bivalent — no decision ever reached

Why one faulty process is enough: In a perfectly reliable system, you could wait for all processes to respond. But with one crash-possible process, you must make decisions without hearing from everyone. The adversary (scheduler) exploits this by delaying messages to the right nodes at the right time, keeping the system perpetually uncertain.

Check Your Understanding

What is the difference between a univalent and a bivalent configuration?
Why must there be at least one initial bivalent configuration?
Could the FLP result be circumvented by having processes share a synchronized clock?

The “So What?”

FLP is not a “bug” — it’s a fundamental law of distributed computing, like the speed of light in physics. Understanding FLP teaches you that distributed consensus is always a tradeoff: you can have guaranteed safety, or guaranteed liveness, but not both in an asynchronous system.

✏️ Exercises

FLP Impossibility: Exercises

Exercise 1: Synchrony and FLP

If a system can guarantee message delivery within 100ms, does FLP apply? Explain why or why not.

Exercise 2: Valency

What is the difference between a univalent configuration and a bivalent configuration? Give a concrete example of each in a 3-node system trying to agree on 0 or 1.

Exercise 3: Timeouts and the Async Model

How does using a timeout violate the pure asynchronous model? If a timeout fires, what can you conclude — and what can’t you conclude?

👁️ View Solutions

FLP Impossibility: Solutions

Exercise 1

If a system guarantees message delivery within 100ms, FLP does not apply as originally stated because the system is synchronous (bounded message delay), not asynchronous. FLP assumes no upper bound on message delay. With a 100ms bound, you can use timeouts safely — if you don’t hear back in 100ms, the node must be crashed.

However, in practice, the 100ms bound might not hold during garbage collection pauses, network congestion, or overload. So while the theoretical FLP doesn’t apply to this system, the practical concern remains — you must be certain the bound always holds.

Exercise 2

Univalent: A configuration where all possible executions lead to the same decision value.

Example of 1-valent: All three nodes have exchanged messages indicating they’ve accepted proposal value 1. No matter what messages are delivered next, the system will decide 1.

Bivalent: A configuration where both 0 and 1 are still possible outcomes depending on future message deliveries.

Example of bivalent: Two nodes proposed 0, one node proposed 1, and no majority has formed. If the first two nodes communicate first, they’ll decide 0. If the third node makes a counter-proposal to one of the 0-proposers before that node hears from the other 0-proposer, the system might shift toward 1. Both outcomes are still possible.

Exercise 3

A timeout assumes an upper bound on message delivery time. If a message hasn’t arrived within the timeout period, you conclude the recipient is faulty. But in the pure asynchronous model, the message might arrive in the next microsecond — there’s no way to distinguish “slow” from “crashed.”

When a timeout fires, you can conclude: “I haven’t received a response yet.” You cannot conclude: “The node has crashed.” This is why timeouts are called “unreliable failure detectors” — they can suspect correct processes and fail to suspect crashed ones. The pure async model forbids even unreliable failure detection because it assumes no timing information is available at all.