Distributed & Decentralized Systems Curriculum
Consensus Fault Tolerance · Byzantine Fault Tolerance

Key Question

How does the Practical Byzantine Fault Tolerance (PBFT) algorithm work in three phases?

Deep Dive

PBFT (Castro & Liskov, 1999) was the first algorithm to make BFT practical. It reduced the complexity from exponential (theoretical BFT) to polynomial (O(n²) messages). It operates in three phases: Pre-prepare, Prepare, and Commit.

Setup: N = 3f + 1 replicas. One replica is the primary (leader), ordered by view number. Others are backups. Views change when the primary is suspected faulty.

Three phases for a request:

Client sends request to primary.

View v, Primary = replica 0
Backups = replicas 1, 2, 3 (for f=1, N=4)

Phase 1 — PRE-PREPARE:
  Primary (0) assigns sequence number n, creates message:
    <<PRE-PREPARE, v, n, d( request )>, m>
  Broadcasts to all backups (1, 2, 3).
  Backups accept if: (a) in view v, (b) never accepted PRE-PREPARE for v,n,
  (c) signature is valid, (d) n within watermarks.

Phase 2 — PREPARE:
  Each backup (and primary) broadcasts:
    <PREPARE, v, n, d( request ), i>
  Node collects PREPARE messages (including its own).
  When node receives 2f matching PREPARE messages from DIFFERENT replicas
  (f+1 for itself + f from others), it enters "Prepared" state.
  This means: 2f total matching Prepare messages (including self).

Phase 3 — COMMIT:
  After entering Prepared state, each replica broadcasts:
    <COMMIT, v, n, d( request ), i>
  Node collects COMMIT messages.
  When node receives 2f+1 matching COMMIT messages (including its own),
  the request is committed and executed.

Then each replica sends the result directly to the client.
The client accepts the result when it receives f+1 matching replies.

Visual message flow (f=1, N=4):

Client     Primary (0)    Backup (1)    Backup (2)    Backup (3)
  |            |              |             |             |
  |--request-->|              |             |             |
  |            |--Pre-prep-->|             |             |
  |            |--Pre-prep--->|            |             |
  |            |--Pre-prep---->|           |             |
  |            |              |             |             |
  |            |<-Prepare-----|             |             |
  |            |<---Prepare---|---Prepare-->|             |
  |            |<----Prepare---|---Prepare---->           |
  |            |<----Prepare---|---Prepare------>         |
  |            |              |             |             |
  |            |--Commit----->|             |             |
  |            |--Commit-------->           |             |
  |            |--Commit--------->          |             |
  |            |<-Commit-------|             |             |
  |            |<---Commit---|--Commit------>|             |
  |            |<----Commit---|---Commit-------            |
  |            |              |             |             |
  |<---reply---|              |             |             |
  |<---reply------------------|             |             |
  |<---reply--------------------             |             |
  |              (Client accepts after f+1 = 2 matching)   |

Why three phases?

  • Pre-prepare: Assigns a sequence number to prevent ordering confusion. Only the primary sends it.
  • Prepare: Ensures that 2f+1 replicas agree on the order within the same view. This is the “agreement” sub-protocol.
  • Commit: Ensures that the request is committed even if the view changes. When a replica commits, it knows that any future view will contain this request (because 2f+1 replicas committed it, and at least f+1 correct replicas in any view will carry it forward).

View change: If the primary is faulty, backups initiate a view change (view v → v+1). The new primary collects prepared requests from 2f+1 replicas and carries them forward, ensuring that committed requests survive view changes.

Check Your Understanding

  1. In PBFT with f=2 (7 nodes), how many Prepare messages must a replica receive to enter the Prepared state?
  2. Why is the Prepare phase needed if we already have Pre-prepare? What does it add?
  3. What happens if the primary is Byzantine and sends different Pre-prepare messages to different backups?

The “So What?”

PBFT made BFT practical for real systems by reducing complexity from exponential to polynomial (O(n²) messages). It’s used in Hyperledger Fabric, Zilliqa, and several permissioned blockchain platforms. The three-phase pattern (propose → prepare → commit) is the foundation for nearly all modern BFT algorithms.


✏️ Exercises

Byzantine Fault Tolerance: Exercises

Exercise 1: 3f+1 Bound

How many nodes are needed to tolerate 3 Byzantine faulty nodes? Show the math.

Exercise 2: PBFT Phases

In PBFT, why is the Prepare phase needed if we already have Pre-prepare? What problem would occur with only Pre-prepare and Commit?

Exercise 3: Proof-of-Work as BFT

How does Proof-of-Work act as a probabilistic BFT mechanism? What happens if an attacker controls 40% of the total hashrate?

👁️ View Solutions

Byzantine Fault Tolerance: Solutions

Exercise 1

N ≥ 3f + 1 = 3(3) + 1 = 10 nodes.

For f = 3, you need at least 10 nodes. With 10 nodes, you can tolerate up to 3 Byzantine faults. With only 9 nodes (3f, but 3f+1 is 10), you’d have 3 correct nodes per “group” — any 3 faulty nodes could group together and confuse the protocol since honest nodes need 2f+1 replies but can’t distinguish liars from honest ones in the worst case.

Compare to crash faults: 2f+1 = 7 nodes needed for f=3 crash tolerance. BFT requires 3 more nodes.

Exercise 2

Without the Prepare phase, a single faulty primary could send different Pre-prepare messages to different replicas, and those replicas would commit different values in the Commit phase — violating agreement.

The Prepare phase ensures within-view consistency: replicas share their Pre-prepare receipts with each other. When a replica collects 2f matching Prepare messages (including its own Pre-prepare), it knows that at least f+1 correct replicas received the same Pre-prepare. Even if the primary is Byzantine, the correct replicas have agreed on the ordering.

The Commit phase then ensures cross-view consistency: that the decision survives view changes.

Without Prepare, a Byzantine primary could send Pre-prepare(“attack”) to half the replicas and Pre-prepare(“retreat”) to the other half. The Commit phase alone can’t detect this because replicas don’t cross-check what they received from the primary.

Exercise 3

Proof-of-Work acts as probabilistic BFT because:

  1. Identity is expensive: Each block requires computational work. An adversary must expend compute power proportional to their hashrate to produce blocks.
  2. Chain selection rule: The longest (most work) chain is considered canonical. To reverse a transaction, the adversary must outpace the honest chain — requiring >50% of total hashrate.
  3. Probabilistic finality: Each confirmation block reduces the probability that an adversary can rewrite history.

With 40% hashrate, an attacker has a significant but not dominant position. The probability of a successful double-spend attack decreases exponentially with confirmations:

  • After 1 block: ~40% chance (if attacker has pre-mined)
  • After 3 blocks: ~10% chance
  • After 6 blocks: ~0.8% chance
  • After 10 blocks: ~0.02% chance

The attacker can temporarily overtake the honest chain, but the honest chain grows faster on average. The system recommends waiting for enough confirmations (typically 6 for Bitcoin) to make reversal economically infeasible.

This contrasts with classical BFT where finality is instant and deterministic — once 2f+1 nodes commit, the decision is final regardless of adversarial hashrate.