Distributed & Decentralized Systems Curriculum
Consensus Fault Tolerance · Byzantine Fault Tolerance

Key Question

How do we build an algorithm that lets honest nodes reach agreement even when some nodes are actively lying?

Deep Dive

The byzantine.ts file implements the Signed Message Protocol for the Byzantine Generals Problem — one of the most famous algorithms in distributed systems. Let me walk through the code and the logic.

The Setup

We simulate a 4-general system (1 commander + 3 lieutenants). This satisfies the famous 3f + 1 bound: with f = 1 traitor, we need at least 4 generals. The core types are:

type Order = 'attack' | 'retreat'
type GeneralType = 'loyal' | 'traitor'
type GeneralRole = 'commander' | 'lieutenant'

interface Message {
  from: number
  order: Order
  signatures: Set<number>  // chain of signing generals
  path: number[]
}

The simulation runs three scenarios:

  1. All loyal: Everyone is honest. The baseline.
  2. Loyal commander + traitor lieutenant: Can the loyal lieutenants still agree despite one of them lying?
  3. Traitor commander + traitor lieutenant: Can the two loyal lieutenants still agree when both the commander and a lieutenant are malicious?

Phase 1: The Commander Broadcasts

If the commander is loyal, they send the same signed order to all lieutenants. If the commander is a traitor, they send different orders to different lieutenants:

sendOrder(order: Order): Message[] {
  if (this.type === 'traitor') {
    // Send conflicting orders to sow confusion
    return [
      this.sign({ from: this.id, order: 'attack', signatures: new Set(), path: [] }),
      this.sign({ from: this.id, order: 'retreat', signatures: new Set(), path: [] }),
    ]
  }
  // Loyal commander: same order to all
  return [
    this.sign({ from: this.id, order, signatures: new Set(), path: [] }),
    this.sign({ from: this.id, order, signatures: new Set(), path: [] }),
  ]
}

The signature is a set of general IDs appended to the message. This is the key innovation: a traitor cannot forge a loyal general’s signature.

Phase 2: Forwarding with Signatures

Each lieutenant receives the commander’s message. A loyal lieutenant signs it and honestly forwards it. A traitor lieutenant lies about the contents:

if (lt.type === 'traitor') {
  // Traitor lieutenant: lies about what they received
  const fakeOrder: Order = incoming.order === 'attack' ? 'retreat' : 'attack'
  const fakeMsg: Message = {
    from: lt.id,
    order: fakeOrder,
    signatures: new Set([...incoming.signatures]),
    path: [...incoming.path],
  }
  lt.sign(fakeMsg)
  // Forward the LIE to all other lieutenants
  for (const other of lieutenants) {
    if (other.id === lt.id) continue
    other.receivedMessages.push(structuredClone(fakeMsg))
  }
} else {
  // Loyal lieutenant: honestly forward the signed message
  const forwarded = lt.sign(structuredClone(incoming))
  for (const other of lieutenants) {
    if (other.id === lt.id) continue
    other.receivedMessages.push(structuredClone(forwarded))
  }
}

When General 1 receives a message claiming General 2 said “attack”, they can verify General 2’s signature is authentic. This prevents traitors from forging unlimited confusion.

Phase 3: The Decision Rule

Each lieutenant collects all messages, filters for those carrying the commander’s signature (ID=0), and takes a majority vote:

const commanderOrders = lt.receivedMessages
  .filter(m => m.signatures.has(0))
  .map(m => m.order)

const attacks = commanderOrders.filter(o => o === 'attack').length
const retreats = commanderOrders.filter(o => o === 'retreat').length
const decision: Order = attacks >= retreats ? 'attack' : 'retreat'

Verification: IC1 and IC2

  • IC1 (Agreement): All loyal lieutenants must agree on the same plan.
  • IC2 (Validity): If the commander is loyal, all loyal lieutenants must follow the commander’s order.

The simulation verifies both conditions after each run.

Key Takeaways

  • Signatures prevent forgery: A traitor can lie about their own messages, but cannot impersonate a loyal general.
  • 3f+1 is the minimum: You need at least 3f + 1 nodes to tolerate f Byzantine failures (compared to 2f + 1 for crash failures).
  • Deterministic decision rules: Every loyal node uses the same deterministic function (majority vote) so they converge.
  • The “arbitrary failure” model: Byzantine faults include lies, collusion, and malicious intent — not just crashes.

Running the Code

cd lessons/03-Consensus-Fault-Tolerance/04-Byzantine-Fault-Tolerance/
npx ts-node byzantine.ts

Full Source

View or download the complete implementation: byzantine.ts

Exercises

  1. Why is the bound 3f + 1? Why can’t we tolerate 1 traitor with 3 generals total?
  2. In the signed message protocol, what stops a traitor from forging a message from the commander?
  3. In the Unsigned (Oral) Message protocol, what changes? Why is signed easier?

👁️ View Solutions

  1. For 3 generals with 1 traitor: if a loyal lieutenant receives conflicting orders (one from the commander, one from a traitor claiming the commander said something else), and no majority exists, they cannot determine the truth. With 4 generals, there are always at least 2 other loyal lieutenants to cross-reference, forming a majority.
  2. Digital signatures or, in our simulation, the immutable set of signer IDs. A traitor cannot add the commander’s ID (or another loyal general’s ID) in a way that the recipient can’t detect.
  3. In the oral (unsigned) version, a traitor can claim the commander said anything. Signed messages eliminate the ability to forge others’ identities, reducing the number of messages needed to reach consensus.

✏️ Exercises

Byzantine Fault Tolerance: Exercises

Exercise 1: 3f+1 Bound

How many nodes are needed to tolerate 3 Byzantine faulty nodes? Show the math.

Exercise 2: PBFT Phases

In PBFT, why is the Prepare phase needed if we already have Pre-prepare? What problem would occur with only Pre-prepare and Commit?

Exercise 3: Proof-of-Work as BFT

How does Proof-of-Work act as a probabilistic BFT mechanism? What happens if an attacker controls 40% of the total hashrate?

👁️ View Solutions

Byzantine Fault Tolerance: Solutions

Exercise 1

N ≥ 3f + 1 = 3(3) + 1 = 10 nodes.

For f = 3, you need at least 10 nodes. With 10 nodes, you can tolerate up to 3 Byzantine faults. With only 9 nodes (3f, but 3f+1 is 10), you’d have 3 correct nodes per “group” — any 3 faulty nodes could group together and confuse the protocol since honest nodes need 2f+1 replies but can’t distinguish liars from honest ones in the worst case.

Compare to crash faults: 2f+1 = 7 nodes needed for f=3 crash tolerance. BFT requires 3 more nodes.

Exercise 2

Without the Prepare phase, a single faulty primary could send different Pre-prepare messages to different replicas, and those replicas would commit different values in the Commit phase — violating agreement.

The Prepare phase ensures within-view consistency: replicas share their Pre-prepare receipts with each other. When a replica collects 2f matching Prepare messages (including its own Pre-prepare), it knows that at least f+1 correct replicas received the same Pre-prepare. Even if the primary is Byzantine, the correct replicas have agreed on the ordering.

The Commit phase then ensures cross-view consistency: that the decision survives view changes.

Without Prepare, a Byzantine primary could send Pre-prepare(“attack”) to half the replicas and Pre-prepare(“retreat”) to the other half. The Commit phase alone can’t detect this because replicas don’t cross-check what they received from the primary.

Exercise 3

Proof-of-Work acts as probabilistic BFT because:

  1. Identity is expensive: Each block requires computational work. An adversary must expend compute power proportional to their hashrate to produce blocks.
  2. Chain selection rule: The longest (most work) chain is considered canonical. To reverse a transaction, the adversary must outpace the honest chain — requiring >50% of total hashrate.
  3. Probabilistic finality: Each confirmation block reduces the probability that an adversary can rewrite history.

With 40% hashrate, an attacker has a significant but not dominant position. The probability of a successful double-spend attack decreases exponentially with confirmations:

  • After 1 block: ~40% chance (if attacker has pre-mined)
  • After 3 blocks: ~10% chance
  • After 6 blocks: ~0.8% chance
  • After 10 blocks: ~0.02% chance

The attacker can temporarily overtake the honest chain, but the honest chain grows faster on average. The system recommends waiting for enough confirmations (typically 6 for Bitcoin) to make reversal economically infeasible.

This contrasts with classical BFT where finality is instant and deterministic — once 2f+1 nodes commit, the decision is final regardless of adversarial hashrate.