Consensus Fault Tolerance · Raft

Key Question

How do we translate the Raft paper (leader election + log replication) into actual runnable code?

Deep Dive

The raft.ts file in this directory is a simplified but runnable implementation of the Raft consensus algorithm. It simulates a 3-node cluster starting from scratch, electing a leader, and replicating log entries. Let me walk you through the code step by step.

Phase 1: The Classes and Types

First, we define the communication protocol. Raft uses two RPCs:

RequestVote: Sent by candidates during an election.
AppendEntries: Sent by the leader to replicate logs (also doubles as a heartbeat).

Each RPC has a request type and a reply type. The critical field in both is term:

interface RequestVoteArgs {
  term: number
  candidateId: number
  lastLogIndex: number
  lastLogTerm: number
}

interface AppendEntriesArgs {
  term: number
  leaderId: number
  prevLogIndex: number
  prevLogTerm: number
  entries: LogEntry[]
  leaderCommit: number
}

Terms are monotonically increasing integers that serve as a logical clock for the cluster. If a node sees a higher term, it immediately steps down to follower — this is how Raft prevents stale leaders from causing chaos.

Phase 2: Starting an Election

When a follower’s election timer expires, the node transitions to candidate and increments its currentTerm. The key rule is votesRequired = floor(N / 2) + 1. For a 3-node cluster, that’s 2 votes.

startElection() {
  this.state = 'candidate'
  this.currentTerm++
  this.votedFor = this.id

  let votesReceived = 1 // vote for self
  for (const peer of this.peers) {
    const reply = peer.handleRequestVote({
      term: this.currentTerm,
      candidateId: this.id,
      lastLogIndex: this.log.length - 1,
      lastLogTerm: this.log.length > 0
        ? this.log[this.log.length - 1].term : 0,
    })
    if (reply.voteGranted) votesReceived++
  }

  const majority = Math.floor((this.peers.length + 1) / 2) + 1
  if (votesReceived >= majority && this.state === 'candidate') {
    this.becomeLeader()
  }
}

The candidate votes for itself, then sends RequestVote to all peers. If a peer’s log is at least as up-to-date, it grants the vote.

Phase 3: The Vote Granting Logic

A follower grants a vote only if:

The candidate’s term is ≥ its current term.
It hasn’t already voted for someone else in this term.
The candidate’s log is at least as up-to-date as the follower’s log.

handleRequestVote(args: RequestVoteArgs): RequestVoteReply {
  if (args.term < this.currentTerm)
    return { term: this.currentTerm, voteGranted: false }

  if (args.term > this.currentTerm) {
    this.currentTerm = args.term
    this.state = 'follower'
    this.votedFor = null
  }

  const myLastTerm = this.log.length > 0
    ? this.log[this.log.length - 1].term : 0

  const logOk = args.lastLogTerm > myLastTerm ||
    (args.lastLogTerm === myLastTerm &&
     args.lastLogIndex >= this.log.length - 1)

  if ((this.votedFor === null || this.votedFor === args.candidateId) && logOk) {
    this.votedFor = args.candidateId
    return { term: this.currentTerm, voteGranted: true }
  }
  return { term: this.currentTerm, voteGranted: false }
}

Rule #3 is the Leader Completeness guarantee. A candidate who falls behind on log entries cannot trick the cluster into electing them.

Phase 4: Replicating Log Entries

Once the candidate wins the election, it becomes leader. In replicateCommand, the leader appends the entry, sends it to followers, and awaits a majority:

replicateCommand(command: string) {
  const entry: LogEntry = { term: this.currentTerm, command }
  this.log.push(entry)

  let successCount = 1
  for (const peer of this.peers) {
    const reply = peer.handleAppendEntries({
      term: this.currentTerm,
      leaderId: this.id,
      prevLogIndex: this.log.length - 2,
      prevLogTerm: this.log.length > 1
        ? this.log[this.log.length - 2].term : 0,
      entries: [entry],
      leaderCommit: this.commitIndex,
    })
    if (reply.success) successCount++
  }

  const majority = Math.floor((this.peers.length + 1) / 2) + 1
  if (successCount >= majority) {
    this.commitIndex = this.log.length - 1
    console.log(`Entry "${command}" COMMITTED`)
  }
}

If a follower’s log doesn’t match at the expected prevLogIndex, it rejects the request. The leader then decrements nextIndex for that follower and retries — eventually converging on the correct point.

Phase 5: The AppendEntries Handler

A follower receiving AppendEntries performs the core safety checks:

handleAppendEntries(args: AppendEntriesArgs): AppendEntriesReply {
  // 1. Reply false if term < currentTerm
  if (args.term < this.currentTerm)
    return { term: this.currentTerm, success: false }

  // 2. Step down if leader's term is higher
  if (args.term > this.currentTerm) {
    this.currentTerm = args.term
    this.state = 'follower'
  }

  // 3. Reply false if log doesn't match at prevLogIndex
  if (this.log.length <= args.prevLogIndex)
    return { term: this.currentTerm, success: false }
  if (args.prevLogIndex >= 0 &&
      this.log[args.prevLogIndex].term !== args.prevLogTerm)
    return { term: this.currentTerm, success: false }

  // 4. Append new entries
  for (const entry of args.entries) this.log.push(entry)

  // 5. Advance commitIndex
  if (args.leaderCommit > this.commitIndex)
    this.commitIndex = Math.min(args.leaderCommit, this.log.length - 1)

  return { term: this.currentTerm, success: true }
}

Key Takeaways

Terms are the backbone: Every message carries a term. Higher terms win. This single rule prevents the most common distributed systems failure (split-brain).
Majority is the key constraint: A leader can’t do anything without a majority. If the network is partitioned, the smaller partition cannot make progress.
Log matching is safety: Raft guarantees that committed entries are never lost because any future leader must have all committed entries.

Running the Code

cd lessons/03-Consensus-Fault-Tolerance/03-Raft/
npx ts-node raft.ts

Full Source

View or download the complete implementation: raft.ts

Exercises

In a 5-node Raft cluster, how many votes are needed to win an election?
What happens to a leader if it receives an AppendEntries from a node with a higher term?
Why does the candidate check that its log is “at least as up-to-date” before starting an election?

👁️ View Solutions

floor(5/2) + 1 = 3 votes.
It immediately steps down to follower and updates its currentTerm. This prevents a partitioned leader from continuing to serve stale data when it rejoins the cluster.
This is the Leader Completeness guarantee. It ensures a newly elected leader has all committed entries, so entries are never lost after a leader change.

✏️ Exercises

Raft: Exercises

Exercise 1: Quorum

A Raft cluster has 5 nodes. Two followers crash. Can the remaining 3 nodes continue to operate? Can they commit new entries? What if 3 nodes crash — what happens to the remaining 2?

Exercise 2: Election and Log Freshness

Candidate A has log [(term 1), (term 1), (term 2)]. Candidate B has log [(term 1), (term 1)]. Both are running for election. Who should win based on Raft’s election restriction, and why?

Exercise 3: Uncommitted Entries After Leader Crash

A leader commits entry 5 and has entries 6 and 7 uncommitted when it crashes. A new leader is elected. What happens to entries 6 and 7?

Exercise 4: Randomized Timeouts

Why are election timeouts randomized in Raft? What problem would occur if all nodes used the same 200ms timeout?

👁️ View Solutions

Raft: Solutions

Exercise 1

With 5 nodes, majority = 3. If 2 followers crash, 3 nodes remain — that’s a majority. The cluster can elect a leader, commit entries, and continue operating normally.

If 3 nodes crash, only 2 remain. 2 is not a majority of 5. The remaining 2 nodes cannot elect a leader (no candidate can get 3 votes). They also cannot commit new entries. However, if they were previously part of a cluster with a leader, the leader (if among the 2) can continue sending heartbeats but cannot commit anything. The cluster is effectively read-only until connectivity to other nodes is restored.

This is why Raft clusters are typically deployed with odd numbers of nodes (3, 5, 7) — to maximize fault tolerance.

Exercise 2

Candidate A should win. The election restriction compares last log entries:

Candidate A: last entry is (term 2, index 3)
Candidate B: last entry is (term 1, index 2)

Comparison: term 2 > term 1. A’s log is more up-to-date. Voters will vote for A over B.

This is true even though B’s log (index 2) is shorter than A’s (index 3). The term comparison takes priority. A’s log has entries from a newer term, which might be needed for correctness.

Exercise 3

Entries 6 and 7 become uncommitted and will be overwritten by the new leader. The new leader starts with its own log (which doesn’t include entries 6 and 7, since they weren’t on a majority of nodes). The new leader will:

Accept new client requests.
Fill any gaps in its log through AppendEntries.
Eventually, entries 6 and 7’s slots will be filled with new commands from the new leader’s term.

This is safe because entries 6 and 7 were never committed — no client received confirmation for them. If a client sent those commands and didn’t get a response, it will retry with the new leader.

Exercise 4

If all nodes used the same 200ms timeout, they would all time out simultaneously and all become candidates at the same time. This would cause an election where every node votes for itself — resulting in a guaranteed split vote with no majority. The election fails, all timers reset and immediately fire again (since they’re still synchronized), creating an infinite loop of failed elections.

Randomization (150-300ms) ensures that one node almost always times out first, giving it the chance to start an election and win before others become candidates. The expected wait for the first timeout in a 5-node cluster is ~30ms — far shorter than any single timeout value.