Decentralized Systems · DHTs Kademlia

Key Question

How does Kademlia use XOR as a distance metric to build a self-organizing routing table?

Deep Dive

Kademlia (Maymounkov & Mazières, 2002) assigns each node a 160-bit ID, typically SHA-1(IP, port). The game-changer: distance is XOR, interpreted as an integer.

XOR distance: d(a, b) = a ⊕ b

Example (8-bit IDs):
  Node A: 11001010  (202)
  Node B: 11011010  (218)
  ───────────────────────
  XOR:     00010000  (16)

  d(A, B) = 16

XOR has beautiful properties for routing:

d(x, x) = 0 — a node is zero distance from itself.
d(x, y) = d(y, x) — symmetric. Chord’s clockwise distance wasn’t; this matters for reinforcing routing info in both directions.
Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z). The metric space is well-behaved.

k-Buckets: The Routing Table

Each node maintains up to 160 k-buckets. For each bit position i (0 to 159), one bucket stores up to k contacts whose IDs have distance between 2^i and 2^(i+1) from the local node.

Local node ID: 1100  (12)

Distance ranges:
┌────────┬─────────────────────────────────┬──────────────────────┐
│ Bucket │ Distance range   │ Node IDs in range            │
├────────┼─────────────────────────────────┼──────────────────────┤
│ i=0    │ 1 (2^0)          │ 1101 (13) — differs in bit 0 │
│ i=1    │ 2-3 (2^1-2^2-1)  │ 1110 (14), 1111 (15)         │
│ i=2    │ 4-7 (2^2-2^3-1)  │ 1000 (8)..1011 (11)          │
│ i=3    │ 8-15 (2^3-2^4-1) │ 0100 (4)..1011 (11)          │
└────────┴─────────────────────────────────┴──────────────────────┘

Typical k=20. A bucket is “full” when it has k contacts. When a new node is discovered, it goes into the correct bucket. If the bucket is full and an existing contact doesn’t respond to a ping, the new one replaces it. If the existing one responds, the new one is discarded — this makes Kademlia resistant to flooding.

Compare with Chord’s Finger Table

Chord uses rigid intervals: finger[i] points to the node at (n + 2^i) mod 2^m. You must use that specific node. Kademlia’s buckets can contain any node within the distance range — giving flexibility in which nodes to query.

Chord:                        Kademlia:
Finger[0] → n+1               Bucket[0] → any node at distance 1
Finger[1] → n+2               Bucket[1] → any node at distance 2-3
Finger[2] → n+4               Bucket[2] → any node at distance 4-7
...                            ...
                               Can pick the closest or fastest
                               from each bucket.

The XOR topology means every lookup request and response fills in routing information for both nodes — the network is self-teaching.

Check Your Understanding

Compute XOR distance between 10110101 and 10100101.
Why is the symmetry of XOR distance (d(x,y) = d(y,x)) useful for routing table maintenance?
What happens if a new node tries to join a full k-bucket and the oldest contact is still alive?

The “So What?”

The XOR metric is what makes Kademlia elegant. It’s symmetric (unlike Chord), which means every query routes info both ways. The k-bucket structure naturally keeps the closest, most reliable nodes while evicting dead ones — zero central coordination needed.

✏️ Exercises

Exercises: DHTs & Kademlia

Exercise 1

In Kademlia with 8-bit IDs, what is the XOR distance between node IDs 11001010 and 11011010?

Exercise 2

Why does Kademlia use parallel queries (α > 1) instead of sequential queries (α = 1)? What’s the trade-off?

Exercise 3

What problem does S/Kademlia’s disjoint path routing solve, and how does it protect against an attacker who controls ~30% of the network?

Exercise 4 (Challenge)

In a Kademlia network with N = 100,000 nodes and k = 20, assume each node has a fully populated routing table. How many distinct nodes can a single node know about (maximum)? How many hops should a lookup take in the worst case?

👁️ View Solutions

Solutions: DHTs & Kademlia

Exercise 1

XOR distance between 11001010 and 11011010.

  11001010
⊕ 11011010
──────────
  00010000  = 16

The distance is 16.

Exercise 2

Parallel queries (α > 1) vs sequential.

Parallel queries (α=3) reduce lookup latency by sending requests simultaneously. If one node is slow or offline, the other α-1 requests proceed independently. The trade-off is bandwidth: each hop sends 3x as many messages. In practice this is a good trade because:

It masks network latency (RTT matters less)
It provides fault tolerance (dead nodes don’t stall the lookup)
The XOR topology means responses fill routing tables faster

Exercise 3

S/Kademlia’s disjoint path routing.

Vanilla Kademlia routes all queries toward the same k closest nodes. If an attacker controls those nodes (via a Sybil or Eclipse attack), they can return false results. S/Kademlia sends lookups along d independent paths to different sets of nodes. If the attacker controls ~30% of the network, the probability of them controlling all d paths is (0.3)^d. With d=3, the attacker has only a 2.7% chance of controlling every path, greatly reducing attack success.

Exercise 4 (Challenge)

Routing table capacity and lookup hops.

Each node has 160 k-buckets, each holding up to k=20 contacts. Maximum nodes known per node: 160 × 20 = 3,200.

In practice, many buckets will be sparsely populated (especially high-distance ranges), so a node typically knows far fewer.

Lookup hops (worst case): log₂(N) = log₂(100,000) ≈ 16.6. So 17 hops maximum.

In practice, with α=3 parallelism and well-populated routing tables, lookups converge in ~10-12 hops.