Distributed & Decentralized Systems Curriculum
Decentralized Systems Β· DHTs Kademlia

Key Question

How does Kademlia use XOR as a distance metric to build a self-organizing routing table?

Deep Dive

Kademlia (Maymounkov & Mazières, 2002) assigns each node a 160-bit ID, typically SHA-1(IP, port). The game-changer: distance is XOR, interpreted as an integer.

XOR distance: d(a, b) = a βŠ• b

Example (8-bit IDs):
  Node A: 11001010  (202)
  Node B: 11011010  (218)
  ───────────────────────
  XOR:     00010000  (16)

  d(A, B) = 16

XOR has beautiful properties for routing:

  • d(x, x) = 0 β€” a node is zero distance from itself.
  • d(x, y) = d(y, x) β€” symmetric. Chord’s clockwise distance wasn’t; this matters for reinforcing routing info in both directions.
  • Triangle inequality: d(x, z) ≀ d(x, y) + d(y, z). The metric space is well-behaved.

k-Buckets: The Routing Table

Each node maintains up to 160 k-buckets. For each bit position i (0 to 159), one bucket stores up to k contacts whose IDs have distance between 2^i and 2^(i+1) from the local node.

Local node ID: 1100  (12)

Distance ranges:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Bucket β”‚ Distance range   β”‚ Node IDs in range            β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ i=0    β”‚ 1 (2^0)          β”‚ 1101 (13) β€” differs in bit 0 β”‚
β”‚ i=1    β”‚ 2-3 (2^1-2^2-1)  β”‚ 1110 (14), 1111 (15)         β”‚
β”‚ i=2    β”‚ 4-7 (2^2-2^3-1)  β”‚ 1000 (8)..1011 (11)          β”‚
β”‚ i=3    β”‚ 8-15 (2^3-2^4-1) β”‚ 0100 (4)..1011 (11)          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Typical k=20. A bucket is β€œfull” when it has k contacts. When a new node is discovered, it goes into the correct bucket. If the bucket is full and an existing contact doesn’t respond to a ping, the new one replaces it. If the existing one responds, the new one is discarded β€” this makes Kademlia resistant to flooding.

Compare with Chord’s Finger Table

Chord uses rigid intervals: finger[i] points to the node at (n + 2^i) mod 2^m. You must use that specific node. Kademlia’s buckets can contain any node within the distance range β€” giving flexibility in which nodes to query.

Chord:                        Kademlia:
Finger[0] β†’ n+1               Bucket[0] β†’ any node at distance 1
Finger[1] β†’ n+2               Bucket[1] β†’ any node at distance 2-3
Finger[2] β†’ n+4               Bucket[2] β†’ any node at distance 4-7
...                            ...
                               Can pick the closest or fastest
                               from each bucket.

The XOR topology means every lookup request and response fills in routing information for both nodes β€” the network is self-teaching.

Check Your Understanding

  1. Compute XOR distance between 10110101 and 10100101.
  2. Why is the symmetry of XOR distance (d(x,y) = d(y,x)) useful for routing table maintenance?
  3. What happens if a new node tries to join a full k-bucket and the oldest contact is still alive?

The β€œSo What?”

The XOR metric is what makes Kademlia elegant. It’s symmetric (unlike Chord), which means every query routes info both ways. The k-bucket structure naturally keeps the closest, most reliable nodes while evicting dead ones β€” zero central coordination needed.


✏️ Exercises

Exercises: DHTs & Kademlia

Exercise 1

In Kademlia with 8-bit IDs, what is the XOR distance between node IDs 11001010 and 11011010?

Exercise 2

Why does Kademlia use parallel queries (Ξ± > 1) instead of sequential queries (Ξ± = 1)? What’s the trade-off?

Exercise 3

What problem does S/Kademlia’s disjoint path routing solve, and how does it protect against an attacker who controls ~30% of the network?

Exercise 4 (Challenge)

In a Kademlia network with N = 100,000 nodes and k = 20, assume each node has a fully populated routing table. How many distinct nodes can a single node know about (maximum)? How many hops should a lookup take in the worst case?

πŸ‘οΈ View Solutions

Solutions: DHTs & Kademlia

Exercise 1

XOR distance between 11001010 and 11011010.

  11001010
βŠ• 11011010
──────────
  00010000  = 16

The distance is 16.

Exercise 2

Parallel queries (Ξ± > 1) vs sequential.

Parallel queries (Ξ±=3) reduce lookup latency by sending requests simultaneously. If one node is slow or offline, the other Ξ±-1 requests proceed independently. The trade-off is bandwidth: each hop sends 3x as many messages. In practice this is a good trade because:

  • It masks network latency (RTT matters less)
  • It provides fault tolerance (dead nodes don’t stall the lookup)
  • The XOR topology means responses fill routing tables faster

Exercise 3

S/Kademlia’s disjoint path routing.

Vanilla Kademlia routes all queries toward the same k closest nodes. If an attacker controls those nodes (via a Sybil or Eclipse attack), they can return false results. S/Kademlia sends lookups along d independent paths to different sets of nodes. If the attacker controls ~30% of the network, the probability of them controlling all d paths is (0.3)^d. With d=3, the attacker has only a 2.7% chance of controlling every path, greatly reducing attack success.

Exercise 4 (Challenge)

Routing table capacity and lookup hops.

Each node has 160 k-buckets, each holding up to k=20 contacts. Maximum nodes known per node: 160 Γ— 20 = 3,200.

In practice, many buckets will be sparsely populated (especially high-distance ranges), so a node typically knows far fewer.

Lookup hops (worst case): logβ‚‚(N) = logβ‚‚(100,000) β‰ˆ 16.6. So 17 hops maximum.

In practice, with Ξ±=3 parallelism and well-populated routing tables, lookups converge in ~10-12 hops.