Distributed & Decentralized Systems Curriculum
Decentralized Systems Β· DHTs Kademlia

Key Question

How does Kademlia find a node or value in O(log n) steps without knowing all nodes?

Deep Dive

Node Lookup: Iterative Deepening

Kademlia’s lookup is an iterative, parallelized crawl toward the target.

lookup(target_id):
  1. Pick Ξ± closest nodes from local k-buckets (default Ξ±=3)
  2. Send parallel FIND_NODE requests to all Ξ±
  3. Each response returns k closest nodes to target from that peer
  4. Merge results, pick the Ξ± closest unqueried nodes
  5. Repeat until no closer nodes are found
  6. Return the k closest nodes to target

Example: N=1000 nodes, looking up key = 0xdeadbeef

  Hop 1: Query 3 nodes, each returns ~20 close nodes
  Hop 2: Query closest unknown nodes from those ~60
  Hop 3: Getting warmer...
  ...
  Hop ~10: Found the k closest β€” done!

  logβ‚‚(1000) β‰ˆ 10 hops. Each hop cuts distance roughly in half.
Lookup visualization (each arrow = FIND_NODE request):

  β”Œβ”€β”€β”     β”Œβ”€β”€β”     β”Œβ”€β”€β”
  β”‚Q1│────→│N1│────→│N4β”‚
  β””β”€β”€β”˜     β””β”€β”€β”˜     β””β”€β”€β”˜
    β”‚        β”‚        β”‚
    β”‚   β”Œβ”€β”€β” β”‚   β”Œβ”€β”€β” β”‚
    └──→│N2β”‚ └──→│N5β”‚ β”‚
        β””β”€β”€β”˜     β””β”€β”€β”˜ β”‚
    β”‚        β”‚         β”‚
    β”‚   β”Œβ”€β”€β” β”‚   β”Œβ”€β”€β” β”‚
    └──→│N3β”‚ └──→│N6β”‚ β”‚
        β””β”€β”€β”˜     β””β”€β”€β”˜ β”‚
                      β”‚
                 β”Œβ”€β”€β”€β”€β”˜
                 β–Ό
              β”Œβ”€β”€β”
              β”‚T β”‚  ← target node (or k closest)
              β””β”€β”€β”˜

The Ξ± Parameter

  • Ξ±=1: sequential β€” one hop at a time. Low bandwidth, high latency.
  • Ξ±=3 (default): three parallel queries. 3x bandwidth, but ~3x faster in practice.
  • Ξ±=∞: flood β€” degenerate into Gnutella.

The sweet spot is Ξ±=3. It gives redundancy against slow/dead nodes without wasting bandwidth.

Storage Protocol

PUT(key, value):
  1. Find the k closest nodes to key (via lookup)
  2. Send STORE(key, value) to all k
  3. Each receiver stores locally (usually in memory DB)

GET(key):
  1. Find the k closest nodes to key (via lookup)
  2. Send FIND_VALUE to all k
  3. If a node has the value β†’ returns it immediately
  4. Otherwise acts like FIND_NODE (returns close nodes)
  5. Iterate until value found or can't get closer

Republishing

Nodes republish their key-value pairs every hour. This handles churn: if a node that stored your value goes offline, the new closest node will get it on the next republish.

Check Your Understanding

  1. If a Kademlia network has 10,000 nodes, roughly how many hops should a lookup take?
  2. What advantage does Ξ±=3 give over Ξ±=1 during a lookup?
  3. Why does PUT store on k closest nodes instead of just the single closest node?

The β€œSo What?”

The iterative, parallelized lookup is what makes Kademlia practical. It handles churn naturally β€” if a node goes silent, the parallel queries route around it. The same mechanism works for both node discovery and value retrieval, making the protocol trivially simple to implement.


✏️ Exercises

Exercises: DHTs & Kademlia

Exercise 1

In Kademlia with 8-bit IDs, what is the XOR distance between node IDs 11001010 and 11011010?

Exercise 2

Why does Kademlia use parallel queries (Ξ± > 1) instead of sequential queries (Ξ± = 1)? What’s the trade-off?

Exercise 3

What problem does S/Kademlia’s disjoint path routing solve, and how does it protect against an attacker who controls ~30% of the network?

Exercise 4 (Challenge)

In a Kademlia network with N = 100,000 nodes and k = 20, assume each node has a fully populated routing table. How many distinct nodes can a single node know about (maximum)? How many hops should a lookup take in the worst case?

πŸ‘οΈ View Solutions

Solutions: DHTs & Kademlia

Exercise 1

XOR distance between 11001010 and 11011010.

  11001010
βŠ• 11011010
──────────
  00010000  = 16

The distance is 16.

Exercise 2

Parallel queries (Ξ± > 1) vs sequential.

Parallel queries (Ξ±=3) reduce lookup latency by sending requests simultaneously. If one node is slow or offline, the other Ξ±-1 requests proceed independently. The trade-off is bandwidth: each hop sends 3x as many messages. In practice this is a good trade because:

  • It masks network latency (RTT matters less)
  • It provides fault tolerance (dead nodes don’t stall the lookup)
  • The XOR topology means responses fill routing tables faster

Exercise 3

S/Kademlia’s disjoint path routing.

Vanilla Kademlia routes all queries toward the same k closest nodes. If an attacker controls those nodes (via a Sybil or Eclipse attack), they can return false results. S/Kademlia sends lookups along d independent paths to different sets of nodes. If the attacker controls ~30% of the network, the probability of them controlling all d paths is (0.3)^d. With d=3, the attacker has only a 2.7% chance of controlling every path, greatly reducing attack success.

Exercise 4 (Challenge)

Routing table capacity and lookup hops.

Each node has 160 k-buckets, each holding up to k=20 contacts. Maximum nodes known per node: 160 Γ— 20 = 3,200.

In practice, many buckets will be sparsely populated (especially high-distance ranges), so a node typically knows far fewer.

Lookup hops (worst case): logβ‚‚(N) = logβ‚‚(100,000) β‰ˆ 16.6. So 17 hops maximum.

In practice, with Ξ±=3 parallelism and well-populated routing tables, lookups converge in ~10-12 hops.