Key Question
How does Kademlia find a node or value in O(log n) steps without knowing all nodes?
Deep Dive
Node Lookup: Iterative Deepening
Kademliaβs lookup is an iterative, parallelized crawl toward the target.
lookup(target_id):
1. Pick Ξ± closest nodes from local k-buckets (default Ξ±=3)
2. Send parallel FIND_NODE requests to all Ξ±
3. Each response returns k closest nodes to target from that peer
4. Merge results, pick the Ξ± closest unqueried nodes
5. Repeat until no closer nodes are found
6. Return the k closest nodes to target
Example: N=1000 nodes, looking up key = 0xdeadbeef
Hop 1: Query 3 nodes, each returns ~20 close nodes
Hop 2: Query closest unknown nodes from those ~60
Hop 3: Getting warmer...
...
Hop ~10: Found the k closest β done!
logβ(1000) β 10 hops. Each hop cuts distance roughly in half.
Lookup visualization (each arrow = FIND_NODE request):
ββββ ββββ ββββ
βQ1βββββββN1βββββββN4β
ββββ ββββ ββββ
β β β
β ββββ β ββββ β
βββββN2β βββββN5β β
ββββ ββββ β
β β β
β ββββ β ββββ β
βββββN3β βββββN6β β
ββββ ββββ β
β
ββββββ
βΌ
ββββ
βT β β target node (or k closest)
ββββ
The Ξ± Parameter
- Ξ±=1: sequential β one hop at a time. Low bandwidth, high latency.
- Ξ±=3 (default): three parallel queries. 3x bandwidth, but ~3x faster in practice.
- Ξ±=β: flood β degenerate into Gnutella.
The sweet spot is Ξ±=3. It gives redundancy against slow/dead nodes without wasting bandwidth.
Storage Protocol
PUT(key, value):
1. Find the k closest nodes to key (via lookup)
2. Send STORE(key, value) to all k
3. Each receiver stores locally (usually in memory DB)
GET(key):
1. Find the k closest nodes to key (via lookup)
2. Send FIND_VALUE to all k
3. If a node has the value β returns it immediately
4. Otherwise acts like FIND_NODE (returns close nodes)
5. Iterate until value found or can't get closer
Republishing
Nodes republish their key-value pairs every hour. This handles churn: if a node that stored your value goes offline, the new closest node will get it on the next republish.
Check Your Understanding
- If a Kademlia network has 10,000 nodes, roughly how many hops should a lookup take?
- What advantage does Ξ±=3 give over Ξ±=1 during a lookup?
- Why does PUT store on k closest nodes instead of just the single closest node?
The βSo What?β
The iterative, parallelized lookup is what makes Kademlia practical. It handles churn naturally β if a node goes silent, the parallel queries route around it. The same mechanism works for both node discovery and value retrieval, making the protocol trivially simple to implement.
βοΈ Exercises
Exercises: DHTs & Kademlia
Exercise 1
In Kademlia with 8-bit IDs, what is the XOR distance between node IDs 11001010 and 11011010?
Exercise 2
Why does Kademlia use parallel queries (Ξ± > 1) instead of sequential queries (Ξ± = 1)? Whatβs the trade-off?
Exercise 3
What problem does S/Kademliaβs disjoint path routing solve, and how does it protect against an attacker who controls ~30% of the network?
Exercise 4 (Challenge)
In a Kademlia network with N = 100,000 nodes and k = 20, assume each node has a fully populated routing table. How many distinct nodes can a single node know about (maximum)? How many hops should a lookup take in the worst case?
ποΈ View Solutions
Solutions: DHTs & Kademlia
Exercise 1
XOR distance between 11001010 and 11011010.
11001010
β 11011010
ββββββββββ
00010000 = 16
The distance is 16.
Exercise 2
Parallel queries (Ξ± > 1) vs sequential.
Parallel queries (Ξ±=3) reduce lookup latency by sending requests simultaneously. If one node is slow or offline, the other Ξ±-1 requests proceed independently. The trade-off is bandwidth: each hop sends 3x as many messages. In practice this is a good trade because:
- It masks network latency (RTT matters less)
- It provides fault tolerance (dead nodes donβt stall the lookup)
- The XOR topology means responses fill routing tables faster
Exercise 3
S/Kademliaβs disjoint path routing.
Vanilla Kademlia routes all queries toward the same k closest nodes. If an attacker controls those nodes (via a Sybil or Eclipse attack), they can return false results. S/Kademlia sends lookups along d independent paths to different sets of nodes. If the attacker controls ~30% of the network, the probability of them controlling all d paths is (0.3)^d. With d=3, the attacker has only a 2.7% chance of controlling every path, greatly reducing attack success.
Exercise 4 (Challenge)
Routing table capacity and lookup hops.
Each node has 160 k-buckets, each holding up to k=20 contacts. Maximum nodes known per node: 160 Γ 20 = 3,200.
In practice, many buckets will be sparsely populated (especially high-distance ranges), so a node typically knows far fewer.
Lookup hops (worst case): logβ(N) = logβ(100,000) β 16.6. So 17 hops maximum.
In practice, with Ξ±=3 parallelism and well-populated routing tables, lookups converge in ~10-12 hops.