Decentralized Systems · DHTs Kademlia

Key Question

How does Kademlia find a node or value in O(log n) steps without knowing all nodes?

Deep Dive

Node Lookup: Iterative Deepening

Kademlia’s lookup is an iterative, parallelized crawl toward the target.

lookup(target_id):
  1. Pick α closest nodes from local k-buckets (default α=3)
  2. Send parallel FIND_NODE requests to all α
  3. Each response returns k closest nodes to target from that peer
  4. Merge results, pick the α closest unqueried nodes
  5. Repeat until no closer nodes are found
  6. Return the k closest nodes to target

Example: N=1000 nodes, looking up key = 0xdeadbeef

  Hop 1: Query 3 nodes, each returns ~20 close nodes
  Hop 2: Query closest unknown nodes from those ~60
  Hop 3: Getting warmer...
  ...
  Hop ~10: Found the k closest — done!

  log₂(1000) ≈ 10 hops. Each hop cuts distance roughly in half.

Lookup visualization (each arrow = FIND_NODE request):

  ┌──┐     ┌──┐     ┌──┐
  │Q1│────→│N1│────→│N4│
  └──┘     └──┘     └──┘
    │        │        │
    │   ┌──┐ │   ┌──┐ │
    └──→│N2│ └──→│N5│ │
        └──┘     └──┘ │
    │        │         │
    │   ┌──┐ │   ┌──┐ │
    └──→│N3│ └──→│N6│ │
        └──┘     └──┘ │
                      │
                 ┌────┘
                 ▼
              ┌──┐
              │T │  ← target node (or k closest)
              └──┘

The α Parameter

α=1: sequential — one hop at a time. Low bandwidth, high latency.
α=3 (default): three parallel queries. 3x bandwidth, but ~3x faster in practice.
α=∞: flood — degenerate into Gnutella.

The sweet spot is α=3. It gives redundancy against slow/dead nodes without wasting bandwidth.

Storage Protocol

PUT(key, value):
  1. Find the k closest nodes to key (via lookup)
  2. Send STORE(key, value) to all k
  3. Each receiver stores locally (usually in memory DB)

GET(key):
  1. Find the k closest nodes to key (via lookup)
  2. Send FIND_VALUE to all k
  3. If a node has the value → returns it immediately
  4. Otherwise acts like FIND_NODE (returns close nodes)
  5. Iterate until value found or can't get closer

Republishing

Nodes republish their key-value pairs every hour. This handles churn: if a node that stored your value goes offline, the new closest node will get it on the next republish.

Check Your Understanding

If a Kademlia network has 10,000 nodes, roughly how many hops should a lookup take?
What advantage does α=3 give over α=1 during a lookup?
Why does PUT store on k closest nodes instead of just the single closest node?

The “So What?”

The iterative, parallelized lookup is what makes Kademlia practical. It handles churn naturally — if a node goes silent, the parallel queries route around it. The same mechanism works for both node discovery and value retrieval, making the protocol trivially simple to implement.

✏️ Exercises

Exercises: DHTs & Kademlia

Exercise 1

In Kademlia with 8-bit IDs, what is the XOR distance between node IDs 11001010 and 11011010?

Exercise 2

Why does Kademlia use parallel queries (α > 1) instead of sequential queries (α = 1)? What’s the trade-off?

Exercise 3

What problem does S/Kademlia’s disjoint path routing solve, and how does it protect against an attacker who controls ~30% of the network?

Exercise 4 (Challenge)

In a Kademlia network with N = 100,000 nodes and k = 20, assume each node has a fully populated routing table. How many distinct nodes can a single node know about (maximum)? How many hops should a lookup take in the worst case?

👁️ View Solutions

Solutions: DHTs & Kademlia

Exercise 1

XOR distance between 11001010 and 11011010.

  11001010
⊕ 11011010
──────────
  00010000  = 16

The distance is 16.

Exercise 2

Parallel queries (α > 1) vs sequential.

Parallel queries (α=3) reduce lookup latency by sending requests simultaneously. If one node is slow or offline, the other α-1 requests proceed independently. The trade-off is bandwidth: each hop sends 3x as many messages. In practice this is a good trade because:

It masks network latency (RTT matters less)
It provides fault tolerance (dead nodes don’t stall the lookup)
The XOR topology means responses fill routing tables faster

Exercise 3

S/Kademlia’s disjoint path routing.

Vanilla Kademlia routes all queries toward the same k closest nodes. If an attacker controls those nodes (via a Sybil or Eclipse attack), they can return false results. S/Kademlia sends lookups along d independent paths to different sets of nodes. If the attacker controls ~30% of the network, the probability of them controlling all d paths is (0.3)^d. With d=3, the attacker has only a 2.7% chance of controlling every path, greatly reducing attack success.

Exercise 4 (Challenge)

Routing table capacity and lookup hops.

Each node has 160 k-buckets, each holding up to k=20 contacts. Maximum nodes known per node: 160 × 20 = 3,200.

In practice, many buckets will be sparsely populated (especially high-distance ranges), so a node typically knows far fewer.

Lookup hops (worst case): log₂(N) = log₂(100,000) ≈ 16.6. So 17 hops maximum.

In practice, with α=3 parallelism and well-populated routing tables, lookups converge in ~10-12 hops.