Key Question
How does Dynamo balance read and write performance using configurable quorums?
Deep Dive
Dynamo gives you three knobs: N (replication factor), R (minimum read responses), and W (minimum write responses). These three knobs let you dial in your exact position on the consistency-availability-latency trade-off spectrum.
The Three Knobs
| Parameter | Meaning | Default | Range |
|---|---|---|---|
| N | Number of replicas for each key | 3 | 1 - โ (practical max ~5) |
| R | Minimum nodes for a successful read | 2 | 1 - N |
| W | Minimum nodes for a successful write | 2 | 1 - N |
Read: coordinator sends read request to all N replicas. Waits for R responses. Write: coordinator sends write to all N replicas. Waits for W acknowledgments.
Quorum Intersection (R + W > N)
The magic happens when R + W > N. This means any read quorum overlaps with any write quorum on at least one node. That node will have the latest write, guaranteeing strong consistency.
Example: N = 3, R = 2, W = 2
Ring holds 3 replicas of key K: [A, B, C]
Write: coordinator sends to A, B, C
Waits for 2 acknowledgments (W=2)
Suppose A and B acknowledge. Last write = "Version 2"
Read: coordinator sends to A, B, C
Waits for 2 responses (R=2)
Suppose B and C respond.
B has Version 2 (latest). C has Version 1 (stale).
R=2 means B's response is included.
Coordinator returns Version 2 (latest).
Why? Because B is in BOTH the write quorum {A, B} and
the read quorum {B, C}. The intersection is non-empty.
If R + W <= N, overlap is NOT guaranteed:
N=3, R=1, W=1:
Write quorum could be {A}. Read quorum could be {B}.
No overlap โ read could get stale data.
R + W = N + 1: Strong Consistency
With R + W = N + 1 (e.g., N=3, R=2, W=2), you get strong consistency (linearizability): every read returns the latest write. This is the Dynamo default.
R + W <= N: Eventual Consistency
With R + W <= N (e.g., N=3, R=1, W=1), you get eventual consistency: reads may return stale data, but eventually all replicas will converge. The trade-off is lower latency and higher availability.
Tuning for Different Workloads
Workload Type N R W Consistency Latency
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Read-heavy (catalog) 3 1 3 Weak consistent Fast reads
Write-heavy (logging) 3 3 1 Weak consistent Fast writes
Balanced (default) 3 2 2 Strong consistent Balanced
High durability (finance) 5 3 3 Strong consistent Higher latency
Read-heavy: R=1 means the coordinator returns the first response it gets. Very fast reads, but potentially stale. W=3 means all replicas must acknowledge every write โ writes are slower.
Write-heavy: W=1 means the coordinator returns after one acknowledgment. Very fast writes. But now R=3 means all replicas must respond for reads โ reads are slower.
Practical Implications
Dynamo (and DynamoDB) let you configure these per-operation or per-table. In Amazonโs production environment:
- Shopping cart service: N=3, R=2, W=2 (strong consistency for cart operations).
- Session state service: N=3, R=1, W=2 (fast reads, reliable writes; stale sessions are tolerable).
- Product catalog: N=3, R=1, W=1 (eventual consistency; propagation delay acceptable).
What W and R Really Cost
Each replica you wait for adds latency equal to the slowest of those replicas. With W=3, you wait for all three replicas โ latency = max(A, B, C). With W=1, latency = min(A, B, C) โ essentially the fastest replica. At scale, the difference between โfastest replicaโ and โthird fastest replicaโ can be tens of milliseconds due to network variance and load.
Check Your Understanding
-
N = 3, R = 1, W = 3. A write quorum is {A, B, C}. A read quorum is {A}. Can the read return stale data? Why or why not?
-
N = 5, R = 3, W = 3. A write is acknowledged by A, B, C. Later, a read gets responses from D, E, A. Is the read guaranteed to see the latest write?
-
What happens if R + W > N but a node in the quorum intersection has crashed and lost data? Can the read still be consistent?
The โSo What?โ
Tuning (N, R, W) is how DynamoDB users control the cost/latency/consistency trade-off for each table. When you set โstrongly consistent readsโ in DynamoDB, the system uses R=1 (for eventually consistent) vs. dynamically determining the right quorum for strong consistency. Understanding quorums lets you make informed trade-offs between performance and correctness in any quorum-based system.
โ๏ธ Exercises
Dynamo: Exercises
-
Quorum consistency. N = 3, R = 1, W = 3. Under what conditions can a read return stale data? What happens to the latency of writes and reads in this configuration?
-
Hinted handoff durability. In a sloppy quorum scenario, node A is down. The coordinator writes key K to B, C, D (D holds hinted data for A). Then D crashes before handing off to A. B and C remain healthy. Node A recovers. How many copies of key K exist? Is any data lost?
-
Merkle tree depth. A Dynamo node stores 16 million keys. It builds a Merkle tree of depth 16 (65,536 leaves). What is the average number of keys per bucket? If the root hashes differ between two nodes, how many hash comparisons are needed in the worst case to find the differing bucket?
-
Dynamo vs. GFS. GFS uses a single master with lease-based replication. Dynamo uses a fully distributed ring with quorums. Why did Amazon choose the Dynamo architecture instead of a GFS-like master-based replication? What constraints drove this decision?
๐๏ธ View Solutions
Dynamo: Solutions
1. Quorum consistency
Answer. N = 3, R = 1, W = 3. With R = 1, the read quorum is a single node. With W = 3, the write quorum is all three nodes (A, B, C). Since R + W = 4 > N = 3, any read quorum overlaps with any write quorum. The read is guaranteed to see the latest write because the single node that responds (R=1) was part of the write quorum of 3 (W=3). So the read always returns the latest data.
Write latency: the coordinator must wait for all 3 replicas to acknowledge (W=3). Write latency = max(A, B, C) โ the slowest replica. Writes are slower.
Read latency: the coordinator returns as soon as any 1 replica responds (R=1). Read latency = min(A, B, C). Reads are fast. However, the data could be stale if the coordinator doesnโt pick the right replica (but with W=3, all replicas are up-to-date, so this is not an issue).
2. Hinted handoff durability
Answer. Timeline:
- A is down. Write goes to B, C, D (D holds hinted copy for A).
- D crashes. Dโs local store (including the hinted data for A) is lost.
- B and C have the data. A is back up but doesnโt have the latest write.
Number of copies: B and C have the data (2 copies). The hinted copy on D is lost. So only 2 copies remain. If B or C also fails, the data could be lost permanently.
Is data lost? Yes, the latest write is partially lost โ specifically, the copy intended for A (which was hinted to D) is gone when D crashed. However, B and C still have the data, so the data is not completely lost. The system has N=3 but only 2 copies survive. Durability is temporarily reduced. When anti-entropy runs (or hinted handoff from another node), it will detect that A is missing the latest version and repair it.
This is the trade-off: sloppy quorums improve availability (writes always succeed) at the cost of temporarily reduced durability during cascading failures.
3. Merkle tree depth
Answer.
Average keys per bucket: 16,000,000 / 65,536 โ 244 keys per bucket.
Worst-case hash comparisons to find differing bucket: at each level of the tree, we compare 2 hashes (the two children of the differing node). Depth 16 means 16 levels. At each level, we compare 2 hashes. So 16 ร 2 = 32 hash comparisons in the worst case. This is O(log L) where L = 65,536 leaves.
After finding the bucket, we transfer all keys in that bucket (โ244 keys) to identify the exact differing key-value pairs.
Contrast with a full comparison: without Merkle trees, weโd need to transfer or compare all 16 million keys (each key-value pair sent over the network or compared locally). The Merkle tree reduces this to 32 hashes + 244 keys โ a 99.998% reduction in data transfer.
4. Dynamo vs. GFS
Answer. Amazon chose the Dynamo architecture for these reasons:
-
Availability requirements. Amazonโs shopping cart must always accept writes. If a customer adds an item to their cart, it must succeed. GFSโs single master creates an availability bottleneck. Dynamo achieves 99.999% write availability through its fully distributed, masterless design.
-
No single point of failure. Dynamo has no master. GFS has a single master (SPOF). For Amazonโs SLA-driven services, any SPOF is unacceptable. The Dynamo ring means any node can fail without affecting availability โ the system degrades gracefully.
-
Operational simplicity. Thereโs no master to elect, promote, or failover. Nodes join and leave the ring independently. This makes operations (deployments, scaling) much simpler than managing a GFS-like master.
-
Workload characteristics. GFS is optimized for large files and sequential access (search index, MapReduce). Dynamo is optimized for small objects and random access (shopping cart, session state). Dynamoโs quorum system is better suited to key-value workloads than GFSโs chunk-based approach.
-
Consistency model flexibility. Dynamoโs tunable consistency (N, R, W) lets each service pick its trade-off. GFS always provides strong consistency for single-chunk operations. Not every Amazon service needs strong consistency.
-
No cross-rack file system dependencies. Dynamo works directly on local disks. GFS depends on a shared file system abstraction. Dynamoโs simpler local-storage model is more portable and easier to manage at Amazonโs scale.
In short: GFS optimizes for throughput (large sequential operations). Dynamo optimizes for availability (small random operations). They were designed for different workloads.