Key Question
How can replicas converge to the same state without any coordination or consensus?
Deep Dive
CRDT stands for Conflict-free Replicated Data Type. It’s a data structure that multiple replicas can update independently, then merge later, and always end up in the same state — no locking, no leader election, no two-phase commit required.
The Magic: Commutative Operations
The fundamental insight: if all operations on a data type commute (A then B gives the same result as B then A), replicas can apply operations in any order and converge.
Replica A: start 0 ──[+1]──► 1 ──[+1]──► 2
Replica B: start 0 ──[+1]──► 1 ──[+1]──► 2
│ │
└────────── MERGE ─────────────┘
│
▼
Final: both see 2
Now what if order differs?
Replica A: 0 ──[+1]──► 1 ──[+2]──► 3
Replica B: 0 ──[+2]──► 2 ──[+1]──► 3
│ │
└───── MERGE ─────────────┘
│
▼
Both see 3 ✓
Addition commutes — order doesn’t matter. CRDTs generalize this property.
Two Flavors of CRDT
| Type | Name | How it works |
|---|---|---|
| State-based | CvRDT (Convergent) | Replicas exchange full state. Merge function must be commutative, associative, and idempotent. |
| Operation-based | CmRDT (Commutative) | Replicas exchange operations. Operations must commute. Requires reliable broadcast. |
State-based is simpler to implement (just exchange state and merge) but can be bandwidth-heavy (sending full state vs just the delta).
Operation-based is more efficient (send only the op) but requires reliable exactly-once delivery — if an op is lost or duplicated, convergence breaks.
CRDT vs LWW (Last Writer Wins)
A naive approach to convergence is Last Writer Wins (LWW) — each value carries a timestamp, and the highest timestamp wins. Simple, but LWW loses data:
Replica A: name = "Alice" (t=100)
Replica B: name = "Bob" (t=101) ──► merge → "Bob"
Concurrent writes:
Replica A: name = "Alex" (t=102)
Replica B: name = "Bob" (t=102) ──► tie-breaker → one is lost
A CRDT counter doesn’t lose increments. If two replicas both increment, both increments are preserved in the merged result.
Check Your Understanding
-
What does it mean for a merge function to be “commutative, associative, and idempotent”?
-
What happens if an operation is delivered twice in an operation-based CRDT (CmRDT)?
-
Why is LWW considered “lossy” compared to CRDT counters?
The “So What?”
CRDTs are the foundation of offline-first applications. Google Docs, Figma, and Notion all use CRDT-like structures to let users edit offline and sync later without conflicts. If you want local-first software that works offline and syncs seamlessly, CRDTs are the enabling technology.
✏️ Exercises
Exercises: CRDTs
Exercise 1: G-Counter Merge
A G-Counter has 3 replicas. Replica A’s state is [5, 0, 0]. Replica B’s state is [0, 3, 2].
- What is the state of each replica after they merge (A with B, B with A)?
- What is the final counter value?
- Now replica C (state
[1, 1, 1]) joins the merge. What is the final result?
Exercise 2: G-Counter Limitations
- Explain in one sentence why a G-Counter cannot represent decrements.
- Could you build a G-Counter where decrement is allowed by subtracting from the local slot? What goes wrong?
- What mathematical property of the merge function would break if you allowed arbitrary subtraction?
Exercise 3: OR-Set Concurrent Operations
Three replicas (A, B, C) start with an empty OR-Set.
- Replica A: adds “apple” (tag = a1), adds “banana” (tag = a2)
- Replica B: adds “apple” (tag = b1), removes “banana” (removes a2 — B observed A’s add)
- Replica C: removes “apple” (removes a1, b1 — C observed both)
After all replicas merge:
- Is “apple” in the set? Why or why not?
- Is “banana” in the set? Why or why not?
- How many unique tags are in the final set?
Exercise 4: CRDT vs Consensus
An e-commerce platform uses an inventory CRDT (PN-Counter) to track stock across regions. The current count for item X is 5.
- Two regions both sell one unit simultaneously (decrement by 1 each). What does the counter show after merge?
- The actual physical inventory is 5 units. After the decrements above, you sell 3 more units in region A and 2 more in region B. A customer checks the count and sees 3. They buy 3 — but the warehouse only has 2 left. How could this over-selling happen?
- Would Raft prevent this problem? What trade-off does it introduce?
👁️ View Solutions
Solutions: CRDTs
Exercise 1: G-Counter Merge
-
After A merges with B:
- A receives
[0, 3, 2]from B → elementwise max of[5, 0, 0]and[0, 3, 2]=[5, 3, 2] - B receives
[5, 0, 0]from A → elementwise max of[0, 3, 2]and[5, 0, 0]=[5, 3, 2] - Both converge to
[5, 3, 2]
- A receives
-
Final counter value: 5 + 3 + 2 = 10
-
C joins: max of
[5, 3, 2]and[1, 1, 1]:- max(5, 1) = 5
- max(3, 1) = 3
- max(2, 1) = 2
- Result:
[5, 3, 2]— C’s values were all lower, so they don’t change the outcome.
Exercise 2: G-Counter Limitations
-
A G-Counter’s merge function is elementwise max — taking the maximum of two values can never produce a smaller number, so you can never reduce any slot.
-
If you subtract from your local slot, the elementwise max during merge would take the larger previous value from another replica. Your decrement would be silently lost. For example: A has
[5, 0, 0], A decrements to[4, 0, 0], then merges with B’s[5, 0, 0](which hadn’t seen the decrement). Result:[5, 0, 0]— the decrement disappeared. -
Elementwise max is not invertible. If you allow subtraction, the merge function would need to decrease a value, which max cannot do. The lattice property (join = least upper bound) only works with monotonic increases.
Exercise 3: OR-Set Concurrent Operations
-
Is “apple” in the set? Yes. C removed tags a1 and b1. But A had already added “apple” with a1, and B had added with b1. Both are removed by C’s operation. However… wait, let’s trace carefully:
- A adds “apple” with tag a1. A adds “banana” with tag a2.
- B adds “apple” with tag b1. B removes “banana” — removes tag a2.
- C removes “apple” — removes tags a1 and b1 (that C observed).
After merge: A has {(“apple”, a1), (“banana”, a2)} ∪ {(“apple”, b1)} — but a1 and b1 were removed by C. However, the merge is a union of internal sets. C’s state has { } (it removed everything it knew about). A has {(“apple”, a1), (“banana”, a2), (“apple”, b1)} only if B’s adds propagated. But B also sent its adds.
Actually, OR-Set merge is set union of (element, tag) pairs. C’s remove of tags a1 and b1 means those tags are excluded from C’s internal set. But when A merges with C, A still has a1 and b1 (A received b1 from B). So the union brings them back. Wait — that breaks the semantics.
Correction: In an OR-Set, when you remove, you add the removed tags to a “tombstone” set (or equivalently, the internal set is
{(elem, tag)}and remove means “remove all observed (elem, tag) pairs”). But the merge is union — if another replica still has those tags, they come back.This is why real OR-Sets use “observed-remove” with version vectors or tombstones. The answer depends on implementation. For the standard OR-Set:
After all merges (union of all (e,t) pairs minus removed pairs):
- Banana’s a2 tag is removed by B, and only A had it. Gone.
- Apple’s a1 is removed by C, but B has b1 (different tag) — wait, a1 is explicitly removed, but a1 was from A. B had b1, not a1.
Let me retrace properly: after full sync:
- All tags: a1 (“apple”), a2 (“banana”), b1 (“apple”)
- Removed tags: a2 (by B), a1 and b1 (by C)
- Surviving tags: none — all three tags are in the remove set.
Final set: empty {}.
This is actually a problem with basic OR-Sets — concurrent adds and removes can delete everything. This is why production OR-Sets often use version vectors or “add wins” semantics more carefully.
-
Is “banana” in the set? No — tag a2 was removed by B, and no other replica added “banana”.
a2is in the tombstone set. -
Unique tags surviving: 0 (everything was removed).
Key insight: This exercise shows a weakness of basic OR-Sets — if a remove operation observes all tags and removes them, the element disappears. Production OR-Sets add version vectors to track causality so that concurrent adds survive.
Exercise 4: CRDT vs Consensus
-
After both regions decrement: Both decrements are recorded. Region A’s N-counter has
[1, 0], Region B’s has[0, 1]. After merge: P =[0, 0], N =[1, 1]. Value = 0 - (1 + 1) = -2. Wait — the initial value was 5, so P should track the initial value. Let me reconsider.Initial state (both regions): P =
[5, 5], N =[0, 0](each region’s P-counter has “5” for the initial stock). If each sells 1: region A decrements → N_A =[1, 0], region B → N_B =[0, 1].After merge: P =
[5, 5], N =[1, 1]. Value = (5+5) - (1+1) = 10 - 2 = 8. That’s wrong — we only had 5 units initially. This reveals a problem: each region independently initialized P to 5. The initialization is not coordinated.The real answer: A PN-Counter for inventory must be initialized on one replica and propagated. If both regions independently set P to 5, the merge results in P = max(5,5) = 5 per slot, sum(P) = 10 — double counting. This is why initial state needs coordination.
But ignoring the initialization issue: after both decrements, the counter shows 3 (5 - 1 - 1 = 3).
-
Over-selling: The counter shows 3 (a stale or concurrent read), but in reality, by the time that customer checks, 5 more units have been sold across both regions (3 + 2). The counter eventually converges to show -2, but during the sync delay, the customer saw 3 and bought 3. The system sold 10 units of a 5-unit inventory.
-
Would Raft prevent this? Yes — Raft provides strong consistency. All writes go through a leader, so inventory is always accurate. Trade-off: Writes are slower (require majority ack), and if the leader is in a different region, every write incurs cross-region latency. During a network partition, the minority partition cannot sell anything. CRDTs sacrifice accuracy (stale reads) for availability (local writes always succeed).