Key Question
How do you build a CRDT for sets and registers used in collaborative editing?
Deep Dive
Counters are simple. Real applications need richer data types — registers (a single value), sets, maps, and lists. Two of the most practical are the LWW-Register and the OR-Set.
LWW-Register (Last Writer Wins Register)
A register stores a single value. The naive approach: timestamp + value, latest timestamp wins.
State: (timestamp, value)
Replica A writes "hello" at t=10:
A: (10, "hello")
Replica B writes "world" at t=20:
B: (20, "world")
Merge: pick highest timestamp → (20, "world") ✓
Problem — concurrent writes are lost:
Replica A writes "hello" at t=15
Replica B writes "world" at t=15 (same clock)
Merge: tie-break by replica ID → "world" wins
"hello" is lost forever.
LWW-Registers always converge, but they drop data. Acceptable when the latest write is authoritative (last edit in a document). Bad when every write matters (likes, votes).
OR-Set (Observed-Remove Set)
A set where elements can be added and removed without losing concurrent operations. The key insight: add beats remove.
How it works:
Each element in the set has a unique tag (typically a UUID generated by the adding replica). The set internally stores {(element, tag)} pairs.
add(element)— generates a new unique tag, adds(element, tag)to the setremove(element)— removes all tags for that element that this replica has observedlookup(element)— true if any(element, tag)existsmerge(other)— union of both internal tag sets
Initial state: both replicas have empty set {}
Alice adds "celery":
A: {("celery", tag_A1)}
Alice adds "carrot":
A: {("celery", tag_A1), ("carrot", tag_A2)}
Bob adds "carrot" (concurrent with Alice):
B: {("carrot", tag_B1)}
Bob removes "celery" (removes tag_A1):
B: {} (Bob never had "celery", but observes A1's removal)
Now network connects and A and B merge:
Merge A ∪ B:
A: {("celery", tag_A1), ("carrot", tag_A2), ("carrot", tag_B1)}
B: {("carrot", tag_B1), ("carrot", tag_A2)}
Both replicas converge to the same state ✓
What happened? Alice’s “celery” added tag_A1. Bob’s remove removed tag_A1 (the only tag Bob had observed). But Alice concurrently added “carrot” with tag_A2, which Bob never observed. So tag_A2 survives — “carrot” stays in the set. Bob’s own “carrot” (tag_B1) also survives.
What about concurrent add and remove of the same element?
Alice adds "onion" with tag_A3.
Bob (who hasn't seen tag_A3) removes "onion".
They merge:
Alice has {("onion", tag_A3)}
Bob has removed all tags he knows about for "onion" (none, since he hadn't seen tag_A3)
After merge: "onion" is present (tag_A3 survives)
The remove only removes tags the remover has observed. If a concurrent add generated a tag the remover never saw, that tag stays. Add wins over concurrent remove.
Comparison
| Property | LWW-Register | OR-Set |
|---|---|---|
| Converges | Yes | Yes |
| Loses concurrent writes | Yes | No |
| Size | O(1) | O(n) per element |
| Works for | Last-write-wins values | Collaborative lists, shopping carts |
Check Your Understanding
-
In an OR-Set, two replicas both add “apple” at the same time, then merge. How many tags for “apple” exist after the merge?
-
Why does the OR-Set need unique tags per add operation? What would break if two adds of the same element reused the same tag?
-
An LWW-Register stores
(t=100, value="X")on replica A and(t=100, value="Y")on replica B. They merge. Which value survives? Who decides?
The “So What?”
OR-Sets are the foundation of collaborative applications. Google Docs’ internal data model is essentially a list of characters with OR-Set semantics — each character has a unique ID, concurrent insertions both survive. Shopping cart CRDTs in Redis Enterprise use OR-Sets so that concurrent adds of the same item don’t conflict. If you’re building any application with offline edits, OR-Sets are your starting point.
✏️ Exercises
Exercises: CRDTs
Exercise 1: G-Counter Merge
A G-Counter has 3 replicas. Replica A’s state is [5, 0, 0]. Replica B’s state is [0, 3, 2].
- What is the state of each replica after they merge (A with B, B with A)?
- What is the final counter value?
- Now replica C (state
[1, 1, 1]) joins the merge. What is the final result?
Exercise 2: G-Counter Limitations
- Explain in one sentence why a G-Counter cannot represent decrements.
- Could you build a G-Counter where decrement is allowed by subtracting from the local slot? What goes wrong?
- What mathematical property of the merge function would break if you allowed arbitrary subtraction?
Exercise 3: OR-Set Concurrent Operations
Three replicas (A, B, C) start with an empty OR-Set.
- Replica A: adds “apple” (tag = a1), adds “banana” (tag = a2)
- Replica B: adds “apple” (tag = b1), removes “banana” (removes a2 — B observed A’s add)
- Replica C: removes “apple” (removes a1, b1 — C observed both)
After all replicas merge:
- Is “apple” in the set? Why or why not?
- Is “banana” in the set? Why or why not?
- How many unique tags are in the final set?
Exercise 4: CRDT vs Consensus
An e-commerce platform uses an inventory CRDT (PN-Counter) to track stock across regions. The current count for item X is 5.
- Two regions both sell one unit simultaneously (decrement by 1 each). What does the counter show after merge?
- The actual physical inventory is 5 units. After the decrements above, you sell 3 more units in region A and 2 more in region B. A customer checks the count and sees 3. They buy 3 — but the warehouse only has 2 left. How could this over-selling happen?
- Would Raft prevent this problem? What trade-off does it introduce?
👁️ View Solutions
Solutions: CRDTs
Exercise 1: G-Counter Merge
-
After A merges with B:
- A receives
[0, 3, 2]from B → elementwise max of[5, 0, 0]and[0, 3, 2]=[5, 3, 2] - B receives
[5, 0, 0]from A → elementwise max of[0, 3, 2]and[5, 0, 0]=[5, 3, 2] - Both converge to
[5, 3, 2]
- A receives
-
Final counter value: 5 + 3 + 2 = 10
-
C joins: max of
[5, 3, 2]and[1, 1, 1]:- max(5, 1) = 5
- max(3, 1) = 3
- max(2, 1) = 2
- Result:
[5, 3, 2]— C’s values were all lower, so they don’t change the outcome.
Exercise 2: G-Counter Limitations
-
A G-Counter’s merge function is elementwise max — taking the maximum of two values can never produce a smaller number, so you can never reduce any slot.
-
If you subtract from your local slot, the elementwise max during merge would take the larger previous value from another replica. Your decrement would be silently lost. For example: A has
[5, 0, 0], A decrements to[4, 0, 0], then merges with B’s[5, 0, 0](which hadn’t seen the decrement). Result:[5, 0, 0]— the decrement disappeared. -
Elementwise max is not invertible. If you allow subtraction, the merge function would need to decrease a value, which max cannot do. The lattice property (join = least upper bound) only works with monotonic increases.
Exercise 3: OR-Set Concurrent Operations
-
Is “apple” in the set? Yes. C removed tags a1 and b1. But A had already added “apple” with a1, and B had added with b1. Both are removed by C’s operation. However… wait, let’s trace carefully:
- A adds “apple” with tag a1. A adds “banana” with tag a2.
- B adds “apple” with tag b1. B removes “banana” — removes tag a2.
- C removes “apple” — removes tags a1 and b1 (that C observed).
After merge: A has {(“apple”, a1), (“banana”, a2)} ∪ {(“apple”, b1)} — but a1 and b1 were removed by C. However, the merge is a union of internal sets. C’s state has { } (it removed everything it knew about). A has {(“apple”, a1), (“banana”, a2), (“apple”, b1)} only if B’s adds propagated. But B also sent its adds.
Actually, OR-Set merge is set union of (element, tag) pairs. C’s remove of tags a1 and b1 means those tags are excluded from C’s internal set. But when A merges with C, A still has a1 and b1 (A received b1 from B). So the union brings them back. Wait — that breaks the semantics.
Correction: In an OR-Set, when you remove, you add the removed tags to a “tombstone” set (or equivalently, the internal set is
{(elem, tag)}and remove means “remove all observed (elem, tag) pairs”). But the merge is union — if another replica still has those tags, they come back.This is why real OR-Sets use “observed-remove” with version vectors or tombstones. The answer depends on implementation. For the standard OR-Set:
After all merges (union of all (e,t) pairs minus removed pairs):
- Banana’s a2 tag is removed by B, and only A had it. Gone.
- Apple’s a1 is removed by C, but B has b1 (different tag) — wait, a1 is explicitly removed, but a1 was from A. B had b1, not a1.
Let me retrace properly: after full sync:
- All tags: a1 (“apple”), a2 (“banana”), b1 (“apple”)
- Removed tags: a2 (by B), a1 and b1 (by C)
- Surviving tags: none — all three tags are in the remove set.
Final set: empty {}.
This is actually a problem with basic OR-Sets — concurrent adds and removes can delete everything. This is why production OR-Sets often use version vectors or “add wins” semantics more carefully.
-
Is “banana” in the set? No — tag a2 was removed by B, and no other replica added “banana”.
a2is in the tombstone set. -
Unique tags surviving: 0 (everything was removed).
Key insight: This exercise shows a weakness of basic OR-Sets — if a remove operation observes all tags and removes them, the element disappears. Production OR-Sets add version vectors to track causality so that concurrent adds survive.
Exercise 4: CRDT vs Consensus
-
After both regions decrement: Both decrements are recorded. Region A’s N-counter has
[1, 0], Region B’s has[0, 1]. After merge: P =[0, 0], N =[1, 1]. Value = 0 - (1 + 1) = -2. Wait — the initial value was 5, so P should track the initial value. Let me reconsider.Initial state (both regions): P =
[5, 5], N =[0, 0](each region’s P-counter has “5” for the initial stock). If each sells 1: region A decrements → N_A =[1, 0], region B → N_B =[0, 1].After merge: P =
[5, 5], N =[1, 1]. Value = (5+5) - (1+1) = 10 - 2 = 8. That’s wrong — we only had 5 units initially. This reveals a problem: each region independently initialized P to 5. The initialization is not coordinated.The real answer: A PN-Counter for inventory must be initialized on one replica and propagated. If both regions independently set P to 5, the merge results in P = max(5,5) = 5 per slot, sum(P) = 10 — double counting. This is why initial state needs coordination.
But ignoring the initialization issue: after both decrements, the counter shows 3 (5 - 1 - 1 = 3).
-
Over-selling: The counter shows 3 (a stale or concurrent read), but in reality, by the time that customer checks, 5 more units have been sold across both regions (3 + 2). The counter eventually converges to show -2, but during the sync delay, the customer saw 3 and bought 3. The system sold 10 units of a 5-unit inventory.
-
Would Raft prevent this? Yes — Raft provides strong consistency. All writes go through a leader, so inventory is always accurate. Trade-off: Writes are slower (require majority ack), and if the leader is in a different region, every write incurs cross-region latency. During a network partition, the minority partition cannot sell anything. CRDTs sacrifice accuracy (stale reads) for availability (local writes always succeed).