Distributed & Decentralized Systems Curriculum
Consistency Trade offs · CRDTs

Key Question

How do you build a CRDT for sets and registers used in collaborative editing?

Deep Dive

Counters are simple. Real applications need richer data types — registers (a single value), sets, maps, and lists. Two of the most practical are the LWW-Register and the OR-Set.

LWW-Register (Last Writer Wins Register)

A register stores a single value. The naive approach: timestamp + value, latest timestamp wins.

State: (timestamp, value)

Replica A writes "hello" at t=10:
  A: (10, "hello")

Replica B writes "world" at t=20:
  B: (20, "world")

Merge: pick highest timestamp → (20, "world") ✓

Problem — concurrent writes are lost:

Replica A writes "hello" at t=15
Replica B writes "world" at t=15 (same clock)

Merge: tie-break by replica ID → "world" wins
"hello" is lost forever.

LWW-Registers always converge, but they drop data. Acceptable when the latest write is authoritative (last edit in a document). Bad when every write matters (likes, votes).

OR-Set (Observed-Remove Set)

A set where elements can be added and removed without losing concurrent operations. The key insight: add beats remove.

How it works:

Each element in the set has a unique tag (typically a UUID generated by the adding replica). The set internally stores {(element, tag)} pairs.

  • add(element) — generates a new unique tag, adds (element, tag) to the set
  • remove(element) — removes all tags for that element that this replica has observed
  • lookup(element) — true if any (element, tag) exists
  • merge(other) — union of both internal tag sets
Initial state: both replicas have empty set {}

Alice adds "celery":
  A: {("celery", tag_A1)}

Alice adds "carrot":
  A: {("celery", tag_A1), ("carrot", tag_A2)}

Bob adds "carrot" (concurrent with Alice):
  B: {("carrot", tag_B1)}

Bob removes "celery" (removes tag_A1):
  B: {}  (Bob never had "celery", but observes A1's removal)

Now network connects and A and B merge:

Merge A ∪ B:
  A: {("celery", tag_A1), ("carrot", tag_A2), ("carrot", tag_B1)}
  B: {("carrot", tag_B1), ("carrot", tag_A2)}

Both replicas converge to the same state ✓

What happened? Alice’s “celery” added tag_A1. Bob’s remove removed tag_A1 (the only tag Bob had observed). But Alice concurrently added “carrot” with tag_A2, which Bob never observed. So tag_A2 survives — “carrot” stays in the set. Bob’s own “carrot” (tag_B1) also survives.

What about concurrent add and remove of the same element?

Alice adds "onion" with tag_A3.
Bob (who hasn't seen tag_A3) removes "onion".
They merge:
  Alice has {("onion", tag_A3)}
  Bob has removed all tags he knows about for "onion" (none, since he hadn't seen tag_A3)
  
After merge: "onion" is present (tag_A3 survives)

The remove only removes tags the remover has observed. If a concurrent add generated a tag the remover never saw, that tag stays. Add wins over concurrent remove.

Comparison

PropertyLWW-RegisterOR-Set
ConvergesYesYes
Loses concurrent writesYesNo
SizeO(1)O(n) per element
Works forLast-write-wins valuesCollaborative lists, shopping carts

Check Your Understanding

  1. In an OR-Set, two replicas both add “apple” at the same time, then merge. How many tags for “apple” exist after the merge?

  2. Why does the OR-Set need unique tags per add operation? What would break if two adds of the same element reused the same tag?

  3. An LWW-Register stores (t=100, value="X") on replica A and (t=100, value="Y") on replica B. They merge. Which value survives? Who decides?

The “So What?”

OR-Sets are the foundation of collaborative applications. Google Docs’ internal data model is essentially a list of characters with OR-Set semantics — each character has a unique ID, concurrent insertions both survive. Shopping cart CRDTs in Redis Enterprise use OR-Sets so that concurrent adds of the same item don’t conflict. If you’re building any application with offline edits, OR-Sets are your starting point.


✏️ Exercises

Exercises: CRDTs

Exercise 1: G-Counter Merge

A G-Counter has 3 replicas. Replica A’s state is [5, 0, 0]. Replica B’s state is [0, 3, 2].

  1. What is the state of each replica after they merge (A with B, B with A)?
  2. What is the final counter value?
  3. Now replica C (state [1, 1, 1]) joins the merge. What is the final result?

Exercise 2: G-Counter Limitations

  1. Explain in one sentence why a G-Counter cannot represent decrements.
  2. Could you build a G-Counter where decrement is allowed by subtracting from the local slot? What goes wrong?
  3. What mathematical property of the merge function would break if you allowed arbitrary subtraction?

Exercise 3: OR-Set Concurrent Operations

Three replicas (A, B, C) start with an empty OR-Set.

  • Replica A: adds “apple” (tag = a1), adds “banana” (tag = a2)
  • Replica B: adds “apple” (tag = b1), removes “banana” (removes a2 — B observed A’s add)
  • Replica C: removes “apple” (removes a1, b1 — C observed both)

After all replicas merge:

  1. Is “apple” in the set? Why or why not?
  2. Is “banana” in the set? Why or why not?
  3. How many unique tags are in the final set?

Exercise 4: CRDT vs Consensus

An e-commerce platform uses an inventory CRDT (PN-Counter) to track stock across regions. The current count for item X is 5.

  1. Two regions both sell one unit simultaneously (decrement by 1 each). What does the counter show after merge?
  2. The actual physical inventory is 5 units. After the decrements above, you sell 3 more units in region A and 2 more in region B. A customer checks the count and sees 3. They buy 3 — but the warehouse only has 2 left. How could this over-selling happen?
  3. Would Raft prevent this problem? What trade-off does it introduce?
👁️ View Solutions

Solutions: CRDTs

Exercise 1: G-Counter Merge

  1. After A merges with B:

    • A receives [0, 3, 2] from B → elementwise max of [5, 0, 0] and [0, 3, 2] = [5, 3, 2]
    • B receives [5, 0, 0] from A → elementwise max of [0, 3, 2] and [5, 0, 0] = [5, 3, 2]
    • Both converge to [5, 3, 2]
  2. Final counter value: 5 + 3 + 2 = 10

  3. C joins: max of [5, 3, 2] and [1, 1, 1]:

    • max(5, 1) = 5
    • max(3, 1) = 3
    • max(2, 1) = 2
    • Result: [5, 3, 2]C’s values were all lower, so they don’t change the outcome.

Exercise 2: G-Counter Limitations

  1. A G-Counter’s merge function is elementwise max — taking the maximum of two values can never produce a smaller number, so you can never reduce any slot.

  2. If you subtract from your local slot, the elementwise max during merge would take the larger previous value from another replica. Your decrement would be silently lost. For example: A has [5, 0, 0], A decrements to [4, 0, 0], then merges with B’s [5, 0, 0] (which hadn’t seen the decrement). Result: [5, 0, 0] — the decrement disappeared.

  3. Elementwise max is not invertible. If you allow subtraction, the merge function would need to decrease a value, which max cannot do. The lattice property (join = least upper bound) only works with monotonic increases.


Exercise 3: OR-Set Concurrent Operations

  1. Is “apple” in the set? Yes. C removed tags a1 and b1. But A had already added “apple” with a1, and B had added with b1. Both are removed by C’s operation. However… wait, let’s trace carefully:

    • A adds “apple” with tag a1. A adds “banana” with tag a2.
    • B adds “apple” with tag b1. B removes “banana” — removes tag a2.
    • C removes “apple” — removes tags a1 and b1 (that C observed).

    After merge: A has {(“apple”, a1), (“banana”, a2)} ∪ {(“apple”, b1)} — but a1 and b1 were removed by C. However, the merge is a union of internal sets. C’s state has { } (it removed everything it knew about). A has {(“apple”, a1), (“banana”, a2), (“apple”, b1)} only if B’s adds propagated. But B also sent its adds.

    Actually, OR-Set merge is set union of (element, tag) pairs. C’s remove of tags a1 and b1 means those tags are excluded from C’s internal set. But when A merges with C, A still has a1 and b1 (A received b1 from B). So the union brings them back. Wait — that breaks the semantics.

    Correction: In an OR-Set, when you remove, you add the removed tags to a “tombstone” set (or equivalently, the internal set is {(elem, tag)} and remove means “remove all observed (elem, tag) pairs”). But the merge is union — if another replica still has those tags, they come back.

    This is why real OR-Sets use “observed-remove” with version vectors or tombstones. The answer depends on implementation. For the standard OR-Set:

    After all merges (union of all (e,t) pairs minus removed pairs):

    • Banana’s a2 tag is removed by B, and only A had it. Gone.
    • Apple’s a1 is removed by C, but B has b1 (different tag) — wait, a1 is explicitly removed, but a1 was from A. B had b1, not a1.

    Let me retrace properly: after full sync:

    • All tags: a1 (“apple”), a2 (“banana”), b1 (“apple”)
    • Removed tags: a2 (by B), a1 and b1 (by C)
    • Surviving tags: none — all three tags are in the remove set.

    Final set: empty {}.

    This is actually a problem with basic OR-Sets — concurrent adds and removes can delete everything. This is why production OR-Sets often use version vectors or “add wins” semantics more carefully.

  2. Is “banana” in the set? No — tag a2 was removed by B, and no other replica added “banana”. a2 is in the tombstone set.

  3. Unique tags surviving: 0 (everything was removed).

    Key insight: This exercise shows a weakness of basic OR-Sets — if a remove operation observes all tags and removes them, the element disappears. Production OR-Sets add version vectors to track causality so that concurrent adds survive.


Exercise 4: CRDT vs Consensus

  1. After both regions decrement: Both decrements are recorded. Region A’s N-counter has [1, 0], Region B’s has [0, 1]. After merge: P = [0, 0], N = [1, 1]. Value = 0 - (1 + 1) = -2. Wait — the initial value was 5, so P should track the initial value. Let me reconsider.

    Initial state (both regions): P = [5, 5], N = [0, 0] (each region’s P-counter has “5” for the initial stock). If each sells 1: region A decrements → N_A = [1, 0], region B → N_B = [0, 1].

    After merge: P = [5, 5], N = [1, 1]. Value = (5+5) - (1+1) = 10 - 2 = 8. That’s wrong — we only had 5 units initially. This reveals a problem: each region independently initialized P to 5. The initialization is not coordinated.

    The real answer: A PN-Counter for inventory must be initialized on one replica and propagated. If both regions independently set P to 5, the merge results in P = max(5,5) = 5 per slot, sum(P) = 10 — double counting. This is why initial state needs coordination.

    But ignoring the initialization issue: after both decrements, the counter shows 3 (5 - 1 - 1 = 3).

  2. Over-selling: The counter shows 3 (a stale or concurrent read), but in reality, by the time that customer checks, 5 more units have been sold across both regions (3 + 2). The counter eventually converges to show -2, but during the sync delay, the customer saw 3 and bought 3. The system sold 10 units of a 5-unit inventory.

  3. Would Raft prevent this? Yes — Raft provides strong consistency. All writes go through a leader, so inventory is always accurate. Trade-off: Writes are slower (require majority ack), and if the leader is in a different region, every write incurs cross-region latency. During a network partition, the minority partition cannot sell anything. CRDTs sacrifice accuracy (stale reads) for availability (local writes always succeed).