Consistency Trade offs · CRDTs

Key Question

How do you build a distributed counter that never loses an increment?

Deep Dive

A counter is the simplest CRDT. It illustrates the core ideas without getting bogged down in set theory or vector clock math.

G-Counter (Grow-Only Counter)

A G-Counter can only increase. Each replica owns one slot in a vector. Only the owning replica increments its own slot. To merge, take the elementwise maximum of both vectors.

State: A vector [c1, c2, ..., cn] where ci is replica i’s count.

Operations:

increment(i) — replica i increments c_i += 1
value() — sum of all elements
merge(other) — c_i = max(c_i, other_i) for each i

Worked Example: 3 Replicas

Initial state (all three replicas):
  A: [0, 0, 0]   value = 0
  B: [0, 0, 0]   value = 0
  C: [0, 0, 0]   value = 0

A increments twice, B increments once:
  A: [2, 0, 0]   value = 2
  B: [0, 1, 0]   value = 1
  C: [0, 0, 0]   value = 0

A and B merge:
  A receives [0, 1, 0] from B → max([2,0,0], [0,1,0]) = [2, 1, 0]
  A: [2, 1, 0]   value = 3

  B receives [2, 0, 0] from A → max([2,0,0], [0,1,0]) = [2, 1, 0]
  B: [2, 1, 0]   value = 3

C merges with A:
  C receives [2, 1, 0] → max([0,0,0], [2,1,0]) = [2, 1, 0]
  C: [2, 1, 0]   value = 3

All replicas converge to [2, 1, 0], value = 3 ✓

The Merge Properties

Commutative:  merge(A, B) = merge(B, A)
Associative: merge(A, merge(B, C)) = merge(merge(A, B), C)
Idempotent:  merge(A, A) = A

Elementwise max satisfies all three. This is why state-based CRDTs converge regardless of merge order.

P-N-Counter (Positive-Negative Counter)

A G-Counter can’t go down. But real systems need decrements (e.g., inventory counts, available seats).

Solution: Two G-Counters — one for increments (P), one for decrements (N).

P-Counter:  [p1, p2, ..., pn]     (increments)
N-Counter:  [n1, n2, ..., nn]     (decrements)
Value:       sum(P) - sum(N)

Initial:
  P = [0, 0, 0]
  N = [0, 0, 0]
  Value = 0

A increments twice, B decrements once:
  P_A = [2, 0, 0]   N_A = [0, 0, 0]
  P_B = [0, 0, 0]   N_B = [0, 1, 0]

After merge:
  P = max(P_A, P_B) = [2, 0, 0]
  N = max(N_A, N_B) = [0, 1, 0]
  Value = (2 + 0 + 0) - (0 + 1 + 0) = 2 - 1 = 1

The P and N vectors each use elementwise max independently. The final value is sum(P) - sum(N), which converges across all replicas.

Limitation

P-N-Counters accurately reflect the net count only if no replica’s decrement exceeds its increment in absolute terms. But that constraint is application-level — the data structure itself is correct.

Check Your Understanding

If a G-Counter has state [5, 0, 0] and merges with [0, 3, 2], what is the resulting state and value?
Why can’t a single G-Counter handle decrements? What mathematical property breaks?
In a P-N-Counter, replicas A, B, and C start at zero. A increments twice. B decrements once. C increments once and decrements once. After all replicas merge, what is the final value?

The “So What?”

G-Counters and P-N-Counters are the building blocks of more complex CRDTs (sets, maps, lists). Every CRDT you encounter uses the same trick — represent state as a lattice where merge is a join operation. Once you understand vector-based counters, you understand 80% of CRDT math. Riak uses P-N-Counters for distributed metrics. Redis Enterprise uses them for CRDT-based replication.

✏️ Exercises

Exercises: CRDTs

Exercise 1: G-Counter Merge

A G-Counter has 3 replicas. Replica A’s state is [5, 0, 0]. Replica B’s state is [0, 3, 2].

What is the state of each replica after they merge (A with B, B with A)?
What is the final counter value?
Now replica C (state [1, 1, 1]) joins the merge. What is the final result?

Exercise 2: G-Counter Limitations

Explain in one sentence why a G-Counter cannot represent decrements.
Could you build a G-Counter where decrement is allowed by subtracting from the local slot? What goes wrong?
What mathematical property of the merge function would break if you allowed arbitrary subtraction?

Exercise 3: OR-Set Concurrent Operations

Three replicas (A, B, C) start with an empty OR-Set.

Replica A: adds “apple” (tag = a1), adds “banana” (tag = a2)
Replica B: adds “apple” (tag = b1), removes “banana” (removes a2 — B observed A’s add)
Replica C: removes “apple” (removes a1, b1 — C observed both)

After all replicas merge:

Is “apple” in the set? Why or why not?
Is “banana” in the set? Why or why not?
How many unique tags are in the final set?

Exercise 4: CRDT vs Consensus

An e-commerce platform uses an inventory CRDT (PN-Counter) to track stock across regions. The current count for item X is 5.

Two regions both sell one unit simultaneously (decrement by 1 each). What does the counter show after merge?
The actual physical inventory is 5 units. After the decrements above, you sell 3 more units in region A and 2 more in region B. A customer checks the count and sees 3. They buy 3 — but the warehouse only has 2 left. How could this over-selling happen?
Would Raft prevent this problem? What trade-off does it introduce?

👁️ View Solutions

Solutions: CRDTs

Exercise 1: G-Counter Merge

After A merges with B:
- A receives [0, 3, 2] from B → elementwise max of [5, 0, 0] and [0, 3, 2] = [5, 3, 2]
- B receives [5, 0, 0] from A → elementwise max of [0, 3, 2] and [5, 0, 0] = [5, 3, 2]
- Both converge to [5, 3, 2]
Final counter value: 5 + 3 + 2 = 10
C joins: max of [5, 3, 2] and [1, 1, 1]:
- max(5, 1) = 5
- max(3, 1) = 3
- max(2, 1) = 2
- Result: [5, 3, 2] — C’s values were all lower, so they don’t change the outcome.

Exercise 2: G-Counter Limitations

A G-Counter’s merge function is elementwise max — taking the maximum of two values can never produce a smaller number, so you can never reduce any slot.
If you subtract from your local slot, the elementwise max during merge would take the larger previous value from another replica. Your decrement would be silently lost. For example: A has [5, 0, 0], A decrements to [4, 0, 0], then merges with B’s [5, 0, 0] (which hadn’t seen the decrement). Result: [5, 0, 0] — the decrement disappeared.
Elementwise max is not invertible. If you allow subtraction, the merge function would need to decrease a value, which max cannot do. The lattice property (join = least upper bound) only works with monotonic increases.

Exercise 3: OR-Set Concurrent Operations

Is “apple” in the set? Yes. C removed tags a1 and b1. But A had already added “apple” with a1, and B had added with b1. Both are removed by C’s operation. However… wait, let’s trace carefully:
- A adds “apple” with tag a1. A adds “banana” with tag a2.
- B adds “apple” with tag b1. B removes “banana” — removes tag a2.
- C removes “apple” — removes tags a1 and b1 (that C observed).
After merge: A has {(“apple”, a1), (“banana”, a2)} ∪ {(“apple”, b1)} — but a1 and b1 were removed by C. However, the merge is a union of internal sets. C’s state has { } (it removed everything it knew about). A has {(“apple”, a1), (“banana”, a2), (“apple”, b1)} only if B’s adds propagated. But B also sent its adds.

Actually, OR-Set merge is set union of (element, tag) pairs. C’s remove of tags a1 and b1 means those tags are excluded from C’s internal set. But when A merges with C, A still has a1 and b1 (A received b1 from B). So the union brings them back. Wait — that breaks the semantics.

Correction: In an OR-Set, when you remove, you add the removed tags to a “tombstone” set (or equivalently, the internal set is {(elem, tag)} and remove means “remove all observed (elem, tag) pairs”). But the merge is union — if another replica still has those tags, they come back.

This is why real OR-Sets use “observed-remove” with version vectors or tombstones. The answer depends on implementation. For the standard OR-Set:

After all merges (union of all (e,t) pairs minus removed pairs):
- Banana’s a2 tag is removed by B, and only A had it. Gone.
- Apple’s a1 is removed by C, but B has b1 (different tag) — wait, a1 is explicitly removed, but a1 was from A. B had b1, not a1.
Let me retrace properly: after full sync:
- All tags: a1 (“apple”), a2 (“banana”), b1 (“apple”)
- Removed tags: a2 (by B), a1 and b1 (by C)
- Surviving tags: none — all three tags are in the remove set.
Final set: empty {}.

This is actually a problem with basic OR-Sets — concurrent adds and removes can delete everything. This is why production OR-Sets often use version vectors or “add wins” semantics more carefully.
Is “banana” in the set? No — tag a2 was removed by B, and no other replica added “banana”. a2 is in the tombstone set.
Unique tags surviving: 0 (everything was removed).

Key insight: This exercise shows a weakness of basic OR-Sets — if a remove operation observes all tags and removes them, the element disappears. Production OR-Sets add version vectors to track causality so that concurrent adds survive.

Exercise 4: CRDT vs Consensus

After both regions decrement: Both decrements are recorded. Region A’s N-counter has [1, 0], Region B’s has [0, 1]. After merge: P = [0, 0], N = [1, 1]. Value = 0 - (1 + 1) = -2. Wait — the initial value was 5, so P should track the initial value. Let me reconsider.

Initial state (both regions): P = [5, 5], N = [0, 0] (each region’s P-counter has “5” for the initial stock). If each sells 1: region A decrements → N_A = [1, 0], region B → N_B = [0, 1].

After merge: P = [5, 5], N = [1, 1]. Value = (5+5) - (1+1) = 10 - 2 = 8. That’s wrong — we only had 5 units initially. This reveals a problem: each region independently initialized P to 5. The initialization is not coordinated.

The real answer: A PN-Counter for inventory must be initialized on one replica and propagated. If both regions independently set P to 5, the merge results in P = max(5,5) = 5 per slot, sum(P) = 10 — double counting. This is why initial state needs coordination.

But ignoring the initialization issue: after both decrements, the counter shows 3 (5 - 1 - 1 = 3).
Over-selling: The counter shows 3 (a stale or concurrent read), but in reality, by the time that customer checks, 5 more units have been sold across both regions (3 + 2). The counter eventually converges to show -2, but during the sync delay, the customer saw 3 and bought 3. The system sold 10 units of a 5-unit inventory.
Would Raft prevent this? Yes — Raft provides strong consistency. All writes go through a leader, so inventory is always accurate. Trade-off: Writes are slower (require majority ack), and if the leader is in a different region, every write incurs cross-region latency. During a network partition, the minority partition cannot sell anything. CRDTs sacrifice accuracy (stale reads) for availability (local writes always succeed).

🎮 Interactive CRDT

Scroll to load interactive visualization…