Key Question
How do you build a distributed counter that never loses an increment?
Deep Dive
A counter is the simplest CRDT. It illustrates the core ideas without getting bogged down in set theory or vector clock math.
G-Counter (Grow-Only Counter)
A G-Counter can only increase. Each replica owns one slot in a vector. Only the owning replica increments its own slot. To merge, take the elementwise maximum of both vectors.
State: A vector [c1, c2, ..., cn] where ci is replica i’s count.
Operations:
increment(i)— replicaiincrementsc_i += 1value()— sum of all elementsmerge(other)—c_i = max(c_i, other_i)for each i
Worked Example: 3 Replicas
Initial state (all three replicas):
A: [0, 0, 0] value = 0
B: [0, 0, 0] value = 0
C: [0, 0, 0] value = 0
A increments twice, B increments once:
A: [2, 0, 0] value = 2
B: [0, 1, 0] value = 1
C: [0, 0, 0] value = 0
A and B merge:
A receives [0, 1, 0] from B → max([2,0,0], [0,1,0]) = [2, 1, 0]
A: [2, 1, 0] value = 3
B receives [2, 0, 0] from A → max([2,0,0], [0,1,0]) = [2, 1, 0]
B: [2, 1, 0] value = 3
C merges with A:
C receives [2, 1, 0] → max([0,0,0], [2,1,0]) = [2, 1, 0]
C: [2, 1, 0] value = 3
All replicas converge to [2, 1, 0], value = 3 ✓
The Merge Properties
Commutative: merge(A, B) = merge(B, A)
Associative: merge(A, merge(B, C)) = merge(merge(A, B), C)
Idempotent: merge(A, A) = A
Elementwise max satisfies all three. This is why state-based CRDTs converge regardless of merge order.
P-N-Counter (Positive-Negative Counter)
A G-Counter can’t go down. But real systems need decrements (e.g., inventory counts, available seats).
Solution: Two G-Counters — one for increments (P), one for decrements (N).
P-Counter: [p1, p2, ..., pn] (increments)
N-Counter: [n1, n2, ..., nn] (decrements)
Value: sum(P) - sum(N)
Initial:
P = [0, 0, 0]
N = [0, 0, 0]
Value = 0
A increments twice, B decrements once:
P_A = [2, 0, 0] N_A = [0, 0, 0]
P_B = [0, 0, 0] N_B = [0, 1, 0]
After merge:
P = max(P_A, P_B) = [2, 0, 0]
N = max(N_A, N_B) = [0, 1, 0]
Value = (2 + 0 + 0) - (0 + 1 + 0) = 2 - 1 = 1
The P and N vectors each use elementwise max independently. The final value is sum(P) - sum(N), which converges across all replicas.
Limitation
P-N-Counters accurately reflect the net count only if no replica’s decrement exceeds its increment in absolute terms. But that constraint is application-level — the data structure itself is correct.
Check Your Understanding
-
If a G-Counter has state
[5, 0, 0]and merges with[0, 3, 2], what is the resulting state and value? -
Why can’t a single G-Counter handle decrements? What mathematical property breaks?
-
In a P-N-Counter, replicas A, B, and C start at zero. A increments twice. B decrements once. C increments once and decrements once. After all replicas merge, what is the final value?
The “So What?”
G-Counters and P-N-Counters are the building blocks of more complex CRDTs (sets, maps, lists). Every CRDT you encounter uses the same trick — represent state as a lattice where merge is a join operation. Once you understand vector-based counters, you understand 80% of CRDT math. Riak uses P-N-Counters for distributed metrics. Redis Enterprise uses them for CRDT-based replication.
✏️ Exercises
Exercises: CRDTs
Exercise 1: G-Counter Merge
A G-Counter has 3 replicas. Replica A’s state is [5, 0, 0]. Replica B’s state is [0, 3, 2].
- What is the state of each replica after they merge (A with B, B with A)?
- What is the final counter value?
- Now replica C (state
[1, 1, 1]) joins the merge. What is the final result?
Exercise 2: G-Counter Limitations
- Explain in one sentence why a G-Counter cannot represent decrements.
- Could you build a G-Counter where decrement is allowed by subtracting from the local slot? What goes wrong?
- What mathematical property of the merge function would break if you allowed arbitrary subtraction?
Exercise 3: OR-Set Concurrent Operations
Three replicas (A, B, C) start with an empty OR-Set.
- Replica A: adds “apple” (tag = a1), adds “banana” (tag = a2)
- Replica B: adds “apple” (tag = b1), removes “banana” (removes a2 — B observed A’s add)
- Replica C: removes “apple” (removes a1, b1 — C observed both)
After all replicas merge:
- Is “apple” in the set? Why or why not?
- Is “banana” in the set? Why or why not?
- How many unique tags are in the final set?
Exercise 4: CRDT vs Consensus
An e-commerce platform uses an inventory CRDT (PN-Counter) to track stock across regions. The current count for item X is 5.
- Two regions both sell one unit simultaneously (decrement by 1 each). What does the counter show after merge?
- The actual physical inventory is 5 units. After the decrements above, you sell 3 more units in region A and 2 more in region B. A customer checks the count and sees 3. They buy 3 — but the warehouse only has 2 left. How could this over-selling happen?
- Would Raft prevent this problem? What trade-off does it introduce?
👁️ View Solutions
Solutions: CRDTs
Exercise 1: G-Counter Merge
-
After A merges with B:
- A receives
[0, 3, 2]from B → elementwise max of[5, 0, 0]and[0, 3, 2]=[5, 3, 2] - B receives
[5, 0, 0]from A → elementwise max of[0, 3, 2]and[5, 0, 0]=[5, 3, 2] - Both converge to
[5, 3, 2]
- A receives
-
Final counter value: 5 + 3 + 2 = 10
-
C joins: max of
[5, 3, 2]and[1, 1, 1]:- max(5, 1) = 5
- max(3, 1) = 3
- max(2, 1) = 2
- Result:
[5, 3, 2]— C’s values were all lower, so they don’t change the outcome.
Exercise 2: G-Counter Limitations
-
A G-Counter’s merge function is elementwise max — taking the maximum of two values can never produce a smaller number, so you can never reduce any slot.
-
If you subtract from your local slot, the elementwise max during merge would take the larger previous value from another replica. Your decrement would be silently lost. For example: A has
[5, 0, 0], A decrements to[4, 0, 0], then merges with B’s[5, 0, 0](which hadn’t seen the decrement). Result:[5, 0, 0]— the decrement disappeared. -
Elementwise max is not invertible. If you allow subtraction, the merge function would need to decrease a value, which max cannot do. The lattice property (join = least upper bound) only works with monotonic increases.
Exercise 3: OR-Set Concurrent Operations
-
Is “apple” in the set? Yes. C removed tags a1 and b1. But A had already added “apple” with a1, and B had added with b1. Both are removed by C’s operation. However… wait, let’s trace carefully:
- A adds “apple” with tag a1. A adds “banana” with tag a2.
- B adds “apple” with tag b1. B removes “banana” — removes tag a2.
- C removes “apple” — removes tags a1 and b1 (that C observed).
After merge: A has {(“apple”, a1), (“banana”, a2)} ∪ {(“apple”, b1)} — but a1 and b1 were removed by C. However, the merge is a union of internal sets. C’s state has { } (it removed everything it knew about). A has {(“apple”, a1), (“banana”, a2), (“apple”, b1)} only if B’s adds propagated. But B also sent its adds.
Actually, OR-Set merge is set union of (element, tag) pairs. C’s remove of tags a1 and b1 means those tags are excluded from C’s internal set. But when A merges with C, A still has a1 and b1 (A received b1 from B). So the union brings them back. Wait — that breaks the semantics.
Correction: In an OR-Set, when you remove, you add the removed tags to a “tombstone” set (or equivalently, the internal set is
{(elem, tag)}and remove means “remove all observed (elem, tag) pairs”). But the merge is union — if another replica still has those tags, they come back.This is why real OR-Sets use “observed-remove” with version vectors or tombstones. The answer depends on implementation. For the standard OR-Set:
After all merges (union of all (e,t) pairs minus removed pairs):
- Banana’s a2 tag is removed by B, and only A had it. Gone.
- Apple’s a1 is removed by C, but B has b1 (different tag) — wait, a1 is explicitly removed, but a1 was from A. B had b1, not a1.
Let me retrace properly: after full sync:
- All tags: a1 (“apple”), a2 (“banana”), b1 (“apple”)
- Removed tags: a2 (by B), a1 and b1 (by C)
- Surviving tags: none — all three tags are in the remove set.
Final set: empty {}.
This is actually a problem with basic OR-Sets — concurrent adds and removes can delete everything. This is why production OR-Sets often use version vectors or “add wins” semantics more carefully.
-
Is “banana” in the set? No — tag a2 was removed by B, and no other replica added “banana”.
a2is in the tombstone set. -
Unique tags surviving: 0 (everything was removed).
Key insight: This exercise shows a weakness of basic OR-Sets — if a remove operation observes all tags and removes them, the element disappears. Production OR-Sets add version vectors to track causality so that concurrent adds survive.
Exercise 4: CRDT vs Consensus
-
After both regions decrement: Both decrements are recorded. Region A’s N-counter has
[1, 0], Region B’s has[0, 1]. After merge: P =[0, 0], N =[1, 1]. Value = 0 - (1 + 1) = -2. Wait — the initial value was 5, so P should track the initial value. Let me reconsider.Initial state (both regions): P =
[5, 5], N =[0, 0](each region’s P-counter has “5” for the initial stock). If each sells 1: region A decrements → N_A =[1, 0], region B → N_B =[0, 1].After merge: P =
[5, 5], N =[1, 1]. Value = (5+5) - (1+1) = 10 - 2 = 8. That’s wrong — we only had 5 units initially. This reveals a problem: each region independently initialized P to 5. The initialization is not coordinated.The real answer: A PN-Counter for inventory must be initialized on one replica and propagated. If both regions independently set P to 5, the merge results in P = max(5,5) = 5 per slot, sum(P) = 10 — double counting. This is why initial state needs coordination.
But ignoring the initialization issue: after both decrements, the counter shows 3 (5 - 1 - 1 = 3).
-
Over-selling: The counter shows 3 (a stale or concurrent read), but in reality, by the time that customer checks, 5 more units have been sold across both regions (3 + 2). The counter eventually converges to show -2, but during the sync delay, the customer saw 3 and bought 3. The system sold 10 units of a 5-unit inventory.
-
Would Raft prevent this? Yes — Raft provides strong consistency. All writes go through a leader, so inventory is always accurate. Trade-off: Writes are slower (require majority ack), and if the leader is in a different region, every write incurs cross-region latency. During a network partition, the minority partition cannot sell anything. CRDTs sacrifice accuracy (stale reads) for availability (local writes always succeed).