Distributed & Decentralized Systems Curriculum
Reflection Real Systems · Comparison

Key Question

What happens when you run the same shopping cart workload across Redis, Cassandra, MongoDB, and Riak? Where do they diverge?

Deep Dive

The comparison.ts file runs an identical sequence of operations across all four systems:

1. ADD socks x2     from us-east
2. ADD shirt  x1     from us-east
3. ADD hat    x1     from eu-west
4. GET cart
5. ADD socks x1     from eu-west
6. GET cart
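
The workload above can be written as plain data so that every simulated store replays exactly the same steps. The following is a minimal sketch; the Op type, the workload array, and the expectedCart helper are illustrative names and may not match what comparison.ts uses:

```typescript
// Illustrative types; comparison.ts may name these differently.
type CartItem = { id: string; qty: number }
type Op =
  | { kind: "add"; item: CartItem; from: string }
  | { kind: "get" }

const workload: Op[] = [
  { kind: "add", item: { id: "socks", qty: 2 }, from: "us-east" },
  { kind: "add", item: { id: "shirt", qty: 1 }, from: "us-east" },
  { kind: "add", item: { id: "hat", qty: 1 }, from: "eu-west" },
  { kind: "get" },
  { kind: "add", item: { id: "socks", qty: 1 }, from: "eu-west" },
  { kind: "get" },
]

// Reference result: what a store with no lost updates should hold at the end.
function expectedCart(ops: Op[]): Record<string, number> {
  const cart: Record<string, number> = {}
  for (const op of ops) {
    if (op.kind === "add") {
      cart[op.item.id] = (cart[op.item.id] || 0) + op.item.qty
    }
  }
  return cart
}

console.log(expectedCart(workload)) // { socks: 3, shirt: 1, hat: 1 }
```

Any system whose final cart differs from this reference has dropped or duplicated an update somewhere along the way.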

System 1: Redis Cluster

Redis Cluster is a single-primary-per-slot system. Every write for key cart:u1 goes to the same primary node. There is one authoritative copy of the cart at all times:

add(user: string, item: CartItem) {
  // No conflict possible: all writes for this key go to one node
  const key = `cart:${user}`
  const existing = this.data.get(key) || []
  const idx = existing.findIndex(i => i.id === item.id)
  if (idx >= 0) existing[idx].qty += item.qty
  else existing.push({ ...item })
  // Store the list back so a newly created cart is not dropped
  this.data.set(key, existing)
}

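Result: socks x3, shirt x1, hat x1, which is exactly correct. Redis Cluster cannot lose an update to a concurrent writer here because every write to the key is serialized through a single primary. That is a statement about concurrency, not durability: an acknowledged write can still be lost on failover.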

System 2: Cassandra

Cassandra is a leaderless system. Writes from eu-west (steps 3 and 5) and from us-east (steps 1 and 2) can hit different coordinators. With last-write-wins (LWW), each item keeps only the write with the latest timestamp:

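add(user: string, item: CartItem, from: string) {
  // `this.rows` (user -> item id -> cell) is an assumed field name; the excerpt omits the lookup
  const row = this.rows.get(user) ?? new Map()
  this.rows.set(user, row)
  const ts = Number(++this.clock) // simulated monotonic clock
  const existing = row.get(item.id)
  // Last-write-wins per item: a write with an older timestamp is silently dropped
  if (!existing || ts > existing.ts) {
    // Add to the quantity read at the coordinator so step 5 yields socks x3
    row.set(item.id, { qty: (existing?.qty ?? 0) + item.qty, ts })
  }
}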

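Potential issue: the hazard is on socks, the one item written from both regions. If step 5's timestamp (eu-west) happens to be lower than step 1's (us-east), the step 5 write is ignored and the cart keeps socks x2. The hat in step 3 is safe because no earlier write exists for that item. In our simulation the clock is a single monotonic counter, so this cannot happen. In real Cassandra, with clock skew between coordinators in different regions, it can.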
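
To see how skew drops a write, consider a minimal sketch of per-item last-write-wins in which each coordinator supplies its own timestamp. The LwwCart class and the timestamp values are hypothetical; they stand in for two region coordinators whose clocks disagree by a few milliseconds, and this is not the comparison.ts implementation:

```typescript
// Hypothetical model: one LWW cell per cart item, timestamps supplied by each coordinator.
type Cell = { qty: number; ts: number }

class LwwCart {
  private row: Record<string, Cell> = {}

  // Returns true if the write was applied, false if it lost to a newer timestamp.
  write(itemId: string, qty: number, ts: number): boolean {
    const existing = this.row[itemId]
    if (existing && ts <= existing.ts) return false
    this.row[itemId] = { qty, ts }
    return true
  }

  qty(itemId: string): number {
    const cell = this.row[itemId]
    return cell ? cell.qty : 0
  }
}

const lwwCart = new LwwCart()
// The us-east coordinator's clock reads 1000 at step 1.
lwwCart.write("socks", 2, 1000)
// The eu-west clock runs behind, so the later step 5 carries an earlier timestamp.
const step5Applied = lwwCart.write("socks", 3, 990)

console.log(step5Applied, lwwCart.qty("socks")) // false 2
```

The client in eu-west receives no error; the write is simply shadowed by the cell with the higher timestamp, which is why this failure is hard to notice in production.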

System 3: MongoDB

MongoDB behaves identically to Redis Cluster for this workload — single-primary per replica set, no conflicts:

// Same implementation as Redis Cluster
// All writes go through one primary node

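MongoDB and Redis produce the same correct result here. The difference emerges during failover. MongoDB rolls back writes that had not replicated to a majority once the old primary rejoins, and a majority write concern avoids acknowledging such writes in the first place. Redis Cluster replicates asynchronously, so an acknowledged write that never reached a replica is simply gone.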

System 4: Riak

Riak exposes the concurrency. When us-east and eu-west both update the cart without seeing each other’s writes, vector clock comparison detects the conflict:

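add(user: string, item: CartItem, from: string) {
  const writerKey = `${user}:${from}` // assumed key format: one view per user and region
  const view = this.views.get(writerKey)!
  // Each writer increments only its own entry in the vector clock
  view.clock.set(from, (view.clock.get(from) || 0) + 1)
}

private areClocksConcurrent(a: Map<string, number>, b: Map<string, number>): boolean {
  // Concurrent means neither clock dominates: each has an entry ahead of the other's
  const ahead = (x: Map<string, number>, y: Map<string, number>) =>
    [...x].some(([node, n]) => n > (y.get(node) || 0))
  return ahead(a, b) && ahead(b, a) // → CONFLICT!
}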

Riak returns siblings. The application must merge them (or use CRDTs for automatic merge).
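
The sibling handling described above can be sketched outside any Riak client. In the following, the VClock and Sibling types and the descends, concurrent, and mergeSiblings helpers are hypothetical names, and the max-per-item merge is one possible application policy rather than something Riak prescribes:

```typescript
// Hypothetical vector clock: node name -> number of updates seen from that node.
type VClock = Record<string, number>
type Sibling = { clock: VClock; items: Record<string, number> }

// True if `a` has seen everything `b` has (a >= b on every entry of b).
function descends(a: VClock, b: VClock): boolean {
  return Object.keys(b).every(node => (a[node] || 0) >= (b[node] || 0))
}

// Two versions conflict when neither descends from the other.
function concurrent(a: VClock, b: VClock): boolean {
  return !descends(a, b) && !descends(b, a)
}

// Application-level merge: entry-wise max of the clocks, max quantity per item.
function mergeSiblings(x: Sibling, y: Sibling): Sibling {
  const clock: VClock = { ...x.clock }
  for (const node of Object.keys(y.clock)) {
    clock[node] = Math.max(clock[node] || 0, y.clock[node] || 0)
  }
  const items: Record<string, number> = { ...x.items }
  for (const id of Object.keys(y.items)) {
    items[id] = Math.max(items[id] || 0, y.items[id] || 0)
  }
  return { clock, items }
}

// us-east wrote socks and shirt; eu-west wrote hat without seeing those writes.
const eastSibling: Sibling = { clock: { "us-east": 2 }, items: { socks: 2, shirt: 1 } }
const westSibling: Sibling = { clock: { "eu-west": 1 }, items: { hat: 1 } }

console.log(concurrent(eastSibling.clock, westSibling.clock)) // true
console.log(mergeSiblings(eastSibling, westSibling).items) // { socks: 2, shirt: 1, hat: 1 }
```

Taking the max per item works as a union when siblings touch different items, but it undercounts when both siblings increment the same item. That case is where a counter CRDT is the better fit.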

Key Takeaways

Operation          Redis           MongoDB     Cassandra       Riak
Concurrent add     Correct         Correct     May lose        Returns siblings
Read after write   Immediate       Immediate   Depends on CL   Depends on R
Failover safety    Lossy (async)   Rollback    None (LWW)      Siblings

Full Source

View or download the complete implementation: comparison.ts

Exercises

  1. Run the comparison. What does each system output for the final cart state?
  2. Modify the workload to introduce concurrent writes from us-east and eu-west at exactly the same logical time. Which systems handle this correctly?
  3. Which system would you choose for a shopping cart? Defend your answer.

Solutions

  1. Running the simulation shows: Redis/MongoDB: socks x3, shirt x1, hat x1 (correct). Cassandra: socks x3, shirt x1, hat x1 (correct in this simulation because timestamps are monotonic — but could lose data with clock skew). Riak: socks x3, shirt x1, hat x1 (same, but with a conflict flag if concurrent writes from different regions touched the same item).
  2. If us-east and eu-west both add the same item concurrently: Redis/MongoDB (single-primary) never see the conflict because all operations are ordered. Cassandra with LWW picks one timestamp (loses the other). Riak returns siblings (both operations preserved). The answer: Riak is the only system that detects and surfaces the conflict. But for a simple quantity increment, Riak’s CRDT counter would merge them automatically.
  3. For a shopping cart, the answer depends on whether users modify the cart from multiple devices concurrently. If not (single-device carts), Redis Cluster gives the simplest correct behavior. If so (mobile and desktop at the same time), Riak with CRDTs is the most correct choice because it preserves both additions without LWW data loss. Cassandra is a compromise: simpler than Riak, but LWW may lose items.
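
The automatic merge mentioned in solutions 2 and 3 can be illustrated with a grow-only counter (G-Counter), one of the simplest CRDTs. This is a sketch of the idea only, not Riak's data type API; the GCounter type and the gcIncrement, gcMerge, and gcValue helpers are hypothetical names:

```typescript
// Grow-only counter (G-Counter): each replica increments only its own slot.
type GCounter = Record<string, number>

function gcIncrement(c: GCounter, replica: string, by: number): GCounter {
  return { ...c, [replica]: (c[replica] || 0) + by }
}

// Merge takes the entry-wise max, so it is commutative and idempotent.
function gcMerge(a: GCounter, b: GCounter): GCounter {
  const out: GCounter = { ...a }
  for (const r of Object.keys(b)) {
    out[r] = Math.max(out[r] || 0, b[r] || 0)
  }
  return out
}

// The counter's value is the sum over all replica slots.
function gcValue(c: GCounter): number {
  return Object.keys(c).reduce((sum, r) => sum + (c[r] || 0), 0)
}

// Both regions start from socks = 2 (added in us-east), then add one pair concurrently.
const baseSocks: GCounter = { "us-east": 2 }
const eastSocks = gcIncrement(baseSocks, "us-east", 1) // { us-east: 3 }
const westSocks = gcIncrement(baseSocks, "eu-west", 1) // { us-east: 2, eu-west: 1 }

console.log(gcValue(gcMerge(eastSocks, westSocks))) // 4
```

Because each replica only ever raises its own slot, merging in either order, or merging the same state twice, gives the same total. A grow-only counter cannot represent removals; a cart that supports decrementing would need a richer type.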

Module 8: Comparison — Exercises

Exercise 1

Match the scenario to the recommended system:

Scenario                                                   System
A. Real-time chat (1M concurrent users, low latency)       ?
B. Product catalog (50K SKUs, faceted search)              ?
C. User sessions (50M users, TTL-based expiry)             ?
D. IoT sensor data (1M writes/sec, time-series)            ?
E. Shopping cart (high consistency, multi-device)          ?

Exercise 2

Your team chose Cassandra for a new project. Six months later, reads are getting slower and disk usage is growing faster than expected. What’s likely happening? What do you check?

Exercise 3

Explain the trade-off between Redis Cluster’s “no conflicts” guarantee and its “asynchronous replication” behavior. How can a write be acknowledged and still lost?

Exercise 4

A managed cloud service (MongoDB Atlas, Amazon MemoryDB, Amazon Keyspaces) eliminates operational burden. Does this change the decision tree? When would you still run a system yourself?


Solutions

  1. A → Redis Cluster (pub/sub + sorted sets for presence, in-memory speed). B → MongoDB (rich query, faceted search, varying product attributes). C → Redis Cluster (TTL key expiration, sub-ms lookup). D → Cassandra (write throughput, time-series data model with clustering). E → Riak with CRDTs OR Redis with careful locking — Redis has no conflict resolution; Riak’s CRDT merge preserves all operations.

  2. Most likely tombstones. Deletes in Cassandra create tombstones that occupy space and slow down reads until compaction removes them. Check: nodetool tablestats (named cfstats in older releases; look at SSTable count, tombstones per slice, and read latency) and nodetool compactionstats (compaction backlog). Fix: increase compaction throughput, and lower gc_grace_seconds only if repairs reliably complete within that window, since otherwise deleted data can resurface. Repair itself does not remove tombstones; compaction purges them once gc_grace_seconds has elapsed.

  3. Redis Cluster guarantees no conflicts within a shard because all operations are serialized through one primary. But the primary replicates asynchronously to its replica. If the primary accepts a write, acknowledges it to the client, and fails before the replica receives it, the write is lost. The “no conflict” guarantee is about concurrent access (two clients writing different values), not durability. Mitigation: issue WAIT after the write to block until a given number of replicas have acknowledged it. This narrows the loss window but does not make Redis Cluster strongly consistent; if that is required, use a different system.

  4. Managed services change the decision tree significantly: MongoDB Atlas removes ops burden and makes MongoDB viable for more use cases. Amazon MemoryDB (Redis-compatible with durable replication) solves the async-replication-loss problem. Amazon Keyspaces (Cassandra-compatible) removes repair burden. Run your own only when: (a) Data sovereignty requires on-premise deployment. (b) Your throughput requirements exceed managed service limits. (c) Cost optimization (at sufficient scale, self-hosted is cheaper). (d) You need features the managed service doesn’t support.
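
The acknowledged-but-lost write from solution 3 can be reproduced with a toy model of one primary and one asynchronously updated replica. The AsyncPair class below is hypothetical and models only the ordering of events, not Redis itself:

```typescript
// Toy model of a primary with one asynchronously updated replica (not Redis itself).
class AsyncPair {
  primary: Record<string, string> = {}
  replica: Record<string, string> = {}
  private pending: Array<[string, string]> = []

  // The client gets "OK" as soon as the primary applies the write.
  set(key: string, val: string): "OK" {
    this.primary[key] = val
    this.pending.push([key, val])
    return "OK"
  }

  // Replication happens later, in the background.
  replicate(): void {
    for (const [k, v] of this.pending) this.replica[k] = v
    this.pending = []
  }

  // The primary crashes; the replica is promoted with whatever it has received.
  failover(): Record<string, string> {
    this.primary = this.replica
    this.pending = []
    return this.primary
  }
}

const pair = new AsyncPair()
pair.set("cart:u1", "socks x2")
pair.replicate()
const ack = pair.set("cart:u1", "socks x3") // acknowledged to the client...
const afterFailover = pair.failover() // ...but never replicated

console.log(ack, afterFailover["cart:u1"]) // OK socks x2
```

The client saw "OK" for socks x3, yet the promoted replica still holds socks x2. No conflict occurred at any point, which is why serialization through one primary says nothing about durability.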