Distributed & Decentralized Systems Curriculum
Reflection Real Systems · Comparison

Key Question

Given any application requirement, how do you systematically decide which of these four systems to use?

Deep Dive

The Decision Tree

Do you need complex queries (filtering, aggregation)?
  ├── YES → MongoDB (or PostgreSQL for relational data)
  └── NO  → Can you fit your dataset in memory?
              ├── YES → Redis Cluster
              └── NO  → Is your workload write-heavy?
                          ├── YES → Cassandra
                          └── NO  → MongoDB (or Riak if AP needed)

This tree is a starting point. Real decisions involve more nuance.
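
The same branches can be written as a small function, which makes the order of the questions explicit. This is a minimal sketch: the Requirements shape and the chooseSystem name are illustrative assumptions, not part of comparison.ts, and the PostgreSQL alternative from the first branch is left out to keep the return type to the four systems.

```typescript
// Hypothetical requirement shape; field names are illustrative only.
interface Requirements {
  needsComplexQueries: boolean; // filtering, aggregation
  fitsInMemory: boolean;        // the whole dataset fits in RAM
  writeHeavy: boolean;
  needsAP: boolean;             // availability preferred during a partition
}

type Recommendation = 'MongoDB' | 'Redis Cluster' | 'Cassandra' | 'Riak';

// Walks the same branches as the tree, top to bottom.
function chooseSystem(req: Requirements): Recommendation {
  if (req.needsComplexQueries) return 'MongoDB';
  if (req.fitsInMemory) return 'Redis Cluster';
  if (req.writeHeavy) return 'Cassandra';
  return req.needsAP ? 'Riak' : 'MongoDB';
}

// A session store: simple key lookups on a dataset that fits in memory.
const sessionStore = chooseSystem({
  needsComplexQueries: false,
  fitsInMemory: true,
  writeHeavy: false,
  needsAP: false,
}); // 'Redis Cluster'
```

Because the first matching branch wins, a workload that needs complex queries lands on MongoDB even when it is also write-heavy. That is one of the places where the tree needs a second look.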

Decision Factor 1: Consistency Requirements

Requirement                          | Recommended System              | Configuration
Strong consistency (linearizability) | Single-primary (Redis, MongoDB) | Read from primary
Causal consistency                   | MongoDB                         | afterClusterTime
Eventual consistency (fast reads)    | Cassandra / Riak                | CL=ONE / r=1
Tunable (per-query)                  | Cassandra / Riak                | Per-request CL
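
For the leaderless systems, per-request tuning comes down to how many replicas must answer a read (r) and acknowledge a write (w) out of n copies. A common rule of thumb for Dynamo-style stores, which this module does not spell out and is added here as an assumption, is that a read overlaps the latest acknowledged write when r + w > n. The function names below are illustrative.

```typescript
// n: replication factor, r: replicas that must answer a read,
// w: replicas that must acknowledge a write.
function quorumsOverlap(n: number, r: number, w: number): boolean {
  if (r < 1 || w < 1 || r > n || w > n) {
    throw new RangeError('r and w must be between 1 and n');
  }
  return r + w > n;
}

// Size of a majority quorum for a given replication factor.
function quorumSize(n: number): number {
  return Math.floor(n / 2) + 1;
}

// CL=ONE / r=1 on both sides: fast, but a read may miss the latest write.
const fastReads = quorumsOverlap(3, 1, 1); // false

// Majority reads and writes with three replicas: 2 + 2 > 3.
const quorumBoth = quorumsOverlap(3, quorumSize(3), quorumSize(3)); // true
```

Overlap only means that some replica in the read set has seen the write. It is a weaker property than linearizability.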

Decision Factor 2: Throughput Requirements

Write throughput estimation:

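// Maps a workload estimate to a starting recommendation. The thresholds are rules of thumb.
function estimateThroughput(writesPerSec: number, readRatio: number, datasetGB: number): string {
  // Very high write rates favor leaderless designs, where any replica can accept writes.
  if (writesPerSec > 50000) {
    return 'Cassandra or Riak (leaderless)'
  }
  // readRatio is the fraction of operations that are reads (0 to 1).
  if (writesPerSec > 10000 && readRatio > 0.9) {
    return 'Redis Cluster (with replication)'
  }
  // datasetGB is not used by these rules; the decision tree covers whether data fits in memory.
  return 'MongoDB (simplest ops)'
}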

Read throughput: All four systems can saturate a 10Gbps NIC with appropriate sizing. The bottleneck is usually CPU (index lookups, sorting) or disk I/O, not the network.
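
A quick calculation shows why the network is rarely the first limit. The sketch below computes how many responses per second fit through a link, ignoring protocol overhead. The 1 KB average response size is an assumed figure for illustration, and maxResponsesPerSec is a hypothetical helper.

```typescript
// Upper bound on responses/sec that fit through a NIC, ignoring protocol overhead.
function maxResponsesPerSec(nicGbps: number, avgResponseBytes: number): number {
  const bytesPerSec = (nicGbps * 1e9) / 8; // 10 Gbps is 1.25e9 bytes/sec
  return Math.floor(bytesPerSec / avgResponseBytes);
}

// Assumed workload: 1 KB (1024-byte) responses on a 10 Gbps link.
const ceiling = maxResponsesPerSec(10, 1024); // 1220703
```

Under that assumption a single node would have to serve about 1.2 million reads per second before the link fills, so index lookups, sorting, or disk reads are likely to become the limit first.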

Decision Factor 3: Geo-Distribution

System        | Multi-DC Support                       | Conflict Handling
Redis Cluster | Active-Passive (replica in other DC)   | No conflicts
MongoDB       | Replica set members across DCs         | No conflicts (single-primary)
Cassandra     | Native (each node tagged with DC+rack) | LWW
Riak          | Native (vnodes span DCs)               | Vector clocks / CRDTs

For active-active geo-distribution, Cassandra’s native multi-DC support is most mature. Riak also supports it natively. Redis and MongoDB require more manual configuration.
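
The two conflict-handling styles behave differently when two data centers accept concurrent updates. The sketch below contrasts a last-write-wins (LWW) merge with a grow-only counter, one of the simplest CRDTs. Both are simplified illustrations with hypothetical names, not the actual Cassandra or Riak implementations; in particular, timestamp ties here simply keep the first argument.

```typescript
// Last-write-wins register: the value with the higher timestamp survives.
interface Timestamped<T> {
  value: T;
  timestampMs: number;
}

function mergeLww<T>(a: Timestamped<T>, b: Timestamped<T>): Timestamped<T> {
  return b.timestampMs > a.timestampMs ? b : a; // simplification: ties keep a
}

// Grow-only counter CRDT: one slot per replica, merged by per-replica maximum.
type GCounter = Record<string, number>;

function mergeGCounter(a: GCounter, b: GCounter): GCounter {
  const merged: GCounter = { ...a };
  for (const [replica, count] of Object.entries(b)) {
    merged[replica] = Math.max(merged[replica] ?? 0, count);
  }
  return merged;
}

function counterValue(counter: GCounter): number {
  return Object.values(counter).reduce((sum, n) => sum + n, 0);
}

// Two data centers update concurrently.
const us = { value: 'cart: [book]', timestampMs: 1000 };
const eu = { value: 'cart: [pen]', timestampMs: 1001 };
const lww = mergeLww(us, eu); // eu wins; the 'book' update is discarded

const total = counterValue(mergeGCounter({ us: 3 }, { eu: 2 })); // 5
```

LWW is simple and cheap but silently drops one of the concurrent updates. The CRDT merge keeps both, at the cost of a more constrained data model.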

Decision Factor 4: Team Expertise

  • Redis Cluster: Requires understanding of hash slots, resharding, and replica failover (Redis Cluster handles failover itself; Sentinel applies only to non-clustered Redis). Low maintenance once stable.
  • MongoDB: Easiest team onboarding (document model, rich driver ecosystem). Atlas removes ops burden.
  • Cassandra: Steepest learning curve (compaction, repair, gossip tuning, tombstone management).
  • Riak: Now niche — hard to hire for. Only choose Riak if you have existing expertise.

The “MongoDB Is Fine” Rule

As a rule of thumb, MongoDB is the right choice for roughly 80% of applications. It has:

  • The best developer experience
  • Rich query language
  • Strong consistency by default
  • Excellent managed service (Atlas)
  • Large hiring pool

Use Cassandra only when MongoDB’s write ceiling is a bottleneck. Use Redis only when sub-millisecond latency and in-memory speed are critical. Use Riak only when your team already knows it.

Key Takeaways

  • Start with MongoDB for most applications.
  • Switch to Cassandra when sustained writes exceed what a single MongoDB primary can absorb (typically around 10-50K/sec).
  • Switch to Redis when you need sub-millisecond latency and data fits in memory.
  • Choose Riak only if you have Dynamo-specific requirements (CRDTs, active-active geo).
  • The best system is one your team can operate, not the one with the best theoretical properties.

Full Source

View or download the complete implementation: comparison.ts

Exercises

  1. Apply the decision tree to a real-time analytics pipeline that ingests 200K events/sec with a 10 ms latency SLO.
  2. A startup with 3 engineers needs a database for their SaaS product. They expect 100 req/sec and 10GB of data. Recommend a system and defend your choice.
  3. A global social media company needs a timeline service: each user’s timeline must show posts from their network, sorted by time. Which system handles this?

Solutions

  1. 200K writes/sec exceeds MongoDB’s single-primary ceiling (typically ~10-50K). Cassandra is the right choice. Use CL=ONE for writes (fast ingestion), and run regular repair. For the 10ms latency SLO, ensure clients connect to the nearest Cassandra node. Consider adding a Redis cache layer in front of Cassandra for pre-aggregated results.
  2. With 100 req/sec and 10GB, MongoDB Atlas is the correct choice. The team of 3 engineers doesn’t have the bandwidth to manage Cassandra’s repair cycles, compaction tuning, or Redis Cluster resharding. MongoDB Atlas provides a managed replica set, automatic backups, and point-in-time recovery. The rich query language allows rapid iteration on the product without schema changes blocking development. “Start with MongoDB” is the right play for most startups.
  3. This is the classic “fan-out on write vs read” trade-off. Redis Cluster works if each timeline fits in memory and updates are frequent. Use Redis sorted sets with the post timestamp as score. Cassandra works if timelines are stored as time-ordered partitions (timeline_by_user table with clustering by time). MongoDB works but needs secondary indexes on time for sorting. The answer depends on the read-to-write ratio: write-heavy → Cassandra; read-heavy → Redis cache + Cassandra persistence.
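
The fan-out-on-write side of Solution 3 can be sketched with an in-memory model. Each user's timeline is kept sorted by timestamp, standing in for one Redis sorted set per user with the post timestamp as score. TimelineStore and its methods are hypothetical names, and a real deployment would call the Redis client rather than hold arrays in process.

```typescript
interface TimelineEntry {
  postId: string;
  timestampMs: number;
}

// In-memory stand-in for one sorted set per user (score = post timestamp).
class TimelineStore {
  private timelines = new Map<string, TimelineEntry[]>();
  private followers: Map<string, string[]>;
  private maxLength: number;

  constructor(followers: Map<string, string[]>, maxLength = 1000) {
    this.followers = followers;
    this.maxLength = maxLength;
  }

  // Fan-out on write: copy the post reference into every follower's timeline.
  publish(authorId: string, postId: string, timestampMs: number): void {
    for (const follower of this.followers.get(authorId) ?? []) {
      const timeline = this.timelines.get(follower) ?? [];
      timeline.push({ postId, timestampMs });
      timeline.sort((a, b) => b.timestampMs - a.timestampMs); // newest first
      this.timelines.set(follower, timeline.slice(0, this.maxLength));
    }
  }

  // Reads are a cheap slice because the sorting work was done at write time.
  read(userId: string, limit: number): string[] {
    return (this.timelines.get(userId) ?? []).slice(0, limit).map((e) => e.postId);
  }
}

const store = new TimelineStore(new Map([['alice', ['bob', 'carol']]]));
store.publish('alice', 'post-2', 2000);
store.publish('alice', 'post-1', 1000);
const bobTimeline = store.read('bob', 10); // ['post-2', 'post-1']
```

Each publish costs one write per follower, which is why this approach suits read-heavy timelines and becomes expensive for authors with very large followings.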

Module 8: Comparison — Exercises

Exercise 1

Match the scenario to the recommended system:

Scenario                                             | System
A. Real-time chat (1M concurrent users, low latency) | ?
B. Product catalog (50K SKUs, faceted search)        | ?
C. User sessions (50M users, TTL-based expiry)       | ?
D. IoT sensor data (1M writes/sec, time-series)      | ?
E. Shopping cart (high consistency, multi-device)    | ?

Exercise 2

Your team chose Cassandra for a new project. Six months later, reads are getting slower and disk usage is growing faster than expected. What’s likely happening? What do you check?

Exercise 3

Explain the trade-off between Redis Cluster’s “no conflicts” guarantee and its “asynchronous replication” behavior. How can a write be acknowledged and still lost?

Exercise 4

A managed cloud service (MongoDB Atlas, Amazon MemoryDB, Amazon Keyspaces) eliminates operational burden. Does this change the decision tree? When would you still run a system yourself?


Solutions

  1. Matches:
     A → Redis Cluster (pub/sub + sorted sets for presence, in-memory speed).
     B → MongoDB (rich query, faceted search, varying product attributes).
     C → Redis Cluster (TTL key expiration, sub-ms lookup).
     D → Cassandra (write throughput, time-series data model with clustering).
     E → Riak with CRDTs, or Redis with careful locking. Redis has no conflict resolution; Riak’s CRDT merge preserves all operations.

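  2. Tombstones are the most likely cause. Deletes (and expired TTLs) in Cassandra write tombstones, which occupy disk space and slow down reads until compaction removes them after gc_grace_seconds has elapsed. Check: nodetool tablestats (cfstats is its older alias) for SSTable count, tombstones per slice, and read latency, and nodetool compactionstats for the compaction backlog. Fix: increase compaction throughput, review the compaction strategy, and lower gc_grace_seconds only if repair completes within that window on every node. Repair does not remove tombstones itself; it keeps replicas in sync so that purging tombstones does not resurrect deleted data.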

  3. Redis Cluster guarantees no conflicts within a shard because all operations are serialized through one primary. But the primary replicates asynchronously to its replica. If the primary accepts a write, acknowledges it to the client, and fails before the replica receives it, the write is lost. The “no conflict” guarantee is about concurrent access (two clients writing different values), not durability. Fix: use WAIT to block until a given number of replicas have acknowledged the write. This narrows the window for loss but, per the Redis documentation, does not make Redis strongly consistent; if acknowledged writes must never be lost, use a different system.

  4. Managed services change the decision tree significantly: MongoDB Atlas removes ops burden and makes MongoDB viable for more use cases. Amazon MemoryDB (Redis-compatible with durable replication) solves the async-replication-loss problem. Amazon Keyspaces (Cassandra-compatible) removes repair burden. Run your own only when: (a) Data sovereignty requires on-premise deployment. (b) Your throughput requirements exceed managed service limits. (c) Cost optimization (at sufficient scale, self-hosted is cheaper). (d) You need features the managed service doesn’t support.
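
The failure sequence from Solution 3 can be reproduced with a toy model of one shard. The Shard class below is a hypothetical in-memory simulation, not Redis client code: it only shows the ordering of acknowledgment, replication, and failover that leads to an acknowledged write being lost.

```typescript
// Toy model of one shard: a primary plus one asynchronously updated replica.
class Shard {
  private primary = new Map<string, string>();
  private replica = new Map<string, string>();
  private pending: Array<[string, string]> = []; // replication backlog

  // The primary applies the write and acknowledges before the replica has it.
  set(key: string, value: string): 'OK' {
    this.primary.set(key, value);
    this.pending.push([key, value]);
    return 'OK';
  }

  // Replication runs later and drains the backlog.
  replicate(): void {
    for (const [key, value] of this.pending) {
      this.replica.set(key, value);
    }
    this.pending = [];
  }

  // Failover: the replica is promoted with whatever it has received so far.
  // The backlog is lost together with the old primary.
  failover(): void {
    this.primary = new Map(this.replica);
    this.pending = [];
  }

  get(key: string): string | undefined {
    return this.primary.get(key);
  }
}

const shard = new Shard();
shard.set('order:1', 'paid');
shard.replicate();                        // order:1 reaches the replica
const ack = shard.set('order:2', 'paid'); // 'OK' is returned to the client
shard.failover();                         // primary fails before replicating order:2
const survived = shard.get('order:1');    // 'paid'
const lost = shard.get('order:2');        // undefined
```

At no point do two clients disagree about a value, so there is no conflict to resolve. The write is simply gone, which is the durability gap the exercise asks about.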