Key Question
How do you choose between Redis Cluster, Cassandra, MongoDB, and Riak — four systems that all claim to be “distributed databases” but work fundamentally differently?
Deep Dive
The Decision Space
Every distributed database is a set of trade-offs. The right choice depends on your workload pattern, consistency requirements, and operational constraints.
Dimension 1: Write Model
| System | Write Model | Conflict Resolution | Write Scalability |
|---|---|---|---|
| Redis Cluster | Single-primary per hash slot | No conflicts (serial) | Linear (add shards) |
| MongoDB | Single-primary per replica set | No conflicts (serial) | Bottlenecked by primary |
| Cassandra | Leaderless (any node) | LWW (timestamp) | Linear (add nodes) |
| Riak | Leaderless (any vnode) | Vector clock / LWW / CRDT | Linear (add nodes) |
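Decision rule: If you need >10K writes/sec per data partition, prefer a leaderless system (Cassandra, Riak). If write integrity (no lost data) is paramount, prefer MongoDB with majority-acknowledged writes or CRDT-backed Riak. Redis Cluster serializes writes per shard, but it replicates asynchronously, so an acknowledged write can still be lost on failover.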
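To make the "Conflict Resolution" column concrete, here is a minimal sketch of how last-write-wins and a CRDT differ when two replicas accept writes concurrently. The types and function names (`LwwRegister`, `GCounter`, `mergeLww`, `mergeGCounter`) are illustrative assumptions, not the API of Cassandra or Riak.

```typescript
// Last-write-wins register: a value plus the timestamp of the write.
interface LwwRegister<T> {
  value: T;
  timestamp: number; // assumed to be client-supplied milliseconds
}

// LWW keeps the write with the higher timestamp and silently drops the other.
function mergeLww<T>(a: LwwRegister<T>, b: LwwRegister<T>): LwwRegister<T> {
  return b.timestamp > a.timestamp ? b : a;
}

// Grow-only counter CRDT: one slot per replica, merged with an element-wise max.
type GCounter = Record<string, number>;

function increment(counter: GCounter, replicaId: string, by = 1): GCounter {
  return { ...counter, [replicaId]: (counter[replicaId] ?? 0) + by };
}

function mergeGCounter(a: GCounter, b: GCounter): GCounter {
  const merged: GCounter = { ...a };
  for (const replicaId in b) {
    merged[replicaId] = Math.max(merged[replicaId] ?? 0, b[replicaId]);
  }
  return merged;
}

function total(counter: GCounter): number {
  return Object.keys(counter).reduce((sum, id) => sum + counter[id], 0);
}

// Two replicas accept concurrent writes while they cannot see each other.
const lwwResult = mergeLww(
  { value: "cart=[book]", timestamp: 1000 },
  { value: "cart=[pen]", timestamp: 1001 },
);

const replicaA = increment({}, "a", 2);
const replicaB = increment({}, "b", 3);
const counterTotal = total(mergeGCounter(replicaA, replicaB));

console.log(lwwResult.value); // "cart=[pen]": the "book" write is gone
console.log(counterTotal);    // 5: both replicas' increments survive
```

The LWW merge is simple and fast but discards one of the two writes; the counter merge keeps both because its merge function is commutative and idempotent.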
Dimension 2: Consistency Model
| System | Default Consistency | Tunable? | Reads During Partition |
|---|---|---|---|
| Redis Cluster | Strong on the primary (async replication can lose acknowledged writes on failover) | No | Only on accessible shards |
| MongoDB | Strong (primary reads) | Yes (read concern) | Only on accessible replicas |
| Cassandra | Tunable (CL) | Yes (per-query) | Yes (AP with CL=ONE) |
| Riak | Tunable (r/w/dw) | Yes (per-query) | Yes (AP by default) |
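Decision rule: If your application needs strong consistency (banking, inventory), use a single-primary system or Cassandra with QUORUM reads and writes, so that R + W > N. CL=ALL also works, but a single unavailable replica then fails the request. If you can tolerate stale reads for lower latency, use AP systems.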
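The tunable settings in the table come down to one overlap condition: with N replicas, a read of R replicas is guaranteed to include at least one replica from the latest write of W replicas only when R + W > N. The sketch below is a simplified model (it ignores clock skew, hinted handoff, and partially failed writes), and the names `QuorumConfig`, `readsSeeLatestWrite`, and `toleratedFailures` are hypothetical.

```typescript
interface QuorumConfig {
  n: number; // replication factor
  r: number; // replicas that must answer a read
  w: number; // replicas that must acknowledge a write
}

// Every read set overlaps every write set only when r + w > n.
function readsSeeLatestWrite({ n, r, w }: QuorumConfig): boolean {
  return r + w > n;
}

// Replicas that can be down while both reads and writes still succeed.
function toleratedFailures({ n, r, w }: QuorumConfig): number {
  return n - Math.max(r, w);
}

const one: QuorumConfig = { n: 3, r: 1, w: 1 };    // CL=ONE for reads and writes
const quorum: QuorumConfig = { n: 3, r: 2, w: 2 }; // CL=QUORUM for reads and writes
const all: QuorumConfig = { n: 3, r: 3, w: 3 };    // CL=ALL for reads and writes

console.log(readsSeeLatestWrite(one), toleratedFailures(one));       // false 2
console.log(readsSeeLatestWrite(quorum), toleratedFailures(quorum)); // true 1
console.log(readsSeeLatestWrite(all), toleratedFailures(all));       // true 0
```

In this model, quorum reads and writes give the overlap guarantee while still tolerating one failed replica out of three, whereas requiring all replicas gives the same guarantee with no failure tolerance.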
Dimension 3: Query Model
| System | Query Interface | Secondary Indexes | Aggregation |
|---|---|---|---|
| Redis Cluster | Key-Value commands | No (use RediSearch) | Lua scripts |
| MongoDB | Rich query language | Yes | Aggregation pipeline |
| Cassandra | CQL (SQL-like) | Yes (limited) | Limited (built-in COUNT/SUM/AVG/MIN/MAX, best within a partition) |
| Riak | Key-Value only | Yokozuna (add-on) | MapReduce (Riak Pipe) |
Decision rule: If you need complex queries, MongoDB wins. If you need predictable key-based access, any system works. Redis and Riak are KV-only — don’t use them for ad-hoc query workloads.
Dimension 4: Operational Model
| System | Point of Failure | Scaling | Repair |
|---|---|---|---|
| Redis Cluster | Primary per shard | Manual resharding | No repair process (async replication; replicas resync from the primary) |
| MongoDB | Primary per replica set | Shard key planning | Oplog window monitoring |
| Cassandra | None (peer-to-peer) | Linear (add nodes) | Repair needed regularly |
| Riak | None (peer-to-peer) | Linear (add nodes; vnodes rebalance) | Anti-entropy (AAE trees) |
Decision rule: If you have a small ops team, MongoDB’s operational tooling and cloud support are best. If you need to scale to 100+ nodes, Cassandra’s peer-to-peer design is most proven.
Key Takeaways
- Redis Cluster is best for caching/sessions where consistency matters and data fits in memory.
- Cassandra is best for high-throughput writes with predictable key access.
- MongoDB is best for rich query workloads with moderate write throughput.
- Riak is best for maximum availability with automatic conflict resolution via CRDTs.
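One rough way to read these takeaways is as a decision helper. The sketch below is an illustration only: the `Workload` fields, the `recommend` function, and the order in which the rules are checked are assumptions made for this example, and real decisions involve more dimensions than four booleans.

```typescript
type System = "Redis Cluster" | "Cassandra" | "MongoDB" | "Riak";

// Hypothetical workload description; the field names are illustrative.
interface Workload {
  needsAdHocQueries: boolean;
  fitsInMemory: boolean;
  writeHeavy: boolean;
  mustStayWritableDuringPartition: boolean;
}

// Applies the takeaways in an assumed priority order: query model first,
// then availability, then write throughput, then memory fit.
function recommend(w: Workload): System {
  if (w.needsAdHocQueries) return "MongoDB";
  if (w.mustStayWritableDuringPartition) return w.writeHeavy ? "Cassandra" : "Riak";
  if (w.writeHeavy) return "Cassandra";
  if (w.fitsInMemory) return "Redis Cluster";
  return "MongoDB";
}

const sessionStore: Workload = {
  needsAdHocQueries: false,
  fitsInMemory: true,
  writeHeavy: false,
  mustStayWritableDuringPartition: false,
};

const sensorIngest: Workload = {
  needsAdHocQueries: false,
  fitsInMemory: false,
  writeHeavy: true,
  mustStayWritableDuringPartition: false,
};

console.log(recommend(sessionStore)); // "Redis Cluster"
console.log(recommend(sensorIngest)); // "Cassandra"
```

Changing the order of the checks changes the answer for mixed workloads, which is a reminder that the priorities themselves are part of the decision.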
Full Source
View or download the complete implementation: comparison.ts
Exercises
- You’re building a real-time leaderboard for a gaming platform. Which system do you choose? Why?
- You’re building a global e-commerce product catalog with hundreds of thousands of SKUs. Explain your choice.
- You’re building a distributed session store for 50 million users. Compare all four systems for this use case.
Solutions
- Redis Cluster is the clear winner. Leaderboards use sorted sets (ZADD/ZRANGE), and Redis’s sorted set data structure is purpose-built for ranking. MongoDB could work but needs more code. Cassandra and Riak would require building the ranking externally. Redis Cluster handles the memory requirement with sharding.
- MongoDB is the best choice. Product catalogs need rich queries (search by category, price range, brand). MongoDB’s document model maps naturally to products (varying attributes). Cassandra could work if queries are pre-defined. Redis and Riak require building a separate search index. If the catalog is read-mostly and fits in memory, Redis with RediSearch is also viable.
- Session store comparison: Redis Cluster wins on speed and simplicity (single-key lookups, TTL expiration). MongoDB has richer tooling but adds latency. Cassandra can handle 50M sessions but has unpredictable read latency (compaction). Riak’s vector clocks are unnecessary for sessions (LWW is fine). All four CAN work. Redis Cluster is the most common choice for this exact workload.
Module 8: Comparison — Exercises
Exercise 1
Match the scenario to the recommended system:
| Scenario | System |
|---|---|
| A. Real-time chat (1M concurrent users, low latency) | ? |
| B. Product catalog (50K SKUs, faceted search) | ? |
| C. User sessions (50M users, TTL-based expiry) | ? |
| D. IoT sensor data (1M writes/sec, time-series) | ? |
| E. Shopping cart (high consistency, multi-device) | ? |
Exercise 2
Your team chose Cassandra for a new project. Six months later, reads are getting slower and disk usage is growing faster than expected. What’s likely happening? What do you check?
Exercise 3
Explain the trade-off between Redis Cluster’s “no conflicts” guarantee and its “asynchronous replication” behavior. How can a write be acknowledged and still lost?
Exercise 4
A managed cloud service (MongoDB Atlas, Amazon MemoryDB, Amazon Keyspaces) eliminates operational burden. Does this change the decision tree? When would you still run a system yourself?
Solutions
- A → Redis Cluster (pub/sub + sorted sets for presence, in-memory speed). B → MongoDB (rich query, faceted search, varying product attributes). C → Redis Cluster (TTL key expiration, sub-ms lookup). D → Cassandra (write throughput, time-series data model with clustering). E → Riak with CRDTs, or Redis with careful locking. Redis has no conflict resolution; Riak’s CRDT merge preserves all operations.
- Tombstones. Deletes in Cassandra create tombstones that occupy space and slow down reads until compaction removes them. Check: `nodetool tablestats` (formerly `nodetool cfstats`; look at SSTable count, tombstones per read, and read latency) and `nodetool compactionstats` (compaction backlog). Fix: increase compaction throughput, tune `gc_grace_seconds`, and make sure repair completes within that window so compaction can safely purge tombstones.
- Redis Cluster guarantees no conflicts within a shard because all operations are serialized through one primary. But the primary replicates asynchronously to its replica. If the primary accepts a write, acknowledges it to the client, and fails before the replica receives it, the write is lost. The “no conflict” guarantee is about concurrent access (two clients writing different values), not durability. Mitigation: issue `WAIT` after the write to block until a given number of replicas have acknowledged it (this narrows the loss window but does not make Redis Cluster strongly consistent), or use a different system.
- Managed services change the decision tree significantly: MongoDB Atlas removes ops burden and makes MongoDB viable for more use cases. Amazon MemoryDB (Redis-compatible with durable replication) solves the async-replication-loss problem. Amazon Keyspaces (Cassandra-compatible) removes repair burden. Run your own only when: (a) data sovereignty requires on-premise deployment; (b) your throughput requirements exceed managed service limits; (c) cost optimization matters (at sufficient scale, self-hosted is cheaper); or (d) you need features the managed service doesn’t support.
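The acknowledged-but-lost write from the Redis Cluster solution above can be reproduced with a small in-memory model. This is a sketch under stated assumptions: `Primary`, `Replica`, and `replicateTo` are hypothetical classes that imitate asynchronous replication, not Redis client code.

```typescript
type Store = Record<string, string>;

class Replica {
  readonly data: Store = {};
}

class Primary {
  readonly data: Store = {};
  private backlog: Array<[string, string]> = [];

  // Applies the write locally and acknowledges at once; replication is deferred.
  set(key: string, value: string): "OK" {
    this.data[key] = value;
    this.backlog.push([key, value]);
    return "OK";
  }

  // Ships queued writes to the replica, as a later replication round would.
  replicateTo(replica: Replica): void {
    for (const [key, value] of this.backlog) {
      replica.data[key] = value;
    }
    this.backlog = [];
  }
}

const primary = new Primary();
const replica = new Replica();

primary.set("order:1", "paid");
primary.replicateTo(replica); // the first write reaches the replica

const ack = primary.set("order:2", "paid"); // the client sees "OK"
// The primary crashes here, before the next replication round.
const promoted = replica; // failover promotes the replica

const lostOnFailover = ack === "OK" && !("order:2" in promoted.data);
console.log(lostOnFailover); // true: acknowledged, yet absent after failover
```

Calling `replicateTo` before treating the write as complete would close the gap in this model, which is roughly the role `WAIT` plays at the cost of added write latency.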