Distributed & Decentralized Systems Curriculum
Reflection Real Systems · Apache Cassandra

Key Question

When should Cassandra be your default choice for a distributed database?

Deep Dive

Cassandra occupies a specific niche in the distributed database landscape: high-volume writes with survivable consistency.

The Case for Cassandra

Cassandra excels in three specific scenarios:

1. Time-series data at scale

  • IoT sensor data: millions of devices write every second.
  • Cassandra’s append-heavy write model and TWCS compaction handle this perfectly.
  • Each write is a single append to the commit log + memtable. No reads, no seeks.

2. Multi-region deployments

  • Cassandra supports active-active replication across data centers.
  • Writes in US-EAST go to EU-WEST within seconds.
  • Each region accepts writes independently — no cross-region coordination during the write path.

3. High-availability requirements

  • No single point of failure. Any node can coordinate any operation.
  • Rolling upgrades are standard: take down nodes one at a time, the cluster keeps serving.

When NOT to Use Cassandra

  • Transactions across partitions: Cassandra does not support ACID transactions across rows.
  • Strongly consistent reads at high throughput: CL=ALL reads are slow. If you need consistent reads for 95% of queries, Cassandra is the wrong choice.
  • Ad-hoc queries: Cassandra’s query model requires careful index design. You cannot easily add a new query pattern without creating a new table or index.
  • Small datasets (<100 GB per node): Cassandra’s overhead (compaction, gossip, commit log) is significant. For small datasets, PostgreSQL with replication is simpler and faster.

The Operational Truth

Cassandra is not a set-it-and-forget-it database. Running it well requires:

  • Monitoring GC pause times (Cassandra is JVM-based)
  • Tuning compaction strategies per table
  • Managing token distribution (especially when adding nodes)
  • Handling “hot spots” where a single partition key absorbs too many writes

The “So What?”

Cassandra’s design philosophy: “availability first, consistency when you need it.” It is the production embodiment of the PA/EL quadrant. If your application can tolerate stale reads in exchange for always-available writes, Cassandra is the most battle-tested option in the ecosystem.

Full Source

View or download the complete implementation: cassandra.ts

Exercises

  1. A social media analytics platform ingests 100K events/second. Each event is a timestamp + user ID + metric. Would you recommend Cassandra? Why?
  2. Your startup needs a database for a multiplayer game leaderboard. Cassandra or Redis? Justify.
  3. What monitoring metrics would you track to detect Cassandra compaction issues before they cause an outage?

👁️ View Solutions

  1. Yes — this is the ideal Cassandra use case. Time-series data with high write volume. Use TWCS compaction with a 1-day window. Set RF=3 and CL=QUORUM for writes. Reads can be CL=ONE since analytics queries tolerate staleness.
  2. Cassandra is not the best choice for leaderboards. Redis’s sorted sets (ZADD/ZRANGE) provide sub-millisecond leaderboard operations. Cassandra would require a custom implementation using counters or a materialized view, with higher latency. Use Redis for the leaderboard and Cassandra for event persistence.
  3. Key metrics: (a) Pending compaction tasks — if this grows, nodes fall behind. (b) Disk space before/after compaction — if SSTables aren’t shrinking, compaction is stuck. (c) Read latency p99 — stale compaction causes reads to scan more SSTables. (d) GC pause time — compaction triggers GC, GC triggers node timeouts.

✏️ Exercises

Apache Cassandra — Exercises

Exercise 1

You have a 6-node Cassandra cluster with RF=3. Node A is the coordinator for a write to key K with CL=QUORUM. The replicas are Nodes B, C, D. Node C is down. How many nodes must acknowledge the write for it to succeed? Which nodes?

Exercise 2

Describe the sequence of events when a Cassandra read at CL=QUORUM discovers that one of the three replicas has a stale value. What happens to the stale replica? What happens to the client?

Exercise 3

Cassandra’s hinted handoff stores hints on the coordinator. If the coordinator fails before delivering the hints, what happens? How does this differ from Riak’s approach?

Exercise 4

A developer configures R+W ≤ RF (e.g., W=ONE, R=ONE with RF=3). They expect “eventual consistency.” Describe a scenario where a read returns a value that was explicitly overwritten 10 seconds ago.


👁️ View Solutions

  1. QUORUM = floor(3/2)+1 = 2 nodes. The write succeeds if 2 of the 3 replicas (B, C, D) acknowledge. Since C is down, the coordinator sends to B and D. If both are up: ok: true. The coordinator also stores a hint for C, to be delivered when C recovers.

  2. (a) The coordinator sends read requests to all RF replicas. (b) Two replicas return value=v2 @ ts=100. One returns value=v1 @ ts=50. (c) The coordinator sees ts=100 > ts=50 and returns v2 to the client. (d) In the background, the coordinator sends a write to the stale replica: SET key=v2 @ ts=100. (e) Subsequent reads from any replica return v2. The client sees the latest value immediately. The repair is asynchronous from the client’s perspective.

  3. The hints are lost. Cassandra’s hinted handoff is stored locally. If the coordinator crashes before delivering them, the data is gone. Riak’s approach is similar — both rely on the coordinator surviving. Neither provides durable hinted handoff. This is why read repair is essential: it catches the inconsistency on the next read.

  4. Scenario: A user updates their profile (SET name="Alice" → written to Node A only with CL=ONE). Before replication propagates, another user reads from Node B (which still has name="Alice (old)"). The read succeeds at CL=ONE (only Node B needs to respond) and returns the stale value. Even though the write was 10 seconds ago, gossip may not have propagated yet. This is “eventual” — there is no time guarantee.