Distributed & Decentralized Systems Curriculum
Consistency Trade offs ยท CAP Theorem

Key Question

Which real-world systems choose CP, which choose AP, and why?

Deep Dive

Every distributed system must choose what breaks during a partition: consistency (CP) or availability (AP). Letโ€™s examine real systems on both sides.

CP Systems (Sacrifice Availability during partition)

CP systems choose to stop serving writes (or reads) when they canโ€™t guarantee consistency. They prefer correctness over uptime.

HBase:

  • Architecture: HBase uses HDFS for storage and ZooKeeper for coordination. It has a single active master and per-region servers.
  • During partition: If a region server is partitioned away from ZooKeeper, it stops serving data. The master might reassign its regions.
  • User experience: Queries to the partitioned region fail or timeout. The system is unavailable for that data.
  • Why CP? HBase is designed for analytic workloads (e.g., supporting a search index or recommendation system) where reading inconsistent data would produce wrong results.
Before partition:              During partition:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  HMaster    โ”‚                โ”‚  HMaster    โ”‚   โ”‚ (partition)  โ”‚
โ”‚  + ZK       โ”‚                โ”‚  + ZK       โ”‚   โ”‚             โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค                โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค   โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ RegionServerโ”‚                โ”‚ RegionServerโ”‚   โ”‚ RegionServerโ”‚
โ”‚  (active)   โ”‚                โ”‚  (active)   โ”‚   โ”‚ (stopped!)  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                              All reads/writes   Denied service
                              go here            for data here

ZooKeeper:

  • Architecture: ZooKeeper uses ZAB (ZooKeeper Atomic Broadcast) with a leader and followers.
  • During partition: If the leader is isolated from the majority, the leader steps down. The majority elects a new leader. The isolated minority receives no updates and stops serving.
  • Why CP? ZooKeeper is a coordination service for distributed locks, configuration, and leader election. You cannot serve stale lock information โ€” that could cause split-brain (two processes both thinking they hold the lock).

AP Systems (Sacrifice Consistency during partition)

AP systems continue to serve all requests during a partition, even if some replicas return stale data.

Cassandra:

  • Architecture: Peer-to-peer with no single leader. Any node can accept writes.
  • During partition: Each side of the partition continues accepting reads and writes. Data diverges.
  • User experience: All requests succeed. Clients on different sides of the partition see different data temporarily.
  • Why AP? Cassandra was designed for always-available systems (e.g., time-series data, IoT sensor data) where collecting data is more important than ensuring every read is perfectly up-to-date.
During partition:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   //    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚Cassandraโ”‚~~~~~~~~โ”‚Cassandraโ”‚
โ”‚ Node A  โ”‚         โ”‚ Node B  โ”‚
โ”‚ Accepts โ”‚         โ”‚ Accepts โ”‚
โ”‚ writes  โ”‚         โ”‚ writes  โ”‚
โ”‚ X=1     โ”‚         โ”‚ X=2     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
   โ†‘                    โ†‘
Client A:            Client B:
writes X=1           writes X=2
          [both succeed!]

DynamoDB (eventual consistency mode):

  • Architecture: Multi-master with conflict resolution.
  • During partition: Both sides accept reads and writes.
  • Why AP? Amazonโ€™s core use case was shopping carts โ€” dropping requests risks losing sales. Serving slightly stale data (e.g., โ€œitem appears unavailable for 1 secondโ€) is better than throwing errors.

System Classification Diagram

                    Consistency (C)
                    โ”‚
          CP        โ”‚        CA (not possible in
         HBase      โ”‚        distributed systems)
         ZooKeeper  โ”‚
         etcd       โ”‚
         Redis      โ”‚
         (cluster)  โ”‚
                    โ”‚
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Availability (A)
                    โ”‚
                    โ”‚        AP
                    โ”‚        Cassandra
                    โ”‚        DynamoDB EC
                    โ”‚        Riak
                    โ”‚        CouchDB
                    โ”‚
               PC/EC         PA/EL
              (PACELC)      (PACELC)

Use Cases that Justify Each Choice

System TypeGood forBad for
CPFinancial transactions, locks, leader election, metadataHigh-availability web apps, real-time analytics
APSocial media, shopping carts, IoT, content deliverySystems requiring strict ordering

The choice should be driven by: what is more costly for your application โ€” serving stale data or serving no data?

Check Your Understanding

  1. Is HBase โ€œalwaysโ€ unavailable during a partition, or only for the partitioned data?
  2. Why does ZooKeeper need to be CP rather than AP?
  3. Can Cassandra be configured to behave like a CP system? If so, how?
  4. During a partition in DynamoDB (eventual consistency mode), what happens to writes on the minority side?

The โ€œSo What?โ€

When youโ€™re evaluating a database, one of the first questions should be: โ€œWhat does this system do when the network breaks?โ€ If the answer is โ€œstops servingโ€ (CP), you need retry logic and a tolerance for downtime. If the answer is โ€œkeeps serving with stale dataโ€ (AP), you need conflict resolution and a tolerance for temporary inconsistency. Thereโ€™s no right answer โ€” only what fits your application.


โœ๏ธ Exercises

CAP Theorem โ€” Exercises

Exercise 1

A traditional relational database (e.g., PostgreSQL) with a single primary and two asynchronous replicas experiences a network partition that isolates the primary from the replicas. Is this system CP or CA during the partition? Explain what happens.

Exercise 2

A system claims to provide โ€œeventual consistency.โ€ Can it also claim to provide CAP Consistency (linearizability)? Explain the relationship between eventual consistency and CAPโ€™s C.

Exercise 3

During a network partition, an AP system accepts writes on both sides of the partition. When the partition heals, those writes might conflict. Describe two approaches the system can use to resolve these conflicts and converge.

Exercise 4

A single-machine database with no replication experiences a hard drive failure that prevents any reads or writes. Does CAP apply to this scenario? Why or why not? What about a single-machine database that IS replicated (e.g., a single PostgreSQL instance that streams WAL to a standby)?

๐Ÿ‘๏ธ View Solutions

CAP Theorem โ€” Solutions

Solution 1

This system is neither cleanly CP nor AP โ€” itโ€™s a CP system that becomes unavailable during a partition.

Hereโ€™s what happens:

  • The primary is isolated from the replicas.
  • Writes to the primary succeed (no C โ€” replicas donโ€™t have them).
  • Reads from the primary return the latest data (A for primary-connected clients).
  • Reads from the replicas return stale data (C is violated โ€” replicas are behind the primary).
  • Read-only clients connected to replicas see stale data silently.

PostgreSQLโ€™s asynchronous replication does NOT detect or handle partitions explicitly. So during a partition:

  • The system is available (writes succeed on primary, reads on both sides) โ€” โœ“ A
  • But C is violated: replicas serve stale data โ€” โœ— C
  • During the partition, it behaves like AP (inconsistently)

However, if you consider the system as designed (it wasnโ€™t designed for partition tolerance), then when the partition heals, the replicas eventually catch up. But during the partition, it is NOT consistent.

If the replicas were SYNCHRONOUS (waiting for at least one replica to acknowledge), the system would be CP: the primary would block writes during the partition (since the replica canโ€™t acknowledge), and reads from the primary would be consistent but the primary would be effectively unavailable for writes.

Solution 2

No. Eventual consistency and CAP Consistency (linearizability) are different things.

  • CAP Consistency (linearizability) provides a real-time guarantee: once a write completes, every subsequent read must return that value.
  • Eventual consistency guarantees convergence only when writes stop. It does NOT guarantee that a read after a write returns the latest value.

You CANNOT claim CAP Consistency AND eventual consistency for the same operation at the same time. Hereโ€™s why:

Client A writes X=1 (completed)
Client B reads X (starts 1ms after A's write completed)
  - Under CAP-C: B must see X=1
  - Under EC: B might see X=0 (it will eventually converge, maybe after 100ms)

However, a system can provide both guarantees for DIFFERENT operations:

  • Strongly consistent (CAP-C) writes to the catalog table.
  • Eventually consistent reads to the user data table.

Or provide tunable consistency:

  • Cassandra: read at QUORUM โ†’ CAP-C, read at ONE โ†’ eventual consistency.

The key insight: eventual consistency is a WEAKER guarantee than CAP-C. They are on a spectrum, not two sides of a coin.

Solution 3

Approach 1: Last Writer Wins (LWW)

Each write is timestamped. When the partition heals and writes are compared, the write with the latest timestamp wins.

Side A: write X=1 at T=10:00:05
Side B: write X=2 at T=10:00:06

After healing: X=2 wins (T=10:00:06 > 10:00:05)

Simple and always converges. But Side Aโ€™s write (X=1) is silently lost.

Approach 2: Vector Clocks with Sibling Resolution

Each node maintains a version counter. After healing, conflicting versions are detected via vector clocks:

Side A: write X=1 โ†’ version ([A,1])
Side B: write X=2 โ†’ version ([B,1])

Neither dominates the other (A doesn't know about B's write, vice versa).
Resolution: both values are kept as siblings: {1, 2}

On next read, the application merges them:
- Shopping cart: union of items {milk} + {eggs} = {milk, eggs}
- Profile name: ask the user which one to keep
- Counter: sum (if appropriate) or max (if appropriate)

Both approaches are used in production. LWW for simplicity (most systems), siblings for data integrity (Dynamoโ€™s shopping cart).

Solution 4

Part 1: Single-machine database with hard drive failure

CAP does NOT apply to this scenario in the meaningful sense:

  • There is NO network (single machine).
  • Partition Tolerance (โ€œPโ€) is about network partitions โ€” the system cannot be split into groups that cannot communicate.
  • The hard drive failure is a node failure, not a partition.

A single-node system is effectively โ€œCAโ€ trivially:

  • C: Thereโ€™s only one copy of data, so itโ€™s always consistent.
  • A: Itโ€™s available as long as the node is up.
  • P: โ€œPartitionsโ€ are impossible because thereโ€™s only one node.

This is why CAP is only interesting for DISTRIBUTED (multi-node) systems.

Part 2: Single PostgreSQL instance with WAL streaming to standby

CAP DOES apply here because there are two nodes connected by a network:

  • If the primary fails and the standby isnโ€™t promoted: system becomes unavailable (no A).
  • If the network between primary and standby drops: the standby falls behind (no C if reads from standby).
  • If you configure synchronous replication: the primary blocks writes when the standby is unreachable (no A for writes).

So even a โ€œsimpleโ€ primary-standby setup is subject to CAP. The moment you have two nodes connected by a network, CAP applies.

๐ŸŽฎ Interactive CAP Theorem
Scroll to load interactive visualizationโ€ฆ