Consistency Trade offs · PACELC

Key Question

What does CAP miss, and why do we need a better model?

Deep Dive

CAP is taught as the foundational trade-off in distributed systems. But it has a glaring blind spot: CAP only describes system behavior DURING a partition. What about the other 99.9% of the time when the network is working perfectly? CAP says nothing about normal-operation trade-offs.

This is where PACELC comes in. Proposed by Daniel Abadi in 2012, PACELC extends CAP by addressing the common-case trade-off between latency and consistency.

The Acronym

P — If there’s a Partition:

A vs C — trade-off between Availability and Consistency (same as CAP)

E — Else (no partition):

L vs C — trade-off between Latency and Consistency

PACELC forces you to answer TWO questions about your system:

During a partition: do you sacrifice Consistency or Availability?
During normal operation: do you sacrifice latency (extra coordination) or consistency (weaker guarantees)?

Why CAP Falls Short

Consider two AP systems: Cassandra and DynamoDB. Both sacrifice consistency during partitions. But in normal operation, Cassandra offers tunable consistency (read at ONE vs QUORUM vs ALL), and DynamoDB offers both eventually consistent reads (~5ms) and strongly consistent reads (~10ms). These are DIFFERENT design choices with different operational characteristics, yet CAP treats them identically.

CAP sees:
  Cassandra ─→ AP
  DynamoDB  ─→ AP
  (no distinction)

PACELC sees:
  Cassandra ─→ PA/EL  (Partition: Available / Else: Latency-prioritized)
  DynamoDB  ─→ PA/EL  (Partition: Available / Else: Latency-prioritized)
  (with nuance: consistency levels are tunable)

PACELC reveals that even within the AP category, there’s a rich design space. Some systems optimize for low latency (eventual consistency reads by default), while others optimize for consistency (strong reads by default, even if slower).

The Latency-Consistency Spectrum

In normal operation (no partition), the trade-off is:

Low Latency ────────────────────────────── High Latency
  (weak consistency)            (strong consistency)
       │                               │
  Eventual reads              Linearizable reads
  Read from one replica       Read from quorum
  No coordination             Coordination required
  Fast, may be stale          Slower, always current

The “E” in PACELC captures this reality: most of your operational decisions are about how much latency you’re willing to accept for stronger consistency.

Systems CAP vs PACELC Classification

System	CAP	PACELC
Cassandra	AP	PA/EL
DynamoDB	AP	PA/EL
MongoDB (p-S)	CP	PC/EC
Spanner	CP	PC/EC
ZooKeeper	CP	PC/EC
Riak	AP	PA/EL

The PACELC column provides more information. For example, both Spanner and ZooKeeper are CP under CAP, but PACELC adds:

Spanner: PC/EC — consistent during partition AND consistent during normal operation (via TrueTime).
ZooKeeper: PC/EC — consistent during partition AND consistent during normal operation (via ZAB).

These look the same. But in practice, ZooKeeper must serialize all writes through a leader (higher latency), while Spanner can parallelize more (lower latency per transaction). PACELC doesn’t capture this granularity, but it’s still more descriptive than CAP alone.

The Real Timeline

99.9% of time: No partition ────────── Some time: Partition ──── No partition
                    │                         │
           PACELC: E                    PACELC: P
           Trade-off: L vs C            Trade-off: A vs C
           
           "Should we read from         "Should we return stale
            one replica (fast) or        data (A) or no data (C)?"
            from a quorum (safe)?"

PACELC is strictly more informative than CAP because it covers both regimes.

Check Your Understanding

What does CAP not address that PACELC does?
Classify a system as PACELC: “Always consistent, and during a partition, it becomes unavailable.”
Why is the “E” in PACELC practically more relevant than the “P” for most system architects?

The “So What?”

PACELC is a more practical model than CAP for making real engineering decisions. When you choose a database, you’re not just choosing partition behavior — you’re choosing how it behaves 99.9% of the time. PACELC gives you the vocabulary to discuss both regimes. It also helps avoid the trap of thinking “CAP says my system is AP, so I don’t need to worry about consistency in normal operation.” No — AP only describes partition behavior. You still need to address the ELC trade-off.

✏️ Exercises

PACELC — Exercises

Exercise 1

Classify a single-node PostgreSQL database (no replication) using PACELC. Explain your reasoning for each part of the acronym.

Exercise 2

What PACELC classification does a system built on the Raft consensus algorithm provide? Explain by answering both the “P” and “E” parts of the acronym.

Exercise 3

Is a PC/EL system theoretically possible? Explain why or why not. What practical challenges would such a system face?

Exercise 4

You’re building a real-time bidding system for online ads. Bids must be processed in under 50ms or they’re rejected. During a partition, you can afford to lose some bid data (within reason) but cannot afford to drop incoming requests. Which PACELC class do you choose? Which database(s) would you consider? Justify your answer.

👁️ View Solutions

PACELC — Solutions

Solution 1

Single-node PostgreSQL: N/A (PACELC doesn’t meaningfully apply)

PACELC is designed for distributed (multi-node) systems. A single-node database:

P: There are NO replicas, so network partitions are impossible. The “P” condition never triggers.
E (Else): There’s only one copy of the data, so there’s no need to coordinate with other nodes. Reads and writes are trivially consistent and low latency.

Some might argue it’s trivially PC/EC: the single node is “consistent” (one copy = always consistent) and “consistent” in normal operation (same reasoning). But this misses the point — PACELC describes how multi-node systems behave when they need to copy data between nodes. A single-node system doesn’t face these trade-offs.

If forced to classify: it’s a degenerate case. It is both “consistent” (one copy) and “low latency” (no coordination), but it lacks fault tolerance — if the node dies, everything dies. PACELC doesn’t capture this availability/durability concern.

Solution 2

Raft provides PC/EC.

P — Partition (During a partition):

Raft maintains a leader. If the leader is in the majority partition, it continues to operate. The minority partition cannot elect a leader (needs majority), so it cannot accept writes. The minority becomes unavailable for writes.

Raft sacrifices Availability during a partition → PC (Consistency is chosen).

If the leader is in the MINORITY partition (isolated from the majority), the leader steps down. The majority elects a new leader. Writes to the old leader are lost. But from the perspective of the system as a whole: the majority partition remains consistent, and the minority rejects writes. This is CP behavior.

E — Else (Normal operation):

Raft is designed for strong consistency. All reads and writes go through the leader. Write requests are replicated to a majority before being acknowledged. Reads can also go through the leader (guaranteed consistent) or use read-index/lease mechanisms (consistent with some optimization).

Raft provides EC (strong consistency) during normal operation, at the cost of some latency (two round trips for writes: leader → followers → leader acknowledgment).

Raft provides PC/EC. This is exactly what systems like etcd, ZooKeeper (ZAB, which is Raft-like), and Consul provide.

Solution 3

PC/EL is theoretically impossible. Here’s why:

PC means: during a partition, the system chooses Consistency over Availability. To be consistent, nodes must coordinate (e.g., via quorum, leader-based replication).
EL means: in normal operation (no partition), the system chooses Latency over Consistency — it uses fast, uncoordinated operations that might return stale data.

The contradiction:

If you’re willing to accept stale data in normal operation (EL), why would you enforce strong consistency during a partition (PC)? The partition is precisely when maintaining consistency is HARDEST (network is broken). If you don’t enforce consistency when the network works, you certainly can’t enforce it when the network is broken.

More formally:

EL implies that in normal operation, you use techniques like reading from one replica (no quorum coordination).
But for PC, during a partition, you need quorum-based coordination to maintain consistency.
The system would need to somehow “switch” from non-coordinated mode to coordinated mode during a partition. But a partition is the worst time to switch to a coordination-heavy mode — you can’t coordinate because the network is broken!

Practical attempt at PC/EL:

You might try: “Use fast uncoordinated reads normally, but during a partition, block all reads.” That’s not PC/EL — blocking reads is not EL (it’s not low latency).

You might try: “Pre-configure all nodes with a consistent snapshot, so reads during a partition are fast AND consistent.” But this requires coordination to PREPARE the snapshot — you’ve just moved the coordination to a different time.

PC/EL is a contradiction in terms. Consistency requires coordination, and coordination adds latency. You can minimize the latency through optimization, but you cannot eliminate it. Systems like Spanner are PC/EC — they accept the latency cost for consistency.

Solution 4

Recommended class: PA/EL

Reasoning:

The constraints are:

Latency SLA < 50ms → The system must be fast. PC/EC systems like Spanner add 10-50ms for coordination, eating the entire budget.
Cannot drop requests during partition → The system must be Available.
Can afford to lose some bid data → Inconsistency is tolerable (within reason).

Combining these: PA (must be available during partition) + EL (must be fast in normal operation = eventual consistency).

Database choices:

Cassandra (CL=ONE for both reads and writes):
- Writes are fast (commit log + memtable, acknowledged immediately).
- Reads are fast (read from one replica).
- During partition, continues accepting writes.
- Downside: bids might be lost in conflicts (LWW resolution).
- Mitigation: use client-side timestamps and accept some data loss.
Redis Cluster (async replication):
- Extremely low latency (in-memory).
- During partition, each side continues.
- Downside: data loss is possible (async replication).
- Acceptable for bidding (losing a bid = lost revenue, but better than dropping all bids).
Custom in-memory cache + async write to durable store:
- Keep bid state in a distributed cache (Redis/Memcached).
- Async writes to Cassandra/PostgreSQL for durability.
- Trade-off: bid data is ephemeral; losing recent bids is OK.
- This is a common real-world RTB architecture.

Why NOT PC/EC:

Spanner: 10-50ms latency is too high for the 50ms SLA.
ZooKeeper/etcd: too slow (all writes through leader).
MongoDB (primary reads): a single primary becomes a bottleneck and latency risk.

Best answer: PA/EL, implemented with Cassandra (CL=ONE) or Redis Cluster. Accept some data loss as a trade-off for speed and availability.