The CAP Theorem
Introduction: The Partition Dilemma
Imagine you’re the architect of a global financial system. Your distributed database spans two data centers: one on the US East Coast, another in Europe. Both are processing customer transactions constantly. Then, catastrophically, the network link between them fails—the cable is severed, the connection is gone.
Customers on both sides are still submitting requests. Your system faces an impossible choice:
Option A: Return stale data to keep serving customers (accepting that balances might be inconsistent across locations until the link recovers). This is availability—every request gets an answer.
Option B: Reject write requests until you can verify consistency with the other data center. Customers experience downtime, but you never show conflicting information. This is consistency—every read reflects the latest write.
You cannot do both during a network partition. This fundamental tension, formalized in the early 2000s, is the CAP Theorem—one of the most important principles in distributed systems design. It’s not a suggestion; it’s a mathematical truth that shapes every distributed database you’ll encounter.
In the previous chapter on database replication, we saw hints of these tensions. Now we name them, understand them deeply, and learn to design around them.
What is the CAP Theorem?
The CAP Theorem was articulated by Eric Brewer in 2000 as a conjecture, then rigorously proved by Seth Gilbert and Nancy Lynch in 2002. It states:
In a distributed system, you can guarantee at most two out of three properties:
- Consistency (C): Every read request returns the most recent write, or the system returns an error. In formal terms, the system maintains linearizability: all operations appear to execute in a total order, and each read observes all writes that happened before it.
- Availability (A): Every request to a non-failing node receives a response (success or failure, but not silence). The system never returns "I don't know" due to internal delays or communication issues.
- Partition Tolerance (P): The system continues operating even when network partitions occur, that is, when messages between nodes are lost or arbitrarily delayed.
Here’s the crucial insight: Partition Tolerance is non-negotiable in distributed systems. Network failures are not hypothetical; they happen constantly. You cannot design them away. Studies show partitions occur in real cloud systems multiple times per year, often unexpectedly.
Since P is required, the real choice in distributed systems is CP or AP:
- CP (Consistency + Partition Tolerance): Prioritize consistency; sacrifice availability during partitions
- AP (Availability + Partition Tolerance): Prioritize availability; accept eventual consistency
Common Misconception
CAP doesn’t mean you sacrifice one property entirely. It means you make behavioral choices during partitions. During normal operation (no partition), many systems provide strong consistency and availability. The theorem describes what you trade when the network fails.
The Bank Analogy: Making the Choice Real
Picture two bank branch offices in different cities, connected by a dedicated phone line. Each branch maintains its own ledger.
Scenario: The phone line fails. A customer walks into the East branch and wants to withdraw $1,000. The branch manager doesn’t know if the West branch has sufficient funds in the shared account.
CP Choice (Consistent + Partition-tolerant): The manager refuses the withdrawal until the phone line is restored. “I cannot verify this withdrawal against our consolidated balance. We cannot proceed.” The customer is unhappy, but the bank never risks overdrafts or inconsistent state.
AP Choice (Available + Partition-tolerant): The manager allows the withdrawal. The East branch records it immediately and allows the customer to leave. The West branch independently processes transactions. Later, when the phone line is restored, the branches reconcile—they might discover the account was overdrawn, and they handle it retroactively. The customer got immediate service, but consistency was deferred.
Both approaches are valid; the choice depends on your business priorities.
Formal Properties Under the Microscope
Let’s deepen our understanding of what these terms actually mean in a distributed system.
Consistency: Linearizability
When we say “consistency” in CAP, we mean strong consistency or linearizability. This is a different (and in many ways stricter) notion than the “consistency” in ACID transactions, which is about preserving application invariants.
Linearizability means:
- There exists a total ordering of all operations
- Each read reflects all writes that completed before it
- If operation A finishes before operation B starts, A appears before B in the order
Example: Three clients are connected to a key-value store. Client 1 writes key=10 and receives acknowledgment. Client 2 then reads key and must see 10. Any read that begins after Client 1’s write completes must reflect that write.
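To make that guarantee concrete, here is a minimal sketch in Python. It is a toy, single-copy register (the class and client names are invented for this example, not any real database’s API): because writes are applied under one lock on a single copy before being acknowledged, any read that starts after the acknowledgment is guaranteed to observe the write. Distributed systems need consensus protocols to provide the same guarantee across replicas.

```python
# Toy single-copy register illustrating the linearizability expectation above.
# Hypothetical example code, not a real database API.
import threading

class LinearizableRegister:
    """One copy of the data guarded by one lock, so all operations
    take effect in a single total order."""

    def __init__(self):
        self._lock = threading.Lock()
        self._store = {}

    def write(self, key, value):
        # The write is fully applied before the call returns (the "acknowledgment").
        with self._lock:
            self._store[key] = value

    def read(self, key):
        # Any read that starts after an acknowledged write observes that write.
        with self._lock:
            return self._store.get(key)

register = LinearizableRegister()
register.write("key", 10)          # Client 1: write completes and is acknowledged
assert register.read("key") == 10  # Client 2: must see 10, per linearizability
```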
Availability: Non-blocking Responses
Availability in CAP is strict: every request received by a non-failing node must eventually terminate in a response. The response may report success or failure, but the node may not simply hang or stay silent.
Importantly:
- Requests cannot hang indefinitely while a node waits on other nodes; they must terminate in a response
- A node that has crashed counts as “failing” and is exempt
- A node that is partitioned from the majority must still respond to the requests it receives, even though it cannot guarantee those responses are consistent
Partition Tolerance: Network Reality
A partition means there’s a time interval during which some messages between nodes on different sides of the split are lost or arbitrarily delayed. It’s not total failure—some messages might get through, but not reliably.
In practice:
- A server hangs and stops responding (looks like a partition)
- A network cable is cut
- A cloud provider’s datacenter has routing issues
- A software bug causes a service to queue requests indefinitely
The CAP Triangle: Visualizing the Trade-off
graph TB
subgraph CAP["CAP Theorem"]
C["Consistency<br/>(Linearizability)"]
A["Availability<br/>(Always respond)"]
P["Partition<br/>Tolerance<br/>(Network failures)"]
end
C -->|tension| A
A -->|tension| P
P -->|tension| C
CP["<b>CP Systems</b><br/>Sacrifice Availability<br/>Examples: HBase, Spanner, etcd"]
AP["<b>AP Systems</b><br/>Sacrifice Consistency<br/>Examples: Cassandra, DynamoDB"]
style CP fill:#e8f5e9
style AP fill:#e3f2fd
style P fill:#fff9c4
Since P is required, you’re choosing which corner to sacrifice:
- CP: Accept reduced availability (some requests fail/timeout) to maintain consistency
- AP: Accept eventual consistency (stale reads) to maintain availability
CP Systems: Consistency First
CP systems prioritize keeping data correct across the network. When a partition occurs, they stop accepting writes rather than risk inconsistency.
How CP Systems Operate
CP systems use consensus protocols to ensure all replicas agree:
Raft Algorithm Example: etcd, a pure CP system, uses the Raft consensus algorithm:
Leader receives write request
→ Leader appends to its log
→ Leader sends to followers
→ Waits for majority (n/2 + 1) to ACK
→ Commits to state machine
→ Sends success to client
If partition occurs:
→ Minority partition: Cannot achieve quorum, rejects writes
→ Majority partition: Continues normally
This ensures that once a write is acknowledged, all subsequent reads will see it.
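The quorum rule can be sketched in a few lines of Python. This is an illustrative simulation under simplified assumptions, not etcd’s or Raft’s actual implementation; the `Replica` class and `leader_write` function are invented for the example.

```python
# Illustrative quorum-write sketch (not a real Raft implementation): the leader
# commits only after a majority of the cluster acknowledges, so a minority
# partition can never accept a write on its own.

class Replica:
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable   # False = on the far side of a partition
        self.log = []

    def append(self, entry):
        if not self.reachable:
            return False             # message lost due to the partition
        self.log.append(entry)
        return True                  # ACK

def leader_write(entry, followers):
    acks = 1                                     # the leader counts itself
    for replica in followers:
        if replica.append(entry):
            acks += 1
    quorum = (len(followers) + 1) // 2 + 1       # majority of the full cluster
    if acks >= quorum:
        return "committed"                       # any future quorum overlaps this one
    return "rejected"                            # no majority: refuse the write

# 5-node cluster: leader plus 4 followers, 3 of which are cut off by a partition.
followers = [Replica("f1"), Replica("f2", False), Replica("f3", False), Replica("f4", False)]
print(leader_write("balance=900", followers))    # -> "rejected" (only 2 of 5 acked)
```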
Real-World CP Systems
| System | Use Case | Trade-off |
|---|---|---|
| HBase | Large-scale consistent data | High latency, reduced availability during failures |
| Google Spanner | Global transactions with strong consistency | Expensive infrastructure, higher write latency |
| etcd | Kubernetes configuration, service discovery | Single-digit MB/s throughput, not for large data |
| ZooKeeper | Distributed coordination | Small payload support, careful tuning required |
| MongoDB (with majority write concern) | Document storage with ACID transactions | Writes slower, partition handling reduces availability |
Decision Point: When to Choose CP
Choose CP when:
- Data inconsistency is unacceptable (financial transfers, inventory management)
- You can tolerate temporary unavailability (minutes of downtime acceptable)
- You have reliable networking infrastructure
AP Systems: Availability First
AP systems prioritize responding to requests. When a partition occurs, both sides continue operating independently and reconcile later.
How AP Systems Operate
AP systems embrace eventual consistency: after the partition heals, all replicas converge to the same state.
Cassandra Replication Example:
Client writes to local coordinator
→ Coordinator immediately returns success
→ Sends write to replicas asynchronously
→ If partition blocks replicas, they receive write when partition heals
During partition:
→ Both sides process writes independently
→ Writes have timestamps or vector clocks
→ When partition heals, conflict resolution merges writes
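The asynchronous flow can be sketched like this. It is a simplified illustration in the spirit of hinted handoff, not Cassandra’s actual code; the `Coordinator` class and its `hints` buffer are names invented for the example.

```python
# Simplified AP-style coordinator: acknowledge the client immediately, replicate
# asynchronously, and buffer ("hint") writes for replicas that are unreachable.
import time

class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas      # dict: replica name -> reachable (bool)
        self.local = {}
        self.hints = []               # writes waiting for partitioned replicas

    def write(self, key, value):
        ts = time.time()
        self.local[key] = (value, ts)                      # apply locally
        for name, reachable in self.replicas.items():
            if reachable:
                pass                                       # would send the write now
            else:
                self.hints.append((name, key, value, ts))  # replay after healing
        return "success"                                   # client is acked immediately

    def partition_healed(self):
        delivered, self.hints = self.hints, []
        return delivered                                   # buffered writes to replay

coord = Coordinator({"replica-west": False})   # the partition blocks the west replica
print(coord.write("user:balance", 900))        # -> "success" despite the partition
print(coord.partition_healed())                # hinted write is delivered later
```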
Conflict Resolution Strategies
When two partitions diverge, AP systems must reconcile:
- Last-Write-Wins (LWW): Keep the write with the highest timestamp. Simple, but concurrent writes can be silently lost.
  Partition A: key=100 at t=1000
  Partition B: key=200 at t=1005
  Reconcile:   key=200 (B's timestamp is later)
- Vector Clocks: Track causality between writes.
  Write A: {A:1, B:0} → key=100
  Write B: {A:0, B:1} → key=200
  On reconcile: the two clocks are incomparable, so a conflict is detected and the application decides the resolution.
- CRDTs (Conflict-free Replicated Data Types): Special data structures that merge automatically without conflicts (see the sketch after this list).
  Counter CRDT: each replica keeps its own counter; the total is the sum of all of them
  Register CRDT: a deterministic merge function resolves concurrent writes automatically
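To make the first and third strategies concrete, here is a small Python sketch of a last-write-wins merge and a grow-only counter CRDT. These are illustrative data structures, not any particular database’s implementation.

```python
# Illustrative conflict-resolution sketches (not a real database implementation).

# 1. Last-Write-Wins: keep the value with the newer timestamp; the other write is lost.
def lww_merge(a, b):
    """a and b are (value, timestamp) pairs from two diverged replicas."""
    return a if a[1] >= b[1] else b

print(lww_merge((100, 1000), (200, 1005)))   # -> (200, 1005); the t=1000 write is dropped

# 2. Grow-only counter CRDT: each replica increments only its own slot, and the
#    merge takes the per-replica maximum, so concurrent increments are never lost.
class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    def value(self):
        return sum(self.counts.values())

east, west = GCounter("east"), GCounter("west")
east.increment(3)      # 3 events recorded in the east while partitioned
west.increment(2)      # 2 events recorded in the west
east.merge(west)
print(east.value())    # -> 5: both sides' increments survive the merge
```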
Real-World AP Systems
| System | Use Case | Trade-off |
|---|---|---|
| Cassandra | Time-series data, high availability | Eventual consistency, tunable per query |
| DynamoDB | Serverless database, web-scale | Eventual consistency by default, pay per request |
| CouchDB | Offline-first, document replication | Conflict handling complexity |
| Riak | Object store, geographic distribution | Fewer abstractions, requires careful conflict handling |
Tunable Consistency in Cassandra
Cassandra lets you choose consistency per query:
// In cqlsh, set the consistency level for the session, then run the query
// Strong consistency (slower, but safe)
CONSISTENCY QUORUM;
SELECT * FROM table;
// Fast but risky (might read data that is stale, typically by milliseconds)
CONSISTENCY ONE;
SELECT * FROM table;
// Quorum within the local datacenter only (avoids cross-region round trips)
CONSISTENCY LOCAL_QUORUM;
SELECT * FROM table;
This flexibility is powerful but requires application awareness.
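From application code, the consistency level is attached to each statement. A minimal sketch using the DataStax Python driver (package `cassandra-driver`) is shown below; the keyspace, table, and column names are hypothetical.

```python
# Per-query consistency with the DataStax Python driver.
# The keyspace "shop" and its tables are hypothetical examples.
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from cassandra import ConsistencyLevel

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")

# Critical read: require a quorum of replicas to agree before answering.
critical = SimpleStatement(
    "SELECT balance FROM accounts WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(critical, ["alice"]).one()

# Non-critical read: any single replica may answer (fast, possibly stale).
relaxed = SimpleStatement(
    "SELECT view_count FROM product_stats WHERE product_id = %s",
    consistency_level=ConsistencyLevel.ONE,
)
stats = session.execute(relaxed, ["sku-123"]).one()
```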
Beyond CAP: The PACELC Theorem
The CAP Theorem focuses on partition scenarios, but partitions are rare. Most of the time, your system has no partition. PACELC addresses this:
- PAC: During a partition, choose between consistency and availability
- ELC: Else (no partition), choose between latency and consistency
PACELC theorem states:
IF there is a partition
THEN choose Consistency or Availability
ELSE (no partition)
THEN choose Latency or Consistency
This is more practical because:
- Partitions are rare (minutes per year)
- Latency trade-offs are constant (every millisecond matters)
Database Classification under PACELC:
| Database | If Partition | Else (No Partition) | Strategy |
|---|---|---|---|
| Spanner | PC (consistency) | EC (consistency; TrueTime atomic clocks keep the latency cost modest) | Strong consistency always |
| DynamoDB | PA (availability) | EL (low latency; reads are eventually consistent by default) | Availability and latency first |
| MongoDB | PC (consistency) | EC (consistency; replicating to a majority adds write latency) | Consistency-first, accept latency |
| Cassandra | PA (availability) | EL (low latency by default; tunable per query) | Availability-first, tune per query |
Practical Example: Cassandra During a Network Partition
Let’s walk through a real scenario using Cassandra (AP system):
Initial State:
Node A (DC-East): user:balance = $1000
Node B (DC-West): user:balance = $1000
Network partition occurs: A and B cannot communicate
Client in DC-East writes: user:balance = $900 (withdrawal of $100)
→ Node A accepts immediately
→ Tries to send to Node B, but partition blocks it
→ Client gets success
Simultaneously, Client in DC-West writes: user:balance = $850 (withdrawal of $150)
→ Node B accepts immediately
→ Tries to send to Node A, but partition blocks it
→ Client gets success
During partition:
DC-East shows balance = $900
DC-West shows balance = $850
INCONSISTENCY! But both users got service.
Partition heals:
Cassandra detects A and B have different data
Last-Write-Wins resolves based on timestamp
If East's write has ts=1000, West's has ts=1001:
Final state: balance = $850
The $100 withdrawal was silently lost: after both withdrawals the true balance is $750, but LWW kept $850. Still, the system recovered without manual intervention.
This is the trade-off of AP systems: you get availability but must handle these scenarios at the application level.
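Here is a small Python simulation of that reconciliation, using the timestamps and balances from the walkthrough above (illustrative only, not Cassandra’s code). It also shows one common application-level remedy: recording withdrawals as immutable deltas rather than overwriting the balance, so the merge adds events up instead of discarding one.

```python
# Reproducing the walkthrough: both sides overwrite the balance, and LWW keeps
# only the later write, silently dropping the $100 withdrawal.
east_write = ("user:balance", 900, 1000)   # (key, value, timestamp)
west_write = ("user:balance", 850, 1001)

def lww(a, b):
    return a if a[2] >= b[2] else b

print(lww(east_write, west_write)[1])      # -> 850, but the true balance is 750

# One application-level remedy: record each withdrawal as an immutable delta.
# Merging is then a union of events, so no withdrawal can be lost.
east_events = [("txn-e1", -100)]           # recorded in DC-East during the partition
west_events = [("txn-w1", -150)]           # recorded in DC-West during the partition

merged = {txn_id: amount for txn_id, amount in east_events + west_events}
print(1000 + sum(merged.values()))         # -> 750, the correct balance
```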
Decision Matrix: Choosing CP vs AP
Here’s how to think through the choice for different features:
| Feature | Primary Concern | CAP Choice | Reasoning |
|---|---|---|---|
| User authentication | Correctness | CP | Wrong password = security breach, must be consistent |
| Activity feed | Freshness tolerable | AP | Users accept “you liked this” appearing seconds late |
| Inventory at checkout | Correctness | CP | Overselling inventory is unacceptable |
| Product catalog views | Freshness tolerable | AP | Outdated product count is acceptable |
| Payment processing | Correctness | CP | Money must be correct, no double-charging |
| User notifications | Delivery tolerable | AP | Missing a notification is acceptable |
Notice: most applications need both CP and AP systems, using them for different purposes.
Trade-offs and Reality
The Cost of Strong Consistency (CP)
Pros:
✓ No conflicting data states
✓ Simpler application logic
✓ Easy to reason about correctness
Cons:
✗ Higher write latency (must replicate to quorum)
✗ Reduced availability (cannot serve during partitions)
✗ Higher infrastructure complexity (consensus protocols)
✗ Harder to scale geographically (quorum reads cross-continent = slow)
A write to Google Spanner might take 50-100 ms to achieve global consistency. A write to a single local Cassandra replica might return in 1 ms, but a subsequent read could be stale.
The Cost of Eventual Consistency (AP)
Pros:
✓ Lower write latency (local writes)
✓ Higher availability (continue during partitions)
✓ Better geographic scalability
✓ Simpler replication logic
Cons:
✗ Temporary inconsistency windows
✗ Complex conflict resolution (LWW loses data, CRDTs are tricky)
✗ Application complexity (must handle stale reads)
✗ Debugging is harder (non-deterministic behavior)
Tunable Consistency: The Middle Ground
Most modern systems offer tunable consistency, allowing per-operation choices:
// Cassandra (DataStax driver) consistency levels, set per statement
ConsistencyLevel.ONE      // Fastest, most likely to return stale data
ConsistencyLevel.QUORUM   // Balanced
ConsistencyLevel.ALL      // Strongest, slowest, least available
// DynamoDB read consistency, set per request
GetItem with ConsistentRead=true    // Strongly consistent read, higher latency and cost
GetItem with ConsistentRead=false   // Eventually consistent read (the default), faster
This lets you optimize each operation independently. Write critical data with QUORUM, read non-critical data with ONE.
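For DynamoDB, the choice is a flag on each read. A minimal sketch using boto3 (the AWS SDK for Python) follows; the table name and key schema are hypothetical.

```python
# Per-request read consistency in DynamoDB via boto3.
# The "orders" table and its "order_id" key are hypothetical examples.
import boto3

dynamodb = boto3.resource("dynamodb")
orders = dynamodb.Table("orders")

# Critical read (e.g., before charging a card): request a strongly consistent
# read, which reflects all prior successful writes at the cost of extra latency.
payment = orders.get_item(
    Key={"order_id": "o-1001"},
    ConsistentRead=True,
).get("Item")

# Non-critical read (e.g., rendering an order-history page): the default
# eventually consistent read is cheaper and faster, but may lag recent writes.
history = orders.get_item(
    Key={"order_id": "o-1001"},
    ConsistentRead=False,
).get("Item")
```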
Criticisms and Limitations of CAP
The CAP Theorem is foundational but imperfect:
- It’s a simplification: Real systems have more than three axes (durability, freshness, ordering, causality). PACELC adds nuance.
- Partition definition matters: What counts as a partition? If a server is slow but not crashed, does CAP apply? The theorem assumes a binary model: messages either arrive or they don’t.
- Consistency definitions vary: CAP uses linearizability, but databases claim different consistency levels (serializability, snapshot isolation, read-your-writes). These are distinct from CAP consistency.
- Time isn’t accounted for: If a partition lasts 1 millisecond, does availability vs. consistency matter? PACELC is more realistic.
- It’s binary, not continuous: Real systems are often gradations of “how consistent” and “how available,” not either-or.
Despite these limitations, CAP remains the most important mental model for distributed systems trade-offs.
Key Takeaways
- The CAP Theorem states you can guarantee at most two of: Consistency, Availability, and Partition Tolerance. Since partitions are inevitable, the real choice is CP vs AP.
- CP systems (HBase, Spanner, etcd) use consensus and quorum writes to prevent inconsistency. They sacrifice availability during partitions.
- AP systems (Cassandra, DynamoDB) accept writes on all replicas and reconcile later. They sacrifice immediate consistency for availability and low latency.
- Most applications need both CP and AP: use CP for critical data (auth, payments) and AP for non-critical data (feeds, analytics).
- Tunable consistency (choosing a consistency level per query) is the practical sweet spot, balancing latency, correctness, and availability.
- The PACELC Theorem extends CAP by addressing both partition scenarios and normal operation, making it more practical for real-world systems.
- CAP is a simplification; understand its context before applying it. Causality, durability, and ordering matter too.
Practice Scenarios
Scenario 1: Multi-Region Bank System
You’re designing a banking system for a global bank with datacenters in New York, London, and Singapore. Transfers between accounts in the same region are frequent. International transfers are less common but critical.
- Which accounts should use CP? Why?
- Which should use AP? Why?
- How would you handle a partition between NY and London?
- If you chose AP for some accounts, how would you prevent double-spending?
Scenario 2: E-commerce Inventory System
Your e-commerce platform has warehouses in 5 regions. Inventory must never be oversold (selling more units than exist). But stock updates should be fast—when an item sells, users should see the count decrement within 100ms.
- Does this describe a CP or AP system?
- What happens if a partition isolates one warehouse?
- How would you reconcile inventory if two regions both sold the last unit?
- Would you apply different consistency levels to reads vs. writes?
Scenario 3: Social Media Notification System
Your social network sends notifications when users interact (likes, comments). Users don’t mind if a notification arrives seconds late, but duplicates (same notification sent twice) are annoying.
- Is this CP or AP?
- During a partition, should both datacenters send notifications independently?
- How would you prevent duplicate notifications during reconciliation?
- Would you use eventually consistent or strongly consistent storage?
Coming Next: Strong vs. Eventual Consistency
Now that you understand the CAP theorem and why systems must choose, we’ll dive deeper into what strong consistency and eventual consistency actually mean in practice. We’ll explore consistency models (serializability, linearizability, causal consistency), quantify how stale data becomes, and examine patterns for living with eventual consistency without introducing bugs.