System Design Fundamentals

The CAP Theorem

Introduction: The Partition Dilemma

Imagine you’re the architect of a global financial system. Your distributed database spans two data centers: one on the US East Coast, another in Europe. Both are processing customer transactions constantly. Then, catastrophically, the network link between them fails—the cable is severed, the connection is gone.

Customers on both sides are still submitting requests. Your system faces an impossible choice:

Option A: Return stale data to keep serving customers (accepting that balances might be inconsistent across locations until the link recovers). This is availability—every request gets an answer.

Option B: Reject write requests until you can verify consistency with the other data center. Customers experience downtime, but you never show conflicting information. This is consistency—every read reflects the latest write.

You cannot do both during a network partition. This fundamental tension, formalized in the early 2000s, is the CAP Theorem—one of the most important principles in distributed systems design. It’s not a suggestion; it’s a mathematical truth that shapes every distributed database you’ll encounter.

In the previous chapter on database replication, we saw hints of these tensions. Now we name them, understand them deeply, and learn to design around them.

What is the CAP Theorem?

The CAP Theorem was articulated by Eric Brewer in 2000 as a conjecture, then rigorously proved by Seth Gilbert and Nancy Lynch in 2002. It states:

In a distributed system, you can guarantee at most two out of three properties:

  1. Consistency (C): Every read request returns the most recent write, or the system returns an error. In formal terms, the system maintains linearizability—all operations appear to execute in a total order, and each read observes all writes that happened before it.

  2. Availability (A): Every request to a non-failing node receives a response (success or failure, but not silence). The system never returns “I don’t know” due to internal delays or communication issues.

  3. Partition Tolerance (P): The system continues operating even when network partitions occur—when messages between nodes are lost or arbitrarily delayed.

Here’s the crucial insight: Partition Tolerance is non-negotiable in distributed systems. Network failures are not hypothetical; they happen constantly. You cannot design them away. Studies show partitions occur in real cloud systems multiple times per year, often unexpectedly.

Since P is required, the real choice in distributed systems is CP or AP:

  • CP (Consistency + Partition Tolerance): Prioritize consistency; sacrifice availability during partitions
  • AP (Availability + Partition Tolerance): Prioritize availability; accept eventual consistency

Common Misconception

CAP doesn’t mean you sacrifice one property entirely. It means you make behavioral choices during partitions. During normal operation (no partition), many systems provide strong consistency and availability. The theorem describes what you trade when the network fails.

The Bank Analogy: Making the Choice Real

Picture two bank branch offices in different cities, connected by a dedicated phone line. Each branch maintains its own ledger.

Scenario: The phone line fails. A customer walks into the East branch and wants to withdraw $1,000. The branch manager doesn’t know if the West branch has sufficient funds in the shared account.

CP Choice (Consistent + Partition-tolerant): The manager refuses the withdrawal until the phone line is restored. “I cannot verify this withdrawal against our consolidated balance. We cannot proceed.” The customer is unhappy, but the bank never risks overdrafts or inconsistent state.

AP Choice (Available + Partition-tolerant): The manager allows the withdrawal. The East branch records it immediately and allows the customer to leave. The West branch independently processes transactions. Later, when the phone line is restored, the branches reconcile—they might discover the account was overdrawn, and they handle it retroactively. The customer got immediate service, but consistency was deferred.

Both approaches are valid; the choice depends on your business priorities.

Formal Properties Under the Microscope

Let’s deepen our understanding of what these terms actually mean in a distributed system.

Consistency: Linearizability

When we say “consistency” in CAP, we mean strong consistency, or linearizability. This is a different notion from the “C” in ACID transactions, which refers to preserving integrity constraints within a single database.

Linearizability means:

  • There exists a total ordering of all operations
  • Each read reflects all writes that completed before it
  • If operation A finishes before operation B starts, A appears before B in the order

Example: Three clients are connected to a key-value store. Client 1 writes key=10 and receives acknowledgment. Client 2 then reads key and must see 10. Any read after Client 1’s write must include that write.
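
To make this concrete, here is a tiny Python sketch (the Register class is purely illustrative, not any real client library). A single in-process register is trivially linearizable; the hard part for a distributed store is preserving exactly this read-after-acknowledged-write guarantee across replicas.

# Illustrative only: a single-process register is trivially linearizable.
# A distributed store must preserve the same guarantee across replicas.

class Register:
    def __init__(self):
        self.value = None

    def write(self, value):
        # "Acknowledged" once this call returns
        self.value = value

    def read(self):
        return self.value


r = Register()
r.write(10)              # Client 1: write completes and is acknowledged
assert r.read() == 10    # Client 2: any read that starts afterwards must see 10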

Availability: Non-blocking Responses

Availability in CAP is strict: every request to a non-failing node must eventually terminate in a meaningful response (success or failure, not a timeout or an indefinite hang).

Importantly:

  • A request must not hang indefinitely; a node that waits forever on its peers violates availability
  • A node that has crashed counts as “failing” and is exempt
  • A node partitioned away from the majority may still receive requests, but it cannot guarantee consistency for them

Partition Tolerance: Network Reality

A partition means there is a time interval during which messages between some groups of nodes are lost or arbitrarily delayed. It is not necessarily a total failure: some messages might still get through, but not reliably.

In practice:

  • A server hangs and stops responding (looks like a partition)
  • A network cable is cut
  • A cloud provider’s datacenter has routing issues
  • A software bug causes a service to queue requests indefinitely

The CAP Triangle: Visualizing the Trade-off

graph TB
    subgraph CAP["CAP Theorem"]
        C["Consistency<br/>(Linearizability)"]
        A["Availability<br/>(Always respond)"]
        P["Partition<br/>Tolerance<br/>(Network failures)"]
    end

    C -->|tension| A
    A -->|tension| P
    P -->|tension| C

    CP["<b>CP Systems</b><br/>Sacrifice Availability<br/>Examples: HBase, Spanner, etcd"]
    AP["<b>AP Systems</b><br/>Sacrifice Consistency<br/>Examples: Cassandra, DynamoDB"]

    style CP fill:#e8f5e9
    style AP fill:#e3f2fd
    style P fill:#fff9c4

Since P is required, you’re choosing which corner to sacrifice:

  • CP: Accept reduced availability (some requests fail/timeout) to maintain consistency
  • AP: Accept eventual consistency (stale reads) to maintain availability

CP Systems: Consistency First

CP systems prioritize keeping data correct across the network. When a partition occurs, they stop accepting writes rather than risk inconsistency.

How CP Systems Operate

CP systems use consensus protocols to ensure all replicas agree:

Raft Algorithm Example: etcd is a good illustration of a pure CP system. It uses the Raft consensus algorithm:

Leader receives write request
  → Leader appends to its log
  → Leader sends to followers
  → Waits for majority (n/2 + 1) to ACK
  → Commits to state machine
  → Sends success to client

If partition occurs:
  → Minority partition: Cannot achieve quorum, rejects writes
  → Majority partition: Continues normally

This ensures that once a write is acknowledged, all subsequent reads will see it.
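
As a rough sketch of the quorum rule in this flow, here is a minimal Python illustration (the five-node cluster and function names are hypothetical, not etcd's or Raft's actual interfaces). A write commits only when a majority of nodes acknowledges it, which is exactly why a minority partition has to reject writes.

# Minimal majority-quorum sketch; illustrative, not an implementation of Raft or etcd.

def quorum_size(n_nodes: int) -> int:
    # Majority of the cluster: n/2 + 1 (integer division)
    return n_nodes // 2 + 1

def can_commit(acks: int, n_nodes: int) -> bool:
    # A write commits only if a majority of nodes acknowledged it
    return acks >= quorum_size(n_nodes)

N = 5
print(quorum_size(N))                 # 3
print(can_commit(acks=3, n_nodes=N))  # True: the majority side keeps committing
print(can_commit(acks=2, n_nodes=N))  # False: the minority side must reject writes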

Real-World CP Systems

System | Use Case | Trade-off
--- | --- | ---
HBase | Large-scale consistent data | High latency, reduced availability during failures
Google Spanner | Global transactions with strong consistency | Expensive infrastructure, higher write latency
etcd | Kubernetes configuration, service discovery | Single-digit MB/s throughput, not for large data
ZooKeeper | Distributed coordination | Small payload support, careful tuning required
MongoDB (with majority write concern) | Document storage with ACID transactions | Slower writes, partition handling reduces availability

Decision Point: When to Choose CP

Choose CP when:

  • Data inconsistency is unacceptable (financial transfers, inventory management)
  • You can tolerate temporary unavailability (minutes of downtime acceptable)
  • You have reliable networking infrastructure

AP Systems: Availability First

AP systems prioritize responding to requests. When a partition occurs, both sides continue operating independently and reconcile later.

How AP Systems Operate

AP systems embrace eventual consistency: after the partition heals, all replicas converge to the same state.

Cassandra Replication Example:

Client writes to a local coordinator
  → At a low consistency level (ONE or ANY), the coordinator returns success after a single replica (or, for ANY, just a stored hint) acknowledges the write
  → The remaining replicas receive the write asynchronously
  → If a partition blocks some replicas, they catch up once the partition heals

During partition:
  → Both sides process writes independently
  → Writes have timestamps or vector clocks
  → When partition heals, conflict resolution merges writes

Conflict Resolution Strategies

When two partitions diverge, AP systems must reconcile:

  1. Last-Write-Wins (LWW): Keep the write with the highest timestamp. Simple but can lose data.

    Partition A: key=100 at t=1000
    Partition B: key=200 at t=1005
    Reconcile: key=200 (B's timestamp is later)
  2. Vector Clocks: Track causality between writes.

    Write A: {A:1, B:0} → key=100
    Write B: {A:0, B:1} → key=200
    On reconcile: Both clocks are incomparable, conflict detected!
    Application decides resolution.
  3. CRDTs (Conflict-free Replicated Data Types): Special data structures that merge automatically without conflicts (see the sketch after this list).

    Counter CRDT: Each replica has own counter, sum all for total
    Register CRDT: Biased merge function automatically resolves
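
As a concrete taste of item 3, here is a minimal grow-only counter (G-Counter) sketch in Python; the class is illustrative, not a production CRDT library. Each replica only ever increments its own slot, and merging takes the element-wise maximum, so replicas converge no matter how often or in what order they merge.

# Minimal G-Counter (grow-only counter) CRDT sketch; illustrative only.

class GCounter:
    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> increments observed from that replica

    def increment(self, amount: int = 1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter"):
        # Element-wise max is commutative, associative, and idempotent,
        # so any merge order converges without conflicts.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


# Two replicas increment independently during a partition...
a, b = GCounter("A"), GCounter("B")
a.increment(3)
b.increment(2)
# ...and converge once they exchange state, with nothing left to resolve by hand.
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5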

Real-World AP Systems

System | Use Case | Trade-off
--- | --- | ---
Cassandra | Time-series data, high availability | Eventual consistency, tunable per query
DynamoDB | Serverless database, web-scale | Eventual consistency by default, pay per request
CouchDB | Offline-first, document replication | Conflict handling complexity
Riak | Object store, geographic distribution | Fewer abstractions, requires careful conflict handling

Tunable Consistency in Cassandra

Cassandra lets you choose a consistency level per operation. In cqlsh the level is a session setting that applies to the statements that follow (the table and column names here are illustrative):

-- Strong-ish: a majority of replicas must respond (slower, but safe)
CONSISTENCY QUORUM;
SELECT * FROM accounts WHERE user_id = 42;

-- Fast but risky: a single replica answers, so the read might be stale
CONSISTENCY ONE;
SELECT * FROM accounts WHERE user_id = 42;

-- Majority within the local datacenter: a common latency/safety compromise
CONSISTENCY LOCAL_QUORUM;
SELECT * FROM accounts WHERE user_id = 42;

This flexibility is powerful but requires application awareness.

Beyond CAP: The PACELC Theorem

The CAP Theorem focuses on partition scenarios, but partitions are rare. Most of the time, your system has no partition. PACELC addresses this:

  • PAC: During a partition, choose between consistency and availability
  • ELC: Else (no partition), choose between latency and consistency

The PACELC theorem states:
IF there is a partition
  THEN choose Consistency or Availability
ELSE (no partition)
  THEN choose Latency or Consistency

This is more practical because:

  • Partitions are rare (minutes per year)
  • Latency trade-offs are constant (every millisecond matters)

Database Classification under PACELC:

Database | If Partition (PAC) | Else, No Partition (ELC) | Strategy
--- | --- | --- | ---
Spanner | Consistency (PC) | Consistency (EC); TrueTime atomic clocks keep the latency cost low | Strong consistency always
DynamoDB | Availability (PA) | Latency (EL); eventually consistent reads by default | Availability- and latency-first
MongoDB | Consistency (PC) | Consistency (EC); synchronous majority replication adds latency | Consistency-first, accept latency
Cassandra | Availability (PA) | Latency (EL); tunable per query | Availability-first, tune per query

Practical Example: Cassandra During a Network Partition

Let’s walk through a real scenario using Cassandra (AP system):

Initial State:
  Node A (DC-East): user:balance = $1000
  Node B (DC-West): user:balance = $1000

Network partition occurs: A and B cannot communicate

Client in DC-East writes: user:balance = $900 (withdrawal of $100)
  → Node A accepts immediately
  → Tries to send to Node B, but partition blocks it
  → Client gets success

Simultaneously, Client in DC-West writes: user:balance = $850 (withdrawal of $150)
  → Node B accepts immediately
  → Tries to send to Node A, but partition blocks it
  → Client gets success

During partition:
  DC-East shows balance = $900
  DC-West shows balance = $850
  INCONSISTENCY! But both users got service.

Partition heals:
  Cassandra detects A and B have different data
  Last-Write-Wins resolves based on timestamp
  If East's write has ts=1000, West's has ts=1001:
    Final state: balance = $850
  The East withdrawal was silently discarded: the true balance should be $750, so the bank is out $100. Still, the system recovered without manual intervention.

This is the trade-off of AP systems: you get availability but must handle these scenarios at the application level.
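
To see exactly how the update disappears, here is a small Python sketch of last-write-wins reconciliation for the walkthrough above (the timestamps and dict-based "replicas" are purely illustrative, not Cassandra internals). Because each side stored an absolute balance, LWW silently discards one withdrawal; storing deltas or using a counter-style CRDT would avoid this.

# Illustrative LWW merge for the walkthrough above; not Cassandra's actual machinery.

def lww_merge(replica_a: dict, replica_b: dict) -> dict:
    # For each key, keep the (timestamp, value) pair with the highest timestamp.
    merged = dict(replica_a)
    for key, (ts, value) in replica_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

# Each side applied its own withdrawal to an absolute balance during the partition.
east = {"user:balance": (1000, 900)}   # ts=1000: balance after the -$100 withdrawal
west = {"user:balance": (1001, 850)}   # ts=1001: balance after the -$150 withdrawal

merged = lww_merge(east, west)
print(merged["user:balance"])          # (1001, 850): the $100 withdrawal vanished; truth is $750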

Decision Matrix: Choosing CP vs AP

Here’s how to think through the choice for different features:

Feature | Primary Concern | CAP Choice | Reasoning
--- | --- | --- | ---
User authentication | Correctness | CP | Honoring a stale or revoked credential is a security breach; must be consistent
Activity feed | Freshness tolerable | AP | Users accept “you liked this” appearing seconds late
Shopping cart | Correctness | CP | Overselling inventory is unacceptable
Product catalog views | Freshness tolerable | AP | An outdated product count is acceptable
Payment processing | Correctness | CP | Money must be correct, no double-charging
User notifications | Delivery tolerable | AP | Missing a notification is acceptable

Notice: most applications need both CP and AP systems, using them for different purposes.

Trade-offs and Reality

The Cost of Strong Consistency (CP)

Pros:
  ✓ No conflicting data states
  ✓ Simpler application logic
  ✓ Easy to reason about correctness

Cons:
  ✗ Higher write latency (must replicate to quorum)
  ✗ Reduced availability (cannot serve during partitions)
  ✗ Higher infrastructure complexity (consensus protocols)
  ✗ Harder to scale geographically (quorum reads cross-continent = slow)

A write to Google Spanner might take 50-100ms to achieve global consistency. A write to a nearby Cassandra replica might complete in about 1ms, but a subsequent read from another replica could still return stale data.

The Cost of Eventual Consistency (AP)

Pros:
  ✓ Lower write latency (local writes)
  ✓ Higher availability (continue during partitions)
  ✓ Better geographic scalability
  ✓ Simpler replication logic

Cons:
  ✗ Temporary inconsistency windows
  ✗ Complex conflict resolution (LWW loses data, CRDTs are tricky)
  ✗ Application complexity (must handle stale reads)
  ✗ Debugging is harder (non-deterministic behavior)

Tunable Consistency: The Middle Ground

Most modern systems offer tunable consistency, allowing per-operation choices:

// Cassandra (Java driver): the consistency level is chosen per statement
ConsistencyLevel.ONE       // fastest, may return stale data
ConsistencyLevel.QUORUM    // balanced: a majority of replicas must respond
ConsistencyLevel.ALL       // strongest and slowest, least available

// DynamoDB: consistency is a per-read flag
GetItemRequest(ConsistentRead=true)  // strongly consistent read, higher latency and cost
GetItemRequest(ConsistentRead=false) // eventually consistent read (the default), faster

This lets you optimize each operation independently. Write critical data with QUORUM, read non-critical data with ONE.
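
As a concrete end-to-end example, here is how the DynamoDB flag looks with Python's boto3 client; the table name and key below are hypothetical, and the only difference between the two calls is the ConsistentRead flag.

import boto3

# Hypothetical table "accounts" keyed by user_id; the client calls are standard boto3.
dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Eventually consistent read (the default): cheapest and fastest, may briefly lag recent writes.
fast = dynamodb.get_item(
    TableName="accounts",
    Key={"user_id": {"S": "42"}},
)

# Strongly consistent read: reflects all prior successful writes, at higher latency and cost.
safe = dynamodb.get_item(
    TableName="accounts",
    Key={"user_id": {"S": "42"}},
    ConsistentRead=True,
)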

Criticisms and Limitations of CAP

The CAP Theorem is foundational but imperfect:

  1. It’s a simplification: Real systems have more than three axes (durability, freshness, ordering, causality). PACELC adds nuance.

  2. Partition definition matters: What counts as a partition? If a server is slow but not crashed, does CAP apply? The theorem assumes a binary model: messages either arrive or they don't.

  3. Consistency definitions vary: CAP uses linearizability, but databases claim different consistency levels (serializability, snapshot isolation, read-your-writes). These are distinct from CAP consistency.

  4. Time isn’t accounted for: If a partition lasts 1 millisecond, does availability vs. consistency matter? PACELC is more realistic.

  5. It’s binary, not continuous: Real systems are often gradations—“how consistent,” “how available,” not either-or.

Despite these limitations, CAP remains the most important mental model for distributed systems trade-offs.

Key Takeaways

  • The CAP Theorem states you can guarantee at most two of: Consistency, Availability, and Partition Tolerance. Since partitions are inevitable, the choice is CP vs AP.

  • CP systems (HBase, Spanner, etcd) use consensus and quorum writes to prevent inconsistency. They sacrifice availability during partitions.

  • AP systems (Cassandra, DynamoDB) accept writes on all replicas and reconcile later. They sacrifice immediate consistency for availability and low latency.

  • Most applications need both CP and AP: use CP for critical data (auth, payments) and AP for non-critical data (feeds, analytics).

  • Tunable consistency (choosing consistency level per query) is the practical sweet spot, balancing latency, correctness, and availability.

  • The PACELC Theorem extends CAP by addressing both partition scenarios and normal operation, making it more practical for real-world systems.

  • CAP is a simplification; understand its context before applying it. Causality, durability, and ordering matter too.

Practice Scenarios

Scenario 1: Multi-Region Bank System

You’re designing a banking system for a global bank with datacenters in New York, London, and Singapore. Transfers between accounts in the same region are frequent. International transfers are less common but critical.

  • Which accounts should use CP? Why?
  • Which should use AP? Why?
  • How would you handle a partition between NY and London?
  • If you chose AP for some accounts, how would you prevent double-spending?

Scenario 2: E-commerce Inventory System

Your e-commerce platform has warehouses in 5 regions. Inventory must never be oversold (selling more units than exist). But stock updates should be fast—when an item sells, users should see the count decrement within 100ms.

  • Does this describe a CP or AP system?
  • What happens if a partition isolates one warehouse?
  • How would you reconcile inventory if two regions both sold the last unit?
  • Would you apply different consistency levels to reads vs. writes?

Scenario 3: Social Media Notification System

Your social network sends notifications when users interact (likes, comments). Users don’t mind if a notification arrives seconds late, but duplicates (same notification sent twice) are annoying.

  • Is this CP or AP?
  • During a partition, should both datacenters send notifications independently?
  • How would you prevent duplicate notifications during reconciliation?
  • Would you use eventually consistent or strongly consistent storage?

Coming Next: Strong vs. Eventual Consistency

Now that you understand the CAP theorem and why systems must choose, we’ll dive deeper into what strong consistency and eventual consistency actually mean in practice. We’ll explore consistency models (serializability, linearizability, causal consistency), quantify how stale data becomes, and examine patterns for living with eventual consistency without introducing bugs.