System Design Fundamentals

Strong vs Eventual Consistency

The Inconsistency Problem

Imagine you’re scrolling through your social media feed and update your profile picture. You refresh the page. The old picture stares back at you. You refresh again—there’s the new one! Relief. But then you refresh once more and… the old picture is back. Your mind spins: “Did my update work or not?”

Now contrast this with your banking app. You transfer $500 to a friend. The transaction processes. You check your balance immediately—it’s guaranteed to reflect the latest state. No confusion, no inconsistency. You don’t have to refresh five times to trust the system.

These two experiences represent opposite ends of the consistency spectrum, and choosing between them is one of the most consequential decisions in distributed systems design. This choice directly flows from the CAP theorem we explored earlier: as your system scales across multiple machines and networks, you must decide whether to prioritize consistency or availability when failures occur.

In this chapter, we’ll dissect the mechanisms that enable each approach, understand the real costs involved, and learn how to choose wisely for your specific use cases.

What Is Strong Consistency?

Strong consistency, formally known as linearizability, provides the illusion of a single authoritative copy of your data. Here’s the guarantee: once a write operation completes successfully, every subsequent read—no matter which replica or node services it—will see that write or a later one.

Think of it this way: the system behaves as if there’s a global ordering of all operations, and everyone observes that total order. There’s no window of confusion. No stale data surprises.

Achieving this requires coordination. Here’s the internal machinery:

Consensus Protocols: Systems like Raft, Paxos, and ZAB (used by Apache ZooKeeper) ensure that all nodes agree on the value before acknowledging a write. When you write to a strong-consistency system:

  1. The write reaches a coordinator (often called the leader)
  2. The coordinator sends it to all replicas
  3. It waits for acknowledgment from a majority (quorum)
  4. Only then does it tell the client “write successful”
  5. Reads are served through the leader or a read quorum, so every read is guaranteed to observe the committed write
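
To make these steps concrete, here is a minimal, illustrative sketch of the quorum-acknowledged write path. The replica clients and their append_entry() method are hypothetical stand-ins, not any particular library's API:

from concurrent.futures import ThreadPoolExecutor, as_completed

def quorum_write(replicas, key, value):
    # `replicas` are hypothetical clients exposing append_entry(key, value) -> bool,
    # which returns True once that replica has durably appended the log entry.
    majority = len(replicas) // 2 + 1
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(r.append_entry, key, value) for r in replicas]
    acks = 0
    try:
        for future in as_completed(futures):
            try:
                if future.result():
                    acks += 1
            except OSError:
                pass  # an unreachable replica simply doesn't count toward the quorum
            if acks >= majority:
                return True  # tell the client "write successful"; stragglers catch up later
        raise RuntimeError("quorum not reached; the write must not be acknowledged")
    finally:
        pool.shutdown(wait=False)  # don't keep the client waiting on slow replicas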

The Cost: This coordination is expensive. Your write latency increases dramatically, especially across geographic regions where network round trips take hundreds of milliseconds. And during network partitions—when the leader can’t reach a majority of nodes—the system cannot accept new writes. Availability drops.

What Is Eventual Consistency?

Eventual consistency is fundamentally optimistic. The system promises that if you stop making updates, all replicas will eventually converge to the same value. But in the meantime, different clients might read different values.

The write path is dramatically simpler:

  1. The write reaches any node
  2. That node acknowledges success immediately
  3. The write is asynchronously propagated to other replicas
  4. Eventually, all replicas hold the same data

This is fast and available. During network partitions, every replica can still accept writes. But you pay the price in consistency: reads might return stale data.
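
By contrast, here is a minimal sketch of the eventually consistent write path under the same kind of illustrative assumptions (the node objects, their put() method, and store_hint() are hypothetical):

import threading
import time

def eventual_write(local_node, peers, key, value):
    # Store locally with a version (here simply a wall-clock timestamp)
    # and acknowledge right away; peers are updated in the background.
    version = time.time()
    local_node.put(key, value, version)

    def replicate():
        for peer in peers:
            try:
                peer.put(key, value, version)
            except ConnectionError:
                # A real system would record a hint (hinted handoff) and
                # deliver it once the peer comes back.
                local_node.store_hint(peer, key, value, version)

    threading.Thread(target=replicate, daemon=True).start()
    return True  # "write successful" before any peer has seen the data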

Convergence Mechanisms: How does “eventual” actually happen?

  • Read Repair: When you read stale data, the system detects the staleness (through version metadata) and updates the replica with the newer value
  • Hinted Handoff: If a replica is unreachable, another node temporarily stores updates and forwards them once the replica recovers
  • Anti-Entropy: Background processes periodically compare replicas using Merkle trees to identify and fix divergences
  • Conflict Resolution: When concurrent writes occur (write A on replica 1, write B on replica 2), the system picks a winner—by timestamp, vector clock, or application logic
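
For the simplest of these policies, last-write-wins, here is a hedged sketch of the merge (real systems often prefer vector clocks or application-level merges, because wall clocks on different nodes can disagree):

def resolve_last_write_wins(record_a, record_b):
    # Each record is a (value, version) pair; version is a (timestamp, node_id)
    # tuple, so node_id breaks ties deterministically when timestamps collide.
    return record_a if record_a[1] >= record_b[1] else record_b

# Concurrent writes accepted by two different replicas:
a = ("123 Main St", (1700000000.120, "node-1"))
b = ("456 Oak Ave", (1700000000.080, "node-2"))
winner = resolve_last_write_wins(a, b)  # "123 Main St" wins on its later timestamp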

The convergence window depends on network delays and how aggressively you run background processes. A well-tuned system might converge within milliseconds; a neglected one might take hours.

The Consistency Spectrum

Not everything is binary. The real world operates along a spectrum:

| Consistency Level | Guarantee | Use Case |
| --- | --- | --- |
| Strong | All reads see the latest write | Financial transactions, user authentication |
| Read-Your-Writes | You see your own updates immediately; others might not | Profile updates, your own posts in social media |
| Monotonic Reads | If you read value V1, subsequent reads never go backward to older values | Viewing message thread order |
| Monotonic Writes | Your writes are applied in the order you issued them | Database transactions, ordered operations |
| Bounded Staleness | Reads lag behind writes by at most t seconds (or n updates) | Analytics dashboards, reporting systems |
| Eventual | Convergence guaranteed eventually; no bounds | Social media likes, view counts |

Many systems support tunable consistency: the ability to choose per-query. You might read with strong consistency when displaying sensitive data and eventual consistency when showing aggregates.

Real-World Analogy

Imagine a major news story breaks:

  • Strong Consistency: Like a live television broadcast. Everyone watching at the same moment sees the exact same information. But the broadcast equipment is expensive and breaks down if the network gets choppy.
  • Eventual Consistency: Like a traditional newspaper. The printing press publishes the story once, then newspapers get distributed across the city. Different neighborhoods receive them at different times. Some people read the old edition, some the new. But eventually, everyone has the latest version.
  • Bounded Staleness: Like a news website with a published update timestamp. You know the information is at most 5 minutes old, good enough for most purposes.

How Strong Consistency Works Internally

Let’s peer inside a strongly consistent system like etcd or Consul:

A write to a strongly consistent key-value store follows this timeline:

sequenceDiagram
    participant Client
    participant Leader
    participant Replica1
    participant Replica2
    participant Replica3
    participant Replica4

    Client->>Leader: Write(key="balance", value=500)
    Leader->>Replica1: AppendLog(entry)
    Leader->>Replica2: AppendLog(entry)
    Leader->>Replica3: AppendLog(entry)
    Leader->>Replica4: AppendLog(entry)

    Replica1->>Leader: Ack
    Replica2->>Leader: Ack
    Note over Leader: Majority (3/5) achieved (leader + 2 replicas)

    Leader->>Client: Write successful

    Client->>Replica1: Read(key="balance")
    Replica1->>Client: 500

The system waits for a quorum (majority) before confirming. With 5 nodes, you need 3 confirmations. This means:

  • You can tolerate 2 node failures
  • During network partitions, only the partition containing the majority can serve writes
  • Every replica knows it can safely trust data committed by the leader

The trade-off: every write is at least as slow as the slowest replica needed to complete the quorum. If your replicas are spread across continents, a single write might take 300+ milliseconds.
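
The quorum arithmetic is worth spelling out with a quick, illustrative calculation:

def write_quorum(n):
    # Smallest majority: strictly more than half of the nodes
    return n // 2 + 1

for n in (3, 5, 7):
    w = write_quorum(n)
    print(f"{n} nodes: quorum of {w}, tolerates {n - w} failure(s)")

# 3 nodes: quorum of 2, tolerates 1 failure(s)
# 5 nodes: quorum of 3, tolerates 2 failure(s)
# 7 nodes: quorum of 4, tolerates 3 failure(s)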

How Eventual Consistency Works Internally

Now observe an eventually consistent system like Cassandra or DynamoDB:

sequenceDiagram
    participant Client
    participant Node_A as Node A
    participant Node_B as Node B
    participant Node_C as Node C

    Client->>Node_A: Write(key="likes", value=42)
    Note over Node_A: Store locally<br/>with version
    Node_A->>Client: Write successful (immediate)

    Note over Node_A,Node_C: Background replication
    Node_A->>Node_B: Replicate(key, value, version)
    Node_A->>Node_C: Replicate(key, value, version)

    Note over Node_A: ~100ms later
    Client->>Node_B: Read(key="likes")
    Node_B->>Client: 42 ✓ (or stale value if replication hasn't arrived)

The write returns immediately after a single node acknowledges. Replication happens in the background, introducing a small window where reads from other nodes might return stale data.

Pro Tip: In Cassandra, how long replicas stay divergent depends on settings you control: how aggressively read repair and anti-entropy repair run, and how hinted handoff is configured. Tuning these narrows the window of stale reads, but overly aggressive repair traffic eats into write throughput.

Consistency Levels in Production Systems

Let’s see how real databases implement this:

Cassandra: CQL Consistency Levels

-- In cqlsh, the consistency level is set once for the session and
-- applies to the statements that follow.

-- Strongly consistent: wait for all replicas
CONSISTENCY ALL;

-- Balanced: wait for a majority of replicas
CONSISTENCY QUORUM;

-- Fast but risky: any single replica
CONSISTENCY ONE;

-- Local quorum: majority within the local datacenter only
CONSISTENCY LOCAL_QUORUM;

By setting CONSISTENCY QUORUM for both reads and writes, you achieve read-your-writes consistency. With a replication factor of 3, each operation waits for 2 of the 3 replicas, so the read and write quorums always overlap on at least one up-to-date replica.
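
Outside cqlsh, the consistency level is usually chosen per statement in the driver. Here is a small sketch with the Python driver; the contact point, keyspace, table, and values are placeholders:

from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("shop")

# Write and read at QUORUM so the two quorums always overlap
write = SimpleStatement(
    "UPDATE accounts SET balance = 500 WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, ("12345",))

read = SimpleStatement(
    "SELECT balance FROM accounts WHERE user_id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(read, ("12345",)).one()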

DynamoDB: Consistency Options

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('UserProfiles')

# Eventually consistent read (default, fast)
response = table.get_item(
    Key={'user_id': '12345'},
    ConsistentRead=False  # Might return stale data
)

# Strongly consistent read (slower but fresh)
response = table.get_item(
    Key={'user_id': '12345'},
    ConsistentRead=True  # Latest value guaranteed
)

# Writes are always strongly consistent within DynamoDB's single region
table.put_item(Item={'user_id': '12345', 'balance': 500})

DynamoDB’s consistency dial is simple: reads are either eventually consistent (fast, the default) or strongly consistent (slower). Writes are durably committed within the region, but replication to other regions via global tables is eventual.

MongoDB: Read Concerns

// Local read: fastest, might be rolled back
db.accounts.find({}).readConcern("local");

// Majority read: waits for majority-committed data
db.accounts.find({}).readConcern("majority");

// Linearizable read: coordinates with concurrent writes
// (requires a filter that targets a single document)
db.accounts.find({ _id: "12345" }).readConcern("linearizable");

Global Strong Consistency: The TrueTime Solution

Achieving strong consistency across geographic regions is notoriously hard. Google Spanner solves it with TrueTime, a hardware-assisted clock system:

Every data center has:

  • GPS receivers and atomic clocks serving as time references
  • Uncertainty bounds: TrueTime exposes an interval, so the system knows its clock reading is off by at most a few milliseconds (around 7 ms in the worst case)

When a transaction commits at timestamp t1, Spanner performs a "commit wait": it holds the result until the local clock has definitely passed t1 plus the uncertainty bound. Any transaction that starts afterward, in any region, is therefore guaranteed a later timestamp, which preserves external consistency across regions.

The cost: a commit-wait delay of roughly the clock uncertainty (a few milliseconds) on every read-write transaction, plus the expense of atomic clocks and redundant GPS receivers.

CockroachDB uses a more practical approach with Hybrid Logical Clocks (HLC), combining logical and physical timestamps to approximate TrueTime without dedicated hardware.
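
To make the commit-wait idea concrete, here is an illustrative sketch, not Spanner's or CockroachDB's actual code; the apply_write callable and the 7 ms uncertainty figure are assumptions for the example:

import time

CLOCK_UNCERTAINTY = 0.007  # assumed worst-case clock error, ~7 ms

def commit_with_wait(apply_write):
    # Apply the write and take a commit timestamp from the local clock.
    apply_write()
    commit_ts = time.time()

    # Commit wait: hold the result until local time has definitely passed
    # the commit timestamp plus the uncertainty bound, so every clock in
    # the system agrees the commit is in the past.
    while time.time() < commit_ts + CLOCK_UNCERTAINTY:
        time.sleep(0.001)

    return commit_ts  # any transaction starting now gets a later timestamp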

The Stale Data Problem: A Timeline

Let’s visualize what happens with eventual consistency when a user updates their address:

timeline
    title Eventual Consistency: Address Update Timeline
    t0_00 : User updates address to "123 Main St"
    t0_10 : Node A acknowledges and stores the update
    t0_11 : Another user queries the address
          : Node A returns "123 Main St" ✓
          : Node B returns "456 Oak Ave" (stale)
          : Node C returns "456 Oak Ave" (stale)
    t0_12 : Node B receives the update (replication)
    t0_15 : Node C receives the update (replication)
    t0_20 : All nodes have converged on "123 Main St" ✓

In this scenario, any query arriving between t0_10 (when Node A acknowledges the write) and t0_15 (when the last replica catches up) might hit Node B or Node C and receive stale data.

Application-Level Consistency

Sometimes databases don’t give you what you need, so you implement consistency at the application layer:

Read-Your-Writes Pattern

class UserService:
    def update_profile(self, user_id, new_data):
        # Write to the leader and capture the version it assigns
        leader_node = self.get_leader()
        write_version = leader_node.write(user_id, new_data)

        # All subsequent reads in this session must see at least this version
        self.set_session_version(write_version)
        return True

    def get_profile(self, user_id):
        # Route to a replica with at least our version
        min_version = self.get_session_version()
        replica = self.find_replica_with_version(user_id, min_version)
        return replica.read(user_id)

You track version vectors or timestamps and route subsequent reads to replicas that have seen your writes.

Trade-offs and Practical Decisions

| Factor | Strong Consistency | Eventual Consistency |
| --- | --- | --- |
| Write Latency | High (waits for quorum) | Low (single node) |
| Read Latency | Medium (leader or read quorum) | Low (any replica, possibly stale) |
| Availability | Lower (unavailable during partitions) | Higher (always available) |
| Network Cost | Higher (quorum replication) | Lower (can batch, compress) |
| Complexity | Simpler mental model | Complex conflict resolution |
| Failure Scenarios | Predictable | Requires anti-entropy repairs |

Hybrid Approach: Most real systems don’t choose one or the other. Instead:

  • Critical paths (authentication, payment) use strong consistency
  • Non-critical reads (view counts, recommendations) use eventual consistency
  • Read-heavy workloads use bounded staleness with caching

Did you know? Facebook’s Memcache layer returns stale data intentionally during thundering herd situations. When a hot key expires and 1,000 servers request it simultaneously, some get stale results from cache rather than overwhelming the database.
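
A hedged sketch of that idea: when a hot key goes stale, only the first requester takes a lease to refresh it, and everyone else is served the stale value instead of stampeding the database. This is illustrative only; the db object and its read() method are placeholders, not Facebook's actual memcache/lease implementation:

import threading

class StaleServingCache:
    def __init__(self, db):
        self.db = db
        self.values = {}     # key -> (value, fresh: bool)
        self.leases = set()  # keys currently being refreshed
        self.lock = threading.Lock()

    def invalidate(self, key):
        # Called when the underlying data changes: keep the old value but mark it stale.
        with self.lock:
            if key in self.values:
                value, _ = self.values[key]
                self.values[key] = (value, False)

    def get(self, key):
        with self.lock:
            value, fresh = self.values.get(key, (None, False))
            if fresh:
                return value
            if key in self.leases:
                return value  # someone else is refreshing; serve the stale value
            self.leases.add(key)  # this caller takes the lease
        new_value = self.db.read(key)  # only the lease holder hits the database
        with self.lock:
            self.values[key] = (new_value, True)
            self.leases.discard(key)
        return new_value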

Key Takeaways

  • Strong consistency guarantees linearizability through quorum-based writes but sacrifices latency and availability during partitions. Use it for state where correctness is non-negotiable.
  • Eventual consistency prioritizes availability and performance but introduces a convergence window where different clients see different values. Use it for non-critical reads.
  • Real systems live on the spectrum: read-your-writes, bounded staleness, and monotonic reads offer middle ground with lower costs than strict strong consistency.
  • Tunable consistency lets you choose per query or per table, enabling smart trade-offs within a single system.
  • Operational burden increases with eventual consistency: anti-entropy, conflict resolution, and version tracking add complexity that strong consistency avoids.
  • Global consistency is expensive: achieving strong consistency across regions requires either TrueTime hardware or accepting high latency. Most systems shard by geography and accept eventual consistency cross-region.

Practice Scenarios

Scenario 1: E-Commerce Inventory System

You’re designing the inventory system for an online retailer serving customers globally. Selling the same item from multiple data centers risks overselling—a dress in size M might be sold 5 times when only 3 units exist. However, your system must handle 10,000 writes per second and 100,000 reads per second.

What consistency model do you choose? What happens during a data center failure? How do you prevent overselling?

Scenario 2: Social Media Analytics

Your platform shows users “You have 52 followers” next to their profile. These follower counts are updated millions of times per day. However, users occasionally report seeing the count fluctuate (sometimes 52, sometimes 51).

Is this acceptable? What if instead of a count, it was “You have $52 in your account”—would that be acceptable? What’s the minimum consistency guarantee you need for each metric?

Scenario 3: Global Database Replication

Your application has users in US, EU, and Asia. You’re building a global user directory. Consistency across regions is challenging, but you need reads to be fast in every region (reading from local replicas). You can tolerate writes taking longer.

Design your replication strategy. What consistency level can you achieve across regions? How do you handle the scenario where a user updates their email in the US and immediately tries to log in from Asia?

Looking Ahead

We’ve examined the two extremes of consistency, but reality is more nuanced. In the next section, Causal and Session Consistency, we’ll explore intermediate guarantees that maintain ordering and causality without the full cost of strong consistency. These models let your distributed system scale while preserving the relationships between operations—a powerful sweet spot for many applications.