System Design Fundamentals

Choosing a Consistency Model

Introduction

Imagine you’re designing an application that needs to handle multiple features simultaneously: user profiles, a real-time chat system, payment processing, a newsfeed, and analytics. Each of these features has dramatically different consistency requirements. When you update your profile picture, you need to see that change reflected immediately, so read-your-writes consistency is essential. In the chat feature, unrelated messages can arrive in slightly different orders for different users without confusing anyone, as long as a reply never appears before the message it answers; causal consistency is good enough. For payments, you absolutely cannot afford inconsistency: a customer’s money must be deducted atomically or not at all, so strong consistency is non-negotiable. Your newsfeed can show content that’s a few seconds old, and users won’t care, so eventual consistency works perfectly. And analytics? Historical data a few minutes stale is completely acceptable.

The question isn’t “which consistency model is best?” but “which consistency model is right for this specific use case?” In this chapter, we’ve explored strong consistency, weak consistency, eventual consistency, causal consistency, and session consistency. Now we need to learn how to systematically choose among them. This decision fundamentally impacts your system’s latency, throughput, availability, cost, and complexity. Get it right, and users have a fast, responsive experience. Get it wrong, and you’ll either over-engineer consistency (paying in latency and infrastructure you don’t need) or under-engineer it (creating correctness bugs).

The Decision Framework

Choosing a consistency model requires analyzing several interconnected factors. Let’s work through each systematically.

Data Criticality

The first question is simple: what happens if this data is wrong? For financial transactions, inventory counts, and authentication tokens, wrongness is catastrophic. Users lose money. Customers pay for items that are already out of stock. Attackers gain access to systems. These domains demand strong consistency.

Compare that to a like counter on a social media post. If it shows 999 likes when the actual count is 1001, did anyone suffer? No. The data can eventually become correct, and that’s acceptable.

Between these extremes lie many systems where wrongness is expensive but not catastrophic. If a product catalog is stale by 5 minutes, customers might see slightly outdated prices or availability. It’s not ideal, but the impact is manageable. These domains are candidates for weaker consistency models.

User Experience Impact

Related to criticality, but distinct: will users notice inconsistency? Read-your-writes consistency exists because users expect that when they perform an action, they immediately see the result. If you update your email address and then reload your profile page, you expect to see the new email. If you don’t, it feels broken — even if the data will eventually be consistent.
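One common way to provide read-your-writes on top of asynchronous replication is a per-session token: the session remembers the version of its latest write and only reads from a replica that has caught up to it. The sketch below is hypothetical; it assumes a single primary that numbers writes with an increasing version, and replicas that apply writes in that order (replication itself is not modeled):

```python
class Replica:
    """A copy of the data plus the highest write version it has applied."""

    def __init__(self):
        self.version = 0
        self.data = {}


class Session:
    """Read-your-writes for one user session (hypothetical sketch)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.last_written = 0  # version of this session's latest write

    def write(self, key, value):
        # All writes go to the primary; replicas catch up asynchronously.
        self.primary.version += 1
        self.primary.data[key] = value
        self.last_written = self.primary.version

    def read(self, key):
        # Any replica that has applied our latest write is safe to read from.
        for replica in self.replicas:
            if replica.version >= self.last_written:
                return replica.data.get(key)
        # Otherwise fall back to the primary, which always has our writes.
        return self.primary.data.get(key)
```

Other sessions are still free to read slightly stale replicas; only the writer is guaranteed to see its own change right away.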

In contrast, if a recommendation algorithm takes 30 seconds to reflect your new interests, users don’t notice. The impact on perceived quality is minimal.

For multiplayer games, causal consistency is critical: if Player B’s shot is a reaction to Player A’s, every player must see A’s shot before B’s. Strong consistency might not be necessary, but causal ordering is.

Regulatory and Compliance Requirements

Some domains have little choice in consistency models. Regulations and industry standards rarely name a consistency model outright, but their requirements for accurate, complete, and auditable records push implementations toward strong consistency: financial rules and standards (PCI DSS, SOX) for transaction logs and audit trails, data-protection law (GDPR) for the accuracy of personal data, healthcare rules (HIPAA) for the integrity of patient records, and securities rules (SEC) for order books and trade logs.

These aren’t suggestions; they’re legal or contractual obligations. If you’re processing payments, the system of record needs strong consistency regardless of the performance cost. If you’re managing healthcare data, updates to patient records need ACID guarantees, even if eventual consistency would be faster.

Read-to-Write Ratio

This factor significantly influences which consistency model is practical. If your system has a 100:1 read-to-write ratio (like a newsfeed or product catalog), even the cost of strong consistency on writes might be acceptable because writes are rare. The overhead is amortized.

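But if writes are as frequent as reads, or more frequent (like a trading system where every order is written and then immediately queried), strong consistency becomes very expensive. Every write must be coordinated across replicas before it is acknowledged, and any read that must reflect the latest write has to wait for that coordination. With writes making up much of the workload, this overhead lands on almost every request, and latency suffers severely.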

Systems with high read ratios are excellent candidates for eventual or causal consistency — reads are cheap and can be served from local replicas, while writes eventually propagate.

Geographic Distribution

If your data is concentrated in one datacenter, strong consistency is relatively easy: round trips between nodes take at most a few milliseconds, and consensus algorithms perform well. But if your users are distributed globally, strong consistency becomes increasingly painful. A user in Singapore performing a write must wait for acknowledgment from replicas in Virginia, Frankfurt, or both, depending on how many acknowledgments the protocol requires. A single intercontinental round trip can exceed 200 milliseconds, making the system feel slow.
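A back-of-the-envelope sketch of why the acknowledgment rule matters, assuming a leader in Singapore that replicates to followers in Virginia and Frankfurt in parallel, counts its own vote, and ignores processing time. The 230 ms and 160 ms round trips are illustrative round numbers, not measurements:

```python
def majority(n):
    """Votes needed for a majority of n replicas."""
    return n // 2 + 1


def write_latency_ms(follower_rtts_ms, votes_needed):
    """Time until a write gathers votes_needed acknowledgments, assuming the
    leader's own vote counts, it contacts all followers in parallel, and each
    follower's ack arrives after one round trip."""
    follower_acks = votes_needed - 1
    if follower_acks <= 0:
        return 0.0
    return sorted(follower_rtts_ms)[follower_acks - 1]


# Illustrative round-trip times from a Singapore leader, not measurements.
rtts = {"virginia": 230.0, "frankfurt": 160.0}
n = len(rtts) + 1  # leader plus two followers
print(write_latency_ms(rtts.values(), majority(n)))  # 160.0: fastest follower suffices
print(write_latency_ms(rtts.values(), n))            # 230.0: must wait for Virginia
```

Even the cheaper majority rule puts an intercontinental round trip on every write, which is the cost local-first eventual consistency avoids.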

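For this reason, many geographically distributed systems use eventual consistency for most of their data: local writes are fast, and replicas catch up asynchronously. Strong consistency across regions is possible, but every write pays for at least one cross-region round trip.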

Latency Requirements

Some applications are latency-sensitive: mobile apps, web browsers, real-time games, autonomous vehicles. For these systems, strong consistency might be incompatible with latency requirements. A 500-millisecond latency to achieve strong consistency is unacceptable for a mobile game where players expect sub-100-millisecond response times.

Other applications have flexible latency budgets. A batch analytics job that takes 30 seconds instead of 5 seconds is fine.

Conflict Resolution Complexity

What happens when replicas diverge? For some data structures, conflict resolution is simple. A timestamp-based last-write-wins strategy works for user preferences. But for collaborative documents, spreadsheets, or concurrent edits to the same record, conflict resolution is complex. You might need operational transformation or conflict-free replicated data types (CRDTs).
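A minimal sketch of the last-write-wins register mentioned above, assuming each write carries a timestamp and the id of the replica that accepted it (both fields are illustrative). Ties on timestamp are broken by replica id so every replica picks the same winner, which makes merge order irrelevant:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    value: object
    timestamp: float  # wall-clock or hybrid timestamp of the write
    replica_id: str   # deterministic tie-breaker

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep the write with the larger (timestamp, replica_id) pair;
        # the losing write is discarded.
        if (other.timestamp, other.replica_id) > (self.timestamp, self.replica_id):
            return other
        return self
```

The catch is that the losing write disappears silently, and clock skew can let an older write win. That is tolerable for a theme preference, but not for two people editing the same paragraph.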

Systems where conflict resolution is complex and important (like collaborative editing) benefit from causal consistency, which prevents certain classes of conflicts. Systems where conflicts are rare or unimportant (like analytics aggregation) can tolerate eventual consistency.

The Consistency Decision Matrix

Here’s how these factors map to consistency models:

| Use case category | Data criticality | User impact | Regulatory | Read/write ratio | Recommended model |
|---|---|---|---|---|---|
| Financial transactions | Critical | High | Required | Mixed | Strong consistency |
| Inventory (e-commerce) | High | High | Often required | Reads > writes | Strong consistency |
| User authentication | Critical | High | Required | Reads ≫ writes | Strong consistency |
| Product catalog | Medium | Medium | No | Reads ≫ writes | Eventual consistency |
| Social media likes/counts | Low | Low | No | Reads ≫ writes | Eventual consistency |
| Chat messages | Low | High | No | Balanced | Causal consistency |
| User preferences | Low | Medium | No | Reads > writes | Session consistency |
| Newsfeed posts | Low | Medium | No | Reads ≫ writes | Eventual consistency |
| Leaderboards | Low | Medium | No | Reads ≫ writes | Eventual consistency |
| Analytics/logs | Low | Low | Sometimes | Reads ≫ writes | Eventual consistency |

This matrix isn’t a prescription — it’s a starting point. Your specific system might differ based on factors we’ll discuss next.

An Insurance Analogy

Choosing consistency is like choosing insurance coverage. For your most valuable assets (financial accounts, health records, payment transactions), you purchase comprehensive coverage (strong consistency). The cost is significant, but the asset justifies it.

For everyday items (social media engagement metrics, analytics, recommendations), you purchase basic coverage (eventual consistency). The risk of loss is acceptable compared to the cost of premium coverage.

For items of moderate value (chat messages, user preferences), you purchase mid-tier coverage (session or causal consistency). You get meaningful protection without the full cost of comprehensive coverage.

You wouldn’t insure every item at the highest level — the premiums would bankrupt you. Similarly, you can’t use strong consistency for every operation in your system. The latency cost, infrastructure complexity, and operational overhead would make the system impractical.

Consistency in Real Domains

Let’s examine how different domains actually choose consistency models:

Financial Systems

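Strong consistency is non-negotiable. When you transfer money from one account to another, the system must guarantee that the transfer either completes atomically or doesn’t happen at all. Partial transfers are unacceptable. Within one database, an ACID transaction provides this. Across multiple services, payment systems have traditionally used two-phase commit and increasingly use the Saga pattern, which runs each step as a local transaction and undoes completed steps with compensating transactions if a later step fails. Sagas give up isolation, so intermediate states can be briefly visible, and the design must make that acceptable.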

Banks prioritize consistency over availability and latency. If a consistency check fails, a transaction is rejected rather than risking inconsistency.
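Within a single database, an atomic transfer is an ordinary ACID transaction. The sketch below uses Python’s built-in sqlite3 module as a stand-in for the system of record; the accounts table and its columns are assumptions. Cross-service payments need two-phase commit or a saga on top of this, which the sketch does not attempt:

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically; returns True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE id = ? AND balance >= ?",
                (amount, src, amount),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient funds or unknown source account")
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                raise ValueError("unknown destination account")
        return True
    except ValueError:
        return False  # the debit above, if any, was rolled back
```

Because a failed credit raises inside the transaction, the debit is rolled back with it: the transfer is rejected rather than left half-done.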

E-Commerce Inventory

This domain is more nuanced. Inventory counts that determine whether an item can be purchased must be strongly consistent. When a customer checks out with the last item in stock, the system must ensure no other customer can simultaneously purchase it. Race conditions here lead to overselling — a business nightmare.

However, catalog browsing can use eventual consistency. If a product’s price or description is stale by a few seconds, users won’t notice. The inventory service enforces strong consistency by routing every stock update through a single authoritative copy of each item’s count (or by using distributed locks), while the catalog service serves read-only copies that are replicated asynchronously.
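One way to rule out the overselling race is to make “check stock” and “decrement stock” a single conditional write against the authoritative copy, instead of reading the count and writing it back. A minimal sketch, again with sqlite3 standing in for the inventory store and a hypothetical inventory table:

```python
import sqlite3


def reserve_one(conn, sku):
    """Atomically claim one unit of sku; returns False when stock is exhausted."""
    with conn:
        # The stock check lives in the WHERE clause, so two checkouts can
        # never both decrement the last unit.
        cur = conn.execute(
            "UPDATE inventory SET stock = stock - 1 WHERE sku = ? AND stock > 0",
            (sku,),
        )
    return cur.rowcount == 1
```

The second customer to reach the last item gets False back and can be told the item is sold out before payment is taken.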

Social Media

Different features have different requirements:

  • Posts and replies: Causal consistency. If you reply to a comment, everyone should see your reply ordered after the original comment. But if your post takes 2 seconds to appear in your followers’ feeds, that’s acceptable.
  • Likes and engagement metrics: Eventual consistency. A like counter showing 999 instead of 1001 is fine.
  • Your own content: Session consistency. When you post something, you should immediately see it in your feed. But other users might see it appear a moment later.
  • Direct messages: Causal consistency. Message ordering matters for comprehension.
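The reply rule for posts and direct messages can be sketched as a replica that holds back any reply whose parent it hasn’t shown yet. This is a hypothetical simplification that tracks a single dependency per message; general causal consistency tracks full dependency sets, for example with vector clocks:

```python
from collections import deque


class CausalFeed:
    """Shows a reply only after the post it replies to (hypothetical sketch)."""

    def __init__(self):
        self.visible = []   # post ids in the order this replica shows them
        self._shown = set()
        self._waiting = {}  # parent id -> replies held back for that parent

    def receive(self, post_id, parent_id=None):
        if parent_id is not None and parent_id not in self._shown:
            # The parent hasn't arrived at this replica yet: hold the reply.
            self._waiting.setdefault(parent_id, []).append(post_id)
            return
        ready = deque([post_id])
        while ready:
            pid = ready.popleft()
            self.visible.append(pid)
            self._shown.add(pid)
            # Replies that were waiting on this post can now be shown too.
            ready.extend(self._waiting.pop(pid, []))
```

Unrelated posts are shown as soon as they arrive, in whatever order the network delivers them; only causally dependent messages wait.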

Gaming Leaderboards

Leaderboards are an interesting case: they’re eventually consistent operationally but require periodic strong consistency checks. A player’s score updates are written asynchronously to replicas — eventual consistency. But periodically (weekly, monthly), the system performs a reconciliation pass to ensure all replicas agree on the final standings, then publishes the results with strong consistency.
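A sketch of the reconciliation pass, assuming each replica holds a map from player to best score and that “best score” semantics make taking the maximum a safe merge (both are assumptions, not details of a particular game backend):

```python
def reconcile(replicas):
    """Merge per-replica score maps by keeping each player's best score,
    then return standings sorted by score (highest first), ties by name."""
    merged = {}
    for scores in replicas:
        for player, score in scores.items():
            merged[player] = max(score, merged.get(player, score))
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because max is order-independent, every replica that runs this over the same inputs publishes the same standings.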

IoT and Telemetry

Eventual consistency with last-write-wins. Sensor data flowing from millions of devices doesn’t require strong consistency. A temperature reading that’s 30 seconds old in the database doesn’t meaningfully change decision-making. Last-write-wins semantics work well because newer data is nearly always better than older data.

The tradeoff is that you might lose data in network partitions (if a sensor’s update doesn’t reach the primary before failure). For many telemetry systems, this is acceptable. You care about trends, not individual data points.

Collaborative Editing

Causal consistency with operational transformation or CRDTs. When Alice edits a document while Bob edits the same document simultaneously, the system must preserve both edits in a meaningful way. Strong consistency would lock out one editor while the other works. Eventual consistency could leave conflicting changes that are hard to reconcile.

Causal consistency with conflict-free replicated data types (CRDTs) allows local edits to proceed immediately while ensuring all edits are eventually incorporated in a consistent order that respects causality.
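CRDTs for text, as used in collaborative editors, are too involved for a short example, but the smallest CRDT, a grow-only counter, shows the property they all share: merges are commutative, associative, and idempotent, so replicas that have seen the same updates converge regardless of sync order. This is a hypothetical sketch, not an editor’s implementation:

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments made at that replica

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        # Element-wise max: merging twice, or in any order, gives the same state.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    def value(self):
        return sum(self.counts.values())
```

Both replicas accept increments locally without coordination, and exchanging state in either direction, any number of times, brings them to the same total.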

Configuration Management

Strong consistency. Systems like Kubernetes, Consul, and ZooKeeper use strong consistency for configuration because stale configuration can break application behavior. When you update a database connection string, you need absolute assurance that all services read the new value, not a cached old value.
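Coordination services make concurrent configuration updates safe with versioned writes: ZooKeeper’s setData takes an expected version, and Consul’s KV API accepts a cas index. The class below is a hypothetical, single-process sketch of that compare-and-set idea; a real store replicates each write through a consensus protocol, which the lock here merely stands in for:

```python
import threading


class VersionedConfig:
    """Hypothetical in-memory config store with compare-and-set writes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def cas(self, key, expected_version, value):
        """Write only if the key is still at expected_version."""
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False  # someone else changed it; re-read and retry
            self._data[key] = (version + 1, value)
            return True
```

An operator working from a stale read gets False instead of silently overwriting a newer connection string.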

Mixed Consistency in a Single Application

Here’s the key insight: you don’t have to choose one consistency model for your entire application. Instead, you can use different models for different features, even within a single database.

The Consistency Boundary Pattern

A consistency boundary is a logical region of your system where you maintain a specific consistency level. Different boundaries can use different levels.

┌─────────────────────────────────────────────────────┐
│                  Your Application                   │
├─────────────────────────────────────────────────────┤
│                                                     │
│  ┌─────────────────┐   ┌─────────────────────────┐  │
│  │ Strong Boundary │   │ Eventual Boundary       │  │
│  │ (Transactions)  │   │ (Analytics/Cache)       │  │
│  │                 │   │                         │  │
│  │ ├─ Orders       │   │ ├─ Pageviews            │  │
│  │ ├─ Payments     │   │ ├─ Engagement metrics   │  │
│  │ ├─ Inventory    │   │ ├─ Recommendations      │  │
│  │                 │   │ ├─ Search indexes       │  │
│  │ (Cassandra with │   │ (Cassandra with         │  │
│  │  QUORUM reads/  │   │  ONE read consistency)  │  │
│  │  writes)        │   │                         │  │
│  └─────────────────┘   └─────────────────────────┘  │
│                                                     │
└─────────────────────────────────────────────────────┘

Within each boundary, you enforce your chosen consistency level. Between boundaries, you accept eventual consistency as changes flow from one to another.

Implementing Mixed Consistency in Cassandra

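Cassandra lets you choose the consistency level for each statement. With the DataStax Python driver, set it on each prepared statement rather than on session.default_consistency_level, which would change the level for every query sharing that session. The contact point, keyspace, table names, and variables such as new_balance and old_balance are placeholders:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])  # contact point: placeholder
session = cluster.connect("app")  # keyspace name: placeholder

# Financial operations: QUORUM, plus a lightweight transaction (IF clause)
# so the update is a compare-and-set rather than a blind overwrite
update_balance = session.prepare(
    "UPDATE accounts SET balance = ? WHERE account_id = ? IF balance = ?"
)
update_balance.consistency_level = ConsistencyLevel.QUORUM
update_balance.serial_consistency_level = ConsistencyLevel.SERIAL
result = session.execute(update_balance, [new_balance, account_id, old_balance])
if not result.was_applied:
    pass  # a concurrent writer changed the balance: re-read and retry

# Analytics: ONE is fastest but may return stale rows
select_events = session.prepare(
    "SELECT * FROM analytics_events WHERE event_type = ?"
)
select_events.consistency_level = ConsistencyLevel.ONE
events = session.execute(select_events, [event_type])

# User preferences: LOCAL_QUORUM for both writes and reads gives
# read-your-writes within the user's datacenter
select_prefs = session.prepare(
    "SELECT preferences FROM user_prefs WHERE user_id = ?"
)
select_prefs.consistency_level = ConsistencyLevel.LOCAL_QUORUM
prefs = session.execute(select_prefs, [user_id])

QUORUM writes and QUORUM reads always overlap on at least one replica, so a QUORUM read returns the latest successfully written value for that row; the lightweight transaction additionally stops two concurrent updates from silently overwriting each other. Analytics and preference queries pay only for the consistency they need, so strong guarantees stay where they matter while the rest of the system keeps its performance.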
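The overlap rule behind these level choices is simple arithmetic: with a replication factor of n, a read contacting r replicas is guaranteed to include at least one replica from every write acknowledged by w replicas exactly when r + w > n. A minimal sketch for a single datacenter (function names are illustrative):

```python
def quorum(n):
    """Smallest majority of n replicas (QUORUM for replication factor n)."""
    return n // 2 + 1


def read_overlaps_write(n, r, w):
    """True when every r-replica read must include a replica that
    acknowledged every w-replica write."""
    return r + w > n
```

For n = 3, QUORUM is 2, and 2 + 2 > 3, so QUORUM reads see QUORUM writes; ONE plus ONE (1 + 1) does not overlap, which is why ONE is only eventually consistent.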

Microservices Consistency Boundaries

In a microservices architecture, consistency boundaries often align with service boundaries:

from cassandra import ConsistencyLevel

# db_client, event_log, and user_store stand for thin wrappers around your
# data stores; their *_with_consistency methods are hypothetical, not part
# of the Cassandra driver API.

# Transaction Service - strong consistency
class TransactionService:
    def __init__(self, db_client):
        self.db_client = db_client
        # ALL waits for every replica: the strongest level, but a single
        # unavailable replica blocks writes
        self.consistency = ConsistencyLevel.ALL

    def process_payment(self, order_id, amount):
        # This operation requires strong consistency
        return self.db_client.execute_with_consistency(
            "INSERT INTO transactions ...",
            self.consistency
        )

# Analytics Service - eventual consistency
class AnalyticsService:
    def __init__(self, event_log):
        self.event_log = event_log
        self.consistency = ConsistencyLevel.ONE  # Eventual

    def record_event(self, event_data):
        # This can be eventually consistent
        return self.event_log.append_with_consistency(
            event_data,
            self.consistency
        )

# User Profile Service - session (read-your-writes) consistency
class ProfileService:
    def __init__(self, user_store):
        self.user_store = user_store
        self.consistency = ConsistencyLevel.LOCAL_QUORUM

    def update_profile(self, user_id, updates):
        return self.user_store.update_with_consistency(
            user_id,
            updates,
            self.consistency
        )

    def get_profile(self, user_id):
        # Reading at LOCAL_QUORUM as well (R + W > N within the datacenter)
        # is what lets users see their own updates immediately
        return self.user_store.get_with_consistency(user_id, self.consistency)

A Decision Flowchart

When you encounter a new use case, ask these questions in order:

graph TD
    A["Is this financially<br/>sensitive data?"] -->|Yes| B["Are there regulatory<br/>requirements?"]
    A -->|No| C["Will users notice<br/>stale data?"]

    B -->|Yes| D["Strong Consistency"]
    B -->|No| E["Can stale data<br/>cause conflicts?"]

    C -->|Yes| F["Do you need strict<br/>ordering?"]
    C -->|No| G["Read-to-write<br/>ratio > 10:1?"]

    E -->|Yes| H["Strong Consistency"]
    E -->|No| I["Causal Consistency"]

    F -->|Yes| J["Causal Consistency"]
    F -->|No| K["Session Consistency"]

    G -->|Yes| L["Eventual Consistency"]
    G -->|No| M["Session Consistency"]

Trade-offs and Pitfalls

Over-Engineering Consistency

The most common mistake is using strong consistency everywhere. Yes, it’s safest, but the cost is substantial. Every write must wait for multiple nodes to acknowledge. Latency increases. Throughput decreases. Infrastructure costs rise. Availability suffers: if enough replicas are down or unreachable that no quorum can form, the system can’t accept writes.

For a social media company, forcing strong consistency on all operations would make the system unusably slow and expensive. Eventual consistency for engagement metrics is the right choice.

Under-Engineering Consistency

Conversely, using eventual consistency for financial data can be catastrophic. You risk lost updates, duplicate transactions, overselling, and loss of customer trust. The short-term performance gain isn’t worth the long-term damage.

The Migration Challenge

Changing consistency models after launch is surprisingly difficult. If you shipped with eventual consistency for inventory and later discover this causes overselling, migrating to strong consistency requires:

  1. Stopping writes to rebuild state with strong consistency semantics
  2. Migrating historical data
  3. Updating application code to handle the new consistency level
  4. Extensive testing to catch edge cases

This is expensive and risky. Get the consistency model right at design time when possible.

Team Skill Requirements

Different consistency models require different expertise. Strong consistency with distributed transactions demands deep knowledge of consensus algorithms and failure scenarios. Eventual consistency requires understanding eventual convergence, conflict resolution, and monitoring for divergence.

Building a team capable of managing mixed consistency requires training and careful documentation.

Monitoring and Alerting

How do you know if your consistency is working? For strong consistency, you need alerts if consensus fails. For eventual consistency, you need alerts if replicas diverge beyond expected thresholds.

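One simple divergence probe reads the same row from several replicas and compares the results. How to pin a read to a single replica depends on your database and driver, so this sketch takes one read function per replica as a parameter (an assumption), along with an alert callback:

import time

def check_consistency_health(replica_readers, alert, grace_seconds=5.0):
    # replica_readers: zero-argument functions, each reading the same row
    # from a different replica
    results = [read() for read in replica_readers]
    if all(r == results[0] for r in results):
        return True
    # Under eventual consistency, brief divergence is normal: re-check after
    # a grace period and alert only if the replicas still disagree
    time.sleep(grace_seconds)
    results = [read() for read in replica_readers]
    if all(r == results[0] for r in results):
        return True
    alert("Replicas still diverge after %.1f seconds" % grace_seconds)
    return False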

Testing Distributed Consistency

Testing consistency is hard. Normal testing doesn’t reveal consistency bugs because everything usually works. You need to inject failures: network partitions, node crashes, clock skew.

The Jepsen testing framework, created by Kyle Kingsbury, is the gold standard for this. It systematically tests databases under various failure modes and verifies whether they maintain their consistency claims. Many database projects run Jepsen tests before major releases.

Key Takeaways

  • Consistency is not binary. You have multiple consistency models available, and different parts of your system can use different models.
  • Map your use case to requirements first. Data criticality, user impact, regulatory requirements, and read/write ratios all influence the right choice.
  • Strong consistency has real costs. It increases latency, reduces availability, and complicates architecture. Use it only where necessary.
  • Eventual consistency is often acceptable. For many domains, stale data for a few seconds is unnoticeable and unproblematic.
  • Mixed consistency in a single application is practical. Use consistency boundaries to apply different levels to different features.
  • Testing and monitoring are critical. Consistency bugs are subtle and difficult to reproduce in production.

Practice Scenarios

Scenario 1: Design a Ride-Sharing Application

You’re building a ride-sharing platform similar to Uber. Consider these features:

  • Rider location updates — riders broadcast their location to drivers in real-time
  • Driver availability — drivers signal when they’re available/unavailable
  • Ride requests and matching — a rider requests a ride, and the system matches them with a nearby driver
  • Trip history and ratings — after a ride, the rider and driver rate each other
  • Payment processing — the system charges the rider’s payment method

For each feature, decide:

  1. Which consistency model is appropriate?
  2. What’s the impact if this data becomes inconsistent?
  3. What’s the read-to-write ratio?
  4. Are there regulatory requirements?

Then design the database architecture showing consistency boundaries.

Scenario 2: Design a Multi-Tenant SaaS Analytics Platform

You’re building a platform where companies upload data and query analytics. Features include:

  • Data ingestion — customers upload CSV/JSON files
  • Schema definition — customers define what columns mean
  • Queries — customers run SQL queries on their data
  • Sharing — customers share reports with colleagues
  • Alerts — customers set up alerts when metrics exceed thresholds

Each customer’s data is isolated (multi-tenant). Decide consistency models for:

  1. Within a single customer’s data
  2. Across customers’ shared reports
  3. The metadata layer (schema definitions, customer accounts)

What happens if the metadata becomes inconsistent? What if one customer’s data becomes inconsistent?

Scenario 3: Redesign a Failing Social Media Platform

You’re taking over a social media platform that’s struggling with performance. The engineering team currently uses strong consistency for everything: all features block until all replicas confirm. The system is slow and expensive.

Analyze these features and propose consistency model changes:

  • Posts and comments — currently strong consistency
  • Likes — currently strong consistency
  • Comments on your own posts — currently strong consistency
  • User feed — currently strong consistency
  • Notifications — currently strong consistency
  • User profiles — currently strong consistency

For each, explain why you’d keep or change the consistency level. What’s the expected impact on latency, throughput, and cost? What new challenges emerge with weaker consistency?

Looking Ahead

Understanding consistency models prepares us for the next challenge: data partitioning and sharding. When your dataset grows beyond a single node, you must split it across multiple servers. This introduces new consistency challenges — your data is physically separated, and you must ensure consistency despite data living in different places. In Chapter 13, we’ll explore how partitioning strategies interact with consistency models, and you’ll discover why some consistency choices make certain partitioning strategies impractical. The consistency foundations you’ve built here become essential tools for designing systems at scale.