System Design Fundamentals

Causal & Session Consistency

The Problem with Pure Eventual Consistency

Imagine you’re building a social media platform. Alice posts “I got the job!” with genuine excitement. Seconds later, Bob, who follows Alice closely, sees the post and replies “Congratulations!”

Now here’s the problem: with pure eventual consistency, some users might receive Bob’s reply before Alice’s post reaches their cache. A user could see “Congratulations!” floating in the feed with no context—which reply? To what? The feed becomes confusing and illogical.

This scenario reveals a critical gap in the consistency spectrum we explored earlier. Strong consistency is expensive (high latency), while eventual consistency is cheap but sometimes nonsensical. We need something smarter: a system that understands why Bob’s reply exists and automatically ensures it appears after Alice’s post, without requiring all nodes to agree on the exact timestamp of every operation.

That’s where causal consistency enters the picture.

Understanding Causality in Distributed Systems

Causal consistency is a middle ground that respects logical dependencies between operations while allowing independent operations to be reordered.

Here’s the formal rule: if operation A causally precedes operation B (B only makes sense after A), then every process observes A before B. However, operations that are concurrent—not causally related—may appear in any order on different nodes.

The mathematical foundation is the happens-before relationship, formalized by Leslie Lamport in 1978. Two events are related by happens-before (written A → B) if:

  1. A and B occur at the same process, and A occurs before B in program order
  2. A is a send operation and B is the corresponding receive operation
  3. There exists a chain of happens-before relationships connecting them (transitivity: A → C and C → B imply A → B)

Conversely, if neither A → B nor B → A, the events are concurrent—no causal dependency exists.

Session consistency is a practical simplification: it’s a consistency guarantee that applies to a single client session. Within your session, you’re guaranteed:

  • Read-your-writes: Any write you perform is visible to your subsequent reads
  • Monotonic reads: If you read value V₁, then later read value V₂, you never see V₂ older than V₁
  • Monotonic writes: Your writes appear in the order you issued them, to all observers
  • Writes-follow-reads: If you read X, then write Y, observers see your read of X before your write of Y

For most applications, session consistency is “good enough.” You don’t need every cross-session causal dependency tracked globally—just your own user’s perspective to remain consistent.
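Of these guarantees, monotonic reads is easy to picture as a client-side guard. Here is a minimal sketch, assuming each replica response carries a numeric version field (a hypothetical field; real systems use opaque tokens or LSNs):

```javascript
// Client-side guard for monotonic reads: never accept a response
// older than the freshest one this session has already observed.
class MonotonicReadClient {
  constructor() {
    this.highestSeen = 0; // freshest version observed so far
  }

  acceptRead(response) {
    if (response.version < this.highestSeen) {
      return null; // stale replica: caller should retry elsewhere
    }
    this.highestSeen = response.version;
    return response.value;
  }
}

const client = new MonotonicReadClient();
client.acceptRead({version: 3, value: "post v3"}); // accepted
client.acceptRead({version: 2, value: "post v2"}); // rejected: would go back in time
```

The guard never lets the session travel backward in time, which is exactly the monotonic-reads promise.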

A Conversation Model for Causality

Think of causal consistency like a threaded email conversation. When you reply to an email, that reply is causally dependent on the original. Anyone reading your mailbox must see the original before the reply—otherwise, “ROFL sounds amazing!” makes no sense without context.

However, your email thread about Project X and another thread about weekend plans are concurrent. They can arrive at different sites in different orders. Your mailbox is eventually consistent across servers for independent threads, but causally consistent within each thread.

Session consistency is even simpler: it’s your personal notebook. Whatever you wrote down, you can always read back in order. You might not see everything your colleagues wrote, and their edits might reach you in strange orders. But your own entries? Always consistent from your perspective.

Technical Mechanisms for Tracking Causality

Lamport Timestamps (Logical Clocks)

The simplest approach is Lamport timestamps: each process maintains a counter that increments on every local event and attaches its current value to every message it sends. Upon receiving a message carrying timestamp T, the receiver sets its counter to max(local_counter, T) + 1.

Process A: [1] send message
Process B: receives message, sets clock to max(2, 1) + 1 = 3
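The update rule can be sketched as a tiny class (names are illustrative):

```javascript
// Minimal Lamport clock sketch. Local events and sends increment the
// counter; receives apply max(local, received) + 1.
class LamportClock {
  constructor() { this.time = 0; }

  tick() {              // local event
    this.time += 1;
    return this.time;
  }

  send() {              // stamp an outgoing message
    return this.tick();
  }

  receive(msgTime) {    // merge rule: max(local, received) + 1
    this.time = Math.max(this.time, msgTime) + 1;
    return this.time;
  }
}

// Reproduces the trace above:
const a = new LamportClock();
const stamp = a.send();   // A's clock: 1
const b = new LamportClock();
b.tick(); b.tick();       // B's clock: 2
b.receive(stamp);         // B's clock: max(2, 1) + 1 = 3
```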

Limitation: Lamport timestamps can’t distinguish concurrent events from causally ordered ones. The guarantee only runs one way: if A → B, then A’s timestamp is smaller than B’s. The converse fails: if A has timestamp 5 and B has timestamp 7, A and B might be causally ordered or might be concurrent, and the timestamps alone can’t tell you which.

Vector Clocks

Vector clocks fix this limitation. Each process maintains one counter per process in the system. When a process generates an event, it increments its own counter. When sending a message, it includes the entire vector. Upon receiving a message, the process updates each component to max(local[i], received[i]), then increments its own.

Let me illustrate with a three-process system:

Process A: [1, 0, 0] → sends message
Process B: [0, 1, 0] → sends message
Process A receives B's message:
  A merges: [max(1,0), max(0,1), max(0,0)] = [1, 1, 0]
  A increments its own entry: [2, 1, 0]
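The merge-and-increment step can be sketched as a small helper (`ownIndex` identifies the receiver's slot in the vector):

```javascript
// Vector clock receive rule: component-wise max with the incoming
// vector, then increment the receiver's own slot.
function onReceive(local, received, ownIndex) {
  const merged = local.map((v, i) => Math.max(v, received[i]));
  merged[ownIndex] += 1;
  return merged;
}

// Process A (index 0) at [1,0,0] receives B's message stamped [0,1,0]
onReceive([1, 0, 0], [0, 1, 0], 0); // → [2, 1, 0]
```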

Now compare two events by their vectors:

function compareVectors(v1, v2) {
  let v1_less = false;    // some component of v1 is strictly smaller
  let v1_greater = false; // some component of v1 is strictly larger

  for (let i = 0; i < v1.length; i++) {
    if (v1[i] < v2[i]) v1_less = true;
    if (v1[i] > v2[i]) v1_greater = true;
  }

  if (v1_less && v1_greater) return "Concurrent";
  if (v1_less) return "V1 causally precedes V2";
  if (v1_greater) return "V2 causally precedes V1";
  return "Same event"; // equal in every component: identical vectors
}

Here’s a diagram of vector clock progression in a multi-node system:

graph TD
    subgraph Process_A["Process A"]
        A1["[1,0,0]<br/>Write X"]
        A2["[2,0,0]<br/>Read X"]
        A1 --> A2
    end

    subgraph Process_B["Process B"]
        B1["[0,1,0]<br/>Write Y"]
        B2["[1,2,0]<br/>Read X,Y"]
        B1 --> B2
    end

    subgraph Process_C["Process C"]
        C1["[0,0,1]<br/>Write Z"]
        C2["[1,1,2]<br/>Read All"]
        C1 --> C2
    end

    A1 -->|"message[1,0,0]"| B2
    A1 -->|"message[1,0,0]"| C2
    B1 -->|"message[0,1,0]"| C2

Scalability Challenge: Vector clocks require one counter per process. In a 100-node cluster, each event’s metadata includes a 100-integer vector. This overhead becomes problematic at scale.

Compact Representations

Dotted version vectors reduce the overhead by only tracking causality information that’s actually necessary. Instead of maintaining a vector for every event, systems like Riak use a more compact encoding where each entry records only the relevant causal history.

Interval trees can further compress metadata by grouping consecutive versions from the same process.

Instead of: {A: [1, 2, 3, 4, 5], B: [3, 8], C: [2, 6]}
Represent:  {A: [1–5], B: [3, 8], C: [2, 6]}
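The range compression itself is straightforward. Here is a sketch that collapses a sorted per-process version list into intervals (the function name is illustrative):

```javascript
// Collapse a sorted list of version numbers into [start, end] ranges,
// extending a range whenever the next version is consecutive.
function compressVersions(versions) {
  const ranges = [];
  for (const v of versions) {
    const last = ranges[ranges.length - 1];
    if (last && v === last[1] + 1) {
      last[1] = v;           // consecutive: extend the current range
    } else {
      ranges.push([v, v]);   // gap: start a new range
    }
  }
  return ranges;
}

compressVersions([1, 2, 3, 4, 5]); // → [[1, 5]]
compressVersions([3, 8]);          // → [[3, 3], [8, 8]]
```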

How Real Systems Implement Causal Consistency

MongoDB’s Causal Sessions

MongoDB implements causal consistency through causal sessions. When you create a session, MongoDB tracks two pieces of metadata:

  • Cluster time: A logical clock that all nodes agree upon (advanced via heartbeats)
  • Operation time: The specific time of your last operation

// Node.js driver style (a sketch; `client` is a connected MongoClient
// and `db` a database handle obtained from it)
const session = client.startSession({causalConsistency: true});

// Writes in the session are tagged with an operation time
await db.collection("users").updateOne(
  {_id: "alice"},
  {$set: {job: "Engineer"}},
  {session}
);
// The server returns operationTime; the session records it automatically

// Reads in the same session automatically send afterClusterTime,
// so they are guaranteed to reflect at least your own writes
const posts = await db.collection("posts")
  .find({author: "alice"}, {session, readConcern: {level: "majority"}})
  .toArray();

The afterClusterTime parameter ensures your reads reflect state at least as recent as your writes. Internally, MongoDB delays reads at nodes that haven’t yet replicated to that cluster time.

Azure Cosmos DB’s Session Tokens

Cosmos DB provides session consistency tokens returned with every operation:

const {resource, headers} = await container.items.create(
  {id: "doc1", value: "alice's post"}
);

const sessionToken = headers["x-ms-session-token"];
// Example token: "0:123456-1:50"
// (partition ID : LSN - region ID : LSN)

// Later, pass the token to ensure the read sees your write
const {resource: doc} = await container.item("doc1").read(
  {sessionToken}
);

The token encodes the LSN (Log Sequence Number) for each partition and region. The read operation waits until those partitions have replicated to at least that LSN.
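Given the simplified token layout annotated above, decoding is a matter of string splitting. A sketch (real Cosmos DB token formats vary by API version, and session tokens should be treated as opaque in production):

```javascript
// Parse the simplified "partitionId:LSN-regionId:LSN" layout shown above.
// Illustrative only: real session tokens are opaque and should not be parsed.
function parseSessionToken(token) {
  const [partitionPart, regionPart] = token.split("-");
  const [partitionId, partitionLsn] = partitionPart.split(":").map(Number);
  const [regionId, regionLsn] = regionPart.split(":").map(Number);
  return {partitionId, partitionLsn, regionId, regionLsn};
}

parseSessionToken("0:123456-1:50");
// → {partitionId: 0, partitionLsn: 123456, regionId: 1, regionLsn: 50}
```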

Implementing Session Consistency at Application Level

When your database doesn’t provide causal consistency natively, you can implement it yourself:

Read-Your-Writes Token Pattern

// Server-side: attach a token to the write response
app.post("/posts", (req, res) => {
  const post = db.posts.insert({
    author: req.user.id,
    content: req.body.content,
    timestamp: Date.now()
  });

  // Create a token encoding the write's logical time
  const token = Buffer.from(
    JSON.stringify({
      version: post.version,
      timestamp: post._internal_timestamp
    })
  ).toString('base64');

  res.json({post, readToken: token});
});

// Server-side: honor the token the client sends back with later reads
app.get("/posts", (req, res) => {
  const readToken = req.query.readToken;
  let posts;

  if (readToken) {
    const {version} = JSON.parse(
      Buffer.from(readToken, 'base64').toString()
    );

    // Read from a replica that has replicated past this version
    const connection = findReplicaWithVersion(version);
    posts = connection.find({});
  } else {
    posts = db.posts.find({});
  }

  res.json(posts);
});

Sticky Sessions with Write Affinity

For session consistency, you can route a user’s requests to the same server:

app.use((req, res, next) => {
  // Route to server that handled user's last write
  const serverId = req.session.affinityServer;

  if (serverId && serverId !== process.env.SERVER_ID) {
    return res.redirect(
      `https://server${serverId}.example.com${req.originalUrl}`
    );
  }

  next();
});

app.post("/action", (req, res) => {
  // Remember this server for future writes
  req.session.affinityServer = process.env.SERVER_ID;

  const result = db.execute(req.body);
  res.json(result);
});

This routes your reads to the same server that processed your writes, so that server already has your data and read-your-writes holds without cross-replica coordination.

A Concrete Example: Social Media Timeline Ordering

Let’s trace through why causal consistency matters:

Scenario: Alice writes Post P, then comments “Check my pinned post” on Bob’s wall. Without causal consistency:

  • Node 1 sees: Bob’s wall with Alice’s comment, but not Post P yet (confusing)
  • Node 2 sees: Post P, but not the comment (incomplete context)

With causal consistency:

  1. Alice’s write of P generates a timestamp [A:1]
  2. Alice’s comment includes dependency [A:1]
  3. Any node that replicates the comment automatically replicates P (or waits)
  4. All nodes see P before the comment

The system understands: “comment depends on post P, so serve them in order.”
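The steps above can be sketched as a replica that buffers operations until their causal dependencies have been applied (class and field names are illustrative):

```javascript
// A replica that holds back operations whose dependencies haven't
// been applied locally yet, then drains the buffer as gaps fill in.
class CausalReplica {
  constructor() {
    this.applied = new Set(); // IDs of operations already applied
    this.pending = [];        // operations waiting on dependencies
  }

  receive(op) { // op: {id, deps: [operation IDs this op depends on]}
    this.pending.push(op);
    this.drain();
  }

  drain() {
    let progress = true;
    while (progress) {
      progress = false;
      for (let i = 0; i < this.pending.length; i++) {
        const op = this.pending[i];
        if (op.deps.every(d => this.applied.has(d))) {
          this.applied.add(op.id);   // all dependencies satisfied: apply
          this.pending.splice(i, 1);
          progress = true;
          break;
        }
      }
    }
  }
}

const replica = new CausalReplica();
replica.receive({id: "comment", deps: ["A:1"]}); // buffered: post not seen yet
replica.receive({id: "A:1", deps: []});          // post arrives; both apply
```

Even though the comment arrived first, no reader at this replica can ever observe it without the post it depends on.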

Causal Consistency vs. Strong Consistency

Aspect | Causal | Strong
------ | ------ | ------
Latency | Low (waits only for dependent ops) | High (waits for all replicas)
Complexity | Moderate (vector clocks, causality tracking) | Low (simple total ordering)
Application fit | Feeds, comments, collaborative editing | Banking, atomic transfers
Metadata overhead | Per-operation vectors | Per-operation timestamps
Concurrent operations | Can reorder freely | Never reorder

Pro Tip: Causal consistency is particularly valuable in geographically distributed systems. You can serve reads from nearby regions while only synchronizing causal dependencies, avoiding the cross-continent latency hit of strong consistency.

Causal Consistency vs. Eventual Consistency

Aspect | Causal | Eventual
------ | ------ | ------
User-facing guarantees | Causally consistent feeds | Eventually see everything
Cognitive load | High (must understand causality) | Low (simple “eventually”)
Implementation cost | Moderate (vector clocks) | Low (very lightweight)
Anomalies | Reply-before-post impossible | Delayed visibility, out-of-order items
Recovery time | Depends on causal propagation | Arbitrary (minutes to hours)

When Session Consistency is Enough

Most single-player or single-session workflows only need session consistency, not full causal consistency:

  • Editing your own document
  • Updating your profile
  • Reading your own messages
  • Writing and reading in a single thread

The key insight: you don’t need every other user’s operations to be causally consistent with yours. You only need your own operations to be consistent with each other.

When You Need Full Causal Consistency

Cross-session dependencies require causal consistency:

  • Collaborative document editing (Alice edits line 5, Bob must see that before editing line 6)
  • Social feeds with comments (replies must see original posts)
  • Distributed transactions with dependencies
  • Version control systems with branching and merging

Tracking Overhead and Scalability Limits

Here’s the real cost of causal consistency:

System Size | Vector Clock Bytes | Network Overhead | Comparison Ops
----------- | ------------------ | ---------------- | --------------
10 nodes | 40 | 40 bytes/msg | 10 comparisons
100 nodes | 400 | 400 bytes/msg | 100 comparisons
1,000 nodes | 4,000 | 4,000 bytes/msg | 1,000 comparisons

(Assuming 4-byte integer counters.)

At 1,000 nodes, you’re adding 4KB of metadata to every message. This compounds quickly in high-throughput systems.

Did you know? Research systems like COPS (Clusters of Order-Preserving Servers) addressed this by only tracking causality within a cluster, not across the entire datacenter. Users connected to one cluster see causal consistency; cross-cluster reads eventually converge. This reduces overhead while maintaining consistency where users actually perceive it.

Key Takeaways

  • Causal consistency bridges the gap between eventual and strong consistency by respecting logical dependencies while allowing independent operations to reorder
  • Vector clocks enable tracking by maintaining one counter per process; compare vectors to determine happens-before relationships
  • Session consistency covers most single-user needs (read-your-writes, monotonic reads/writes) at much lower cost than full causal consistency
  • Metadata overhead scales linearly with node count; at scale, compact representations like dotted version vectors become essential
  • Application-level implementation (tokens, sticky sessions) can provide session consistency when databases don’t support it natively
  • Choose your consistency model based on actual causality dependencies in your application, not on theoretical purity

Practice Scenarios

Scenario 1: Distributed Messaging System
Users can post messages and react with emojis to others’ messages. Design consistency guarantees so:

  • A user always sees their own reactions reflect in their reads
  • A reaction never appears before the message it reacts to
  • Two independent messages can appear in any order

What consistency model fits? What metadata must you track?

Scenario 2: Multi-Datacenter Collaborative Editor
A Google Docs-like editor runs across three geographic regions. Alice in US-East edits a paragraph, then Bob in EU edits the next paragraph. Later, Charlie in AP-Southeast reads the document. Charlie must see Alice’s edit before Bob’s (because Bob depends on seeing Alice’s version).

How would you use vector clocks? What happens if you used Lamport timestamps instead?

Scenario 3: E-Commerce Order + Notification
A customer places an order (writes to Orders table), which triggers a notification (writes to Notifications table). The notification service reads the order to generate the message. Eventual consistency creates a race where the notification reads before the order exists.

Implement session consistency using tokens to guarantee the notification sees the order.

Connection to Distributed System Trade-offs

Causal and session consistency are tools in a larger toolkit. You’ve now explored the consistency spectrum from eventual (cheap, fast) to strong (expensive, slow), with causal (balanced) in the middle.

The next section dives into conflict resolution strategies for systems that can’t afford strong consistency. When different nodes have diverging state, how do you reconcile them? Causal consistency can prevent some conflicts (like replies before posts), but concurrent writes still clash. Understanding CRDT algorithms and operational transformation will help you design systems that tolerate inconsistency without sacrificing correctness.