System Design Fundamentals

Distributed Caching

The Limits of a Single Cache

You’ve built a solid caching layer with a single Redis instance. Your application flies. Latency drops. Databases breathe easier. But then you hit the wall: your working set grows to 500 GB, and your server only has 256 GB of RAM. Or worse, your single cache server goes down on a Saturday night, and everything cascades into failure. This is where distributed caching enters the picture.

A single cache instance works beautifully at small scale, but the real world doesn’t care about beautiful architecture when you’re burning through memory or facing the inevitable hardware failure. Distributed caching splits your cache across multiple machines, letting you scale horizontally instead of vertically. It’s the bridge between the caching fundamentals we covered earlier and the scalability patterns from Chapter 4—now we’re applying those distributed systems principles to the cache layer itself.

This chapter ties together everything you’ve learned: the cache-aside pattern, TTL strategies, and cache invalidation techniques now become tools you use within a distributed cache system. The question shifts from “how do we use caching?” to “how do we build a cache system that can scale to millions of requests across hundreds of servers?”

Partitioning, Hashing, and High Availability

Distributed caching rests on three pillars: partitioning (splitting data across machines), consistent hashing (knowing which machine holds which key), and replication (ensuring your cache survives failures).

When you have 100 servers to work with, you need a deterministic way to decide which server stores which key. The naive approach—modulo hashing—breaks the moment you add or remove a server. A key that hashed to server 5 might suddenly hash to server 12. This forces a complete cache rebuild, which defeats the purpose. Consistent hashing solves this elegantly. Instead of mapping keys directly to servers, you map them to positions on a hash ring. Servers also occupy positions on the ring. A key’s data lives on the nearest server clockwise from its position. Add or remove a server, and only a small fraction of keys need to relocate—typically around K/N keys, where K is total keys and N is the number of servers.
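To see why modulo hashing churns so badly, here is a quick sketch (the key names and MD5 choice are illustrative, not prescriptive) that counts how many keys change servers when a tenth server becomes an eleventh:

```python
import hashlib

def server_for(key: str, n_servers: int) -> int:
    # Naive scheme: stable hash of the key, then modulo the server count
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_servers

keys = [f"user:{i}" for i in range(10_000)]

# Where does each key live with 10 servers vs. 11?
before = {k: server_for(k, 10) for k in keys}
after = {k: server_for(k, 11) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 90% relocate
```

With modulo hashing, a key keeps its server only when its hash happens to agree under both divisors, so adding one server invalidates the vast majority of cache entries at once.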

Sharding (or partitioning) divides your data across servers. With client-side sharding, your application logic decides which server gets which key. With server-side sharding, a cluster of coordinating servers makes those decisions for you. The difference matters: server-side adds complexity but simplifies client logic and allows dynamic resharding. Redis Cluster uses server-side sharding with 16,384 hash slots distributed across nodes. Memcached typically relies on client-side sharding through libraries that implement consistent hashing.

Replication keeps your cache alive when hardware fails. A primary node holds the authoritative data; replica nodes hold copies. When the primary fails, a replica can be promoted. Redis Sentinel automates this failover process, continuously monitoring cluster health and promoting replicas without manual intervention. Memcached takes a simpler approach—it’s “fire and forget.” If a Memcached instance dies, you lose that shard’s data. Many large deployments accept this trade-off for simplicity, relying on cache-aside semantics to repopulate from the database.

The choice between Redis Sentinel (explicit replication with failover) and Redis Cluster (partitioning with replication built in) depends on your scale. Sentinel works well up to dozens of nodes. Cluster handles thousands. Memcached remains stateless and distributed from the start, making it simpler operationally but less flexible for complex operations.
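As a rough sketch, a minimal Sentinel configuration looks like this (the address, name "mymaster", and timeouts are placeholders, not recommendations):

```conf
# sentinel.conf — each sentinel process monitors the primary;
# a quorum of 2 sentinels must agree before declaring it down
port 26379
sentinel monitor mymaster 10.0.0.5 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
```

Run at least three sentinels on separate hosts so the quorum itself survives a machine failure.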

A Chain of Warehouses

Imagine running a retail operation across the entire country. Instead of a single central warehouse, you build regional distribution centers—one on the East Coast, one in the Midwest, one out West. When a store in Boston needs inventory, it doesn’t wait for a truck from California. It pulls from the nearest warehouse. Consistent hashing is the inventory management system: a product’s barcode mathematically determines which warehouse stocks it. The system is designed so that when you open a new warehouse, only a portion of products need to move; most stay where they are.

If a warehouse catches fire, customers don’t lose access to everything—redundant stock at other warehouses keeps the supply chain running. Scale that metaphor to millions of cache requests per second across a global platform, and you’ve captured distributed caching’s essence.

How Consistent Hashing Works

Let’s ground this in mechanics. Imagine a circular ring numbered 0 to 2^32 - 1. Every key gets hashed to a position on this ring. Every cache server also gets hashed to positions on the ring (usually several positions, called “virtual nodes,” to balance load). To find where key user:5678 lives, you hash it, find its position on the ring, and walk clockwise until you hit a server node. That’s your destination.

Here’s the elegance: add a new server? Nodes clockwise from it stop handling certain ranges and defer to the new server. Only the keys in that wedge get remapped. Remove a server? Its keys redistribute to the next server clockwise. Minimal disruption.
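A minimal ring with virtual nodes can be sketched in a few dozen lines of Python (MD5 and 100 virtual nodes per server are arbitrary choices here, not requirements):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Illustrative consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, server) pairs
        for server in servers:
            self.add(server)

    def _pos(self, label: str) -> int:
        # Map any string to a position on the ring
        return int(hashlib.md5(label.encode()).hexdigest(), 16)

    def add(self, server: str):
        # Each server occupies many positions to spread load evenly
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._pos(f"{server}#{i}"), server))

    def remove(self, server: str):
        self._ring = [entry for entry in self._ring if entry[1] != server]

    def get_node(self, key: str) -> str:
        # First server position clockwise from the key's position,
        # wrapping around past the end of the ring
        i = bisect.bisect(self._ring, (self._pos(key),))
        return self._ring[i % len(self._ring)][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.get_node("user:5678"))  # one of the three servers
```

Adding a fourth server to this ring relocates roughly a quarter of the keys (K/N, as above) and leaves the rest untouched, which is exactly the property modulo hashing lacks.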

Consistent Hash Ring Example (walking clockwise):

  key:9832 → key:4421 → Server A
  key:1203 → key:5567 → key:6789 → Server D
  key:111  → key:2345 → key:8765 → Server B
  (and back around to key:9832)

Each key is stored on the next server clockwise, so Server A owns key:9832 and key:4421; Server D owns key:1203, key:5567, and key:6789; Server B owns key:111, key:2345, and key:8765.

When Server C joins between key:5567 and key:6789:
- key:1203 and key:5567 migrate from D to C
- All other keys stay put

Redis Cluster implements this with 16,384 hash slots instead of a continuous ring. Each slot maps to a server; cluster nodes coordinate slot assignment. When a node joins or fails, the cluster rebalances slots across remaining healthy nodes.
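The slot function itself is CRC16 of the key modulo 16384, honoring hash tags. Python's standard binascii.crc_hqx computes the same XMODEM-style CRC16, so a sketch of the mapping fits in a few lines (the example keys are illustrative):

```python
import binascii

def hash_slot(key: str) -> int:
    """Redis Cluster-style slot: CRC16 (XMODEM) of the key, mod 16384.

    Honors hash tags: if the key contains a non-empty {...} section,
    only that section is hashed, so related keys can share a slot.
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key.encode(), 0) % 16384

print(hash_slot("user:5678"))          # some slot in 0..16383
print(hash_slot("{user:5678}:posts"))  # same slot as "user:5678"
```

Hash tags are why multi-key operations like transactions work in a cluster: keys sharing a tag land in the same slot, hence on the same node.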

Redis Cluster Configuration Example:

Master Node 1: Slots 0-5461
  └─ Replica Node 1 (standby)

Master Node 2: Slots 5462-10922
  └─ Replica Node 2 (standby)

Master Node 3: Slots 10923-16383
  └─ Replica Node 3 (standby)

When Master Node 1 fails, Replica Node 1 is promoted. The cluster automatically updates routing tables. Clients connect to any node, get redirected if they hit the wrong one, and their connection auto-updates to the correct node.
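Under the hood, that redirection is just an error reply of the form MOVED <slot> <host>:<port>. A toy parser (the sample reply strings are illustrative) shows what a cluster-aware client library handles for you on every redirect:

```python
def parse_redirect(error: str):
    """Parse a Redis Cluster MOVED/ASK redirection error.

    A node replies e.g. "MOVED 3999 127.0.0.1:7002" when it does not
    own the key's slot; the client retries against the returned address
    and, for MOVED, updates its cached slot-to-node map.
    """
    kind, slot, address = error.split()
    if kind not in ("MOVED", "ASK"):
        raise ValueError(f"not a redirection error: {error!r}")
    host, port = address.rsplit(":", 1)
    return kind, int(slot), host, int(port)

print(parse_redirect("MOVED 3999 127.0.0.1:7002"))
# ('MOVED', 3999, '127.0.0.1', 7002)
```

MOVED means the slot has permanently changed owners; ASK is a one-off redirect used mid-migration while a slot is moving between nodes.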

Here’s how the three main contenders compare:

| Feature                | Redis Sentinel                          | Redis Cluster              | Memcached           |
| Partitioning           | No (single instance per sentinel group) | Yes (16,384 slots)         | Client-side only    |
| High Availability      | Yes (automatic failover)                | Yes (replica promotion)    | No (fire-and-forget)|
| Scalability            | Vertical + read replicas                | Horizontal                 | Horizontal          |
| Operational Complexity | Low to Medium                           | Medium to High             | Low                 |
| Consistency Guarantees | Strong (within limits)                  | Eventual (with resharding) | Eventual (always)   |
| Replication Lag        | Minimal                                 | During resharding          | N/A                 |

Implementing Distributed Cache Patterns

Setting up Redis Cluster locally for testing takes minutes. You need multiple Redis instances and a cluster initialization command:

# Start 6 Redis instances (3 masters, 3 replicas)
for port in 7000 7001 7002 7003 7004 7005; do
  redis-server --port $port --cluster-enabled yes \
    --cluster-config-file nodes-${port}.conf --cluster-node-timeout 5000 &
done

# Create the cluster
redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 \
  127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
  --cluster-replicas 1

Now your application connects to any node (typically through a client library that handles redirection):

import json

from redis.cluster import ClusterNode, RedisCluster

# Client automatically discovers cluster topology from a seed node
cache = RedisCluster(
    startup_nodes=[ClusterNode("127.0.0.1", 7000)],
    decode_responses=True,
)

# Use it like normal Redis—the client routes each key to the owning node
user_data = {"id": 5678, "name": "example"}
cache.set("user:5678", json.dumps(user_data))
cache.get("user:5678")

For consistent hashing with client-side partitioning (say, using Memcached or a custom solution):

# Illustrative third-party libraries: uhashring for the hash ring,
# pymemcache for the Memcached client
from pymemcache.client.base import Client
from uhashring import HashRing

servers = ["cache-1.internal:11211",
           "cache-2.internal:11211",
           "cache-3.internal:11211"]

def _client(addr):
    host, port = addr.rsplit(":", 1)
    return Client((host, int(port)))

ring = HashRing(nodes=servers)
clients = {server: _client(server) for server in servers}

def cache_get(key):
    return clients[ring.get_node(key)].get(key)

def cache_set(key, value, ttl=3600):
    clients[ring.get_node(key)].set(key, value, expire=ttl)

Large-scale examples: Facebook/Meta runs Memcached at enormous scale—billions of objects across thousands of servers. They accept data loss on server failure because the database is the source of truth. Twitter leans heavily on Redis Cluster for partitioned, replicated caches. LinkedIn runs both: Memcached for simple values and caches they can lose, Redis for structured data requiring replication.

The Trade-Offs You Can’t Ignore

Distributed caching trades simplicity for scale. A single Redis instance is straightforward: you have one machine, one configuration, one failure mode. Distributed caching introduces complexity—cluster coordination, network partitions, resharding operations. When your cluster reshards (moving data between nodes), there’s a window where consistency is murky. Consistency models weaken compared to a single instance.

Network overhead increases too. Each cache operation now crosses the network to a remote server (or multiple servers if you count replication). Latency isn’t zero anymore. You gain throughput and capacity at the cost of individual request latency.

Operationally, you need monitoring, alerting, and runbooks for failures you didn’t face before. A node goes down? The cluster handles the failover, but the database absorbs that shard’s cache misses, and disk I/O climbs until the cache warms back up. Memory capacity decisions become trickier—you can’t just buy one bigger server; you’re provisioning many smaller ones and distributing load across them.

When is a single cache enough? If your working set fits comfortably in a single server’s RAM, failures are rare (or you accept them), and your QPS is under a few hundred thousand, stick with a single instance backed by Sentinel for HA. Don’t overcomplicate prematurely.

Key Takeaways

  • Distributed caching scales horizontally by partitioning data across multiple servers using consistent hashing, minimizing key redistribution when nodes change
  • Consistent hashing maps keys and servers to positions on a ring; only keys between the old and new server position need to move when topology changes
  • Redis Cluster (server-side sharding with 16,384 slots) and Redis Sentinel (single instance with replication/failover) serve different use cases; Cluster for horizontal scale, Sentinel for HA at modest sizes
  • Memcached favors simplicity over durability, relying on client-side consistent hashing and accepting data loss on failure
  • Replication ensures availability: primary nodes hold data, replicas act as standby, automatic promotion handles failures
  • Distributed caching trades simplicity for scale; single-instance caches with Sentinel HA are sufficient until working set or QPS demands otherwise

Practice Scenarios

Scenario 1: Handling a New Cache Node

Your Redis Cluster is running 3 masters with 5 million keys. You add a fourth master to handle increased load. Walk through: (a) which keys move, (b) how long the resharding takes, (c) what happens to requests during resharding, and (d) how client libraries handle this transparently.

Scenario 2: Cache Replication Strategy

Your Memcached deployment has no replication. Every hour, a random server dies and comes back. How might you redesign using Redis Cluster with replication to eliminate cache miss spikes? What’s the trade-off in complexity vs reliability?

Scenario 3: Debugging a Consistency Bug

A user’s profile updates in the database, gets cached, but a second request from another service reads stale data from a replica that hasn’t yet synced the write. Trace the issue and propose a solution (read-your-writes consistency, versioning, explicit invalidation, etc.).

Looking Ahead

You’ve now mastered caching at every layer—from local in-process caches to distributed systems spanning continents. In the next chapter, we shift focus to databases and persistence: how distributed systems store data durably, why caching alone isn’t enough, and how databases themselves employ the scalability patterns you’ve learned. You’ll discover that databases and caches are not islands—they’re deeply intertwined in modern architecture.