System Design Fundamentals

Sharding Strategies


Horizontal Partitioning (Sharding) Strategies

Once you’ve committed to sharding, the next critical decision is how to determine which shard owns which data. Different strategies distribute data with varying characteristics: some excel at balancing load, others simplify rebalancing, and some optimize for specific query patterns. This section explores the most common sharding strategies, their mechanisms, trade-offs, and failure modes.

Range-Based Sharding

Range-based sharding assigns data to shards based on ranges of the partition key. For example, all users with IDs from 1 to 1M go to Shard A, IDs from 1M+1 to 2M go to Shard B, and so on.

Mechanism

A shard mapping function is defined as:

shard_id = determine_shard(partition_key_value)

For range-based sharding:

user_id = 12345678
shard_id = user_id / 1000000  // integer division → 12 (user belongs to Shard 12)

The mapping is deterministic: the same user ID always maps to the same shard. This simplicity is a major advantage.
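
To make this concrete, here is a minimal Python sketch of range-based routing. The fixed width of one million IDs per shard mirrors the example above and is an assumption for illustration, not a property of any particular system.

RANGE_WIDTH = 1_000_000  # assumed width of each ID range

def determine_shard(user_id: int) -> int:
    # Integer division: every contiguous block of 1M IDs maps to one shard.
    # Real deployments usually keep an explicit table of split points rather
    # than a fixed width, so that hot ranges can be split independently.
    return user_id // RANGE_WIDTH

assert determine_shard(12_345_678) == 12   # matches the example above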

Advantages

Simple to understand and implement: No complex hashing algorithms required. Ranges are intuitive.

Efficient range queries: Queries like “fetch all users with ID between 100K and 200K” map to a single shard, making them fast.

Ordered data access: Range-based shards preserve ordering of the partition key, enabling efficient cursor-based pagination.

Straightforward shard identification: Given a partition key value, you immediately know which shard to query.

Disadvantages

Hot shards: If the partition key is not uniformly distributed, some ranges will receive far more traffic than others. For example, if you partition user IDs by range but most active users have low IDs (0–100K), Shard A becomes hot while others sit idle.

Uneven data distribution: Keys rarely fill their ranges evenly, so some shards may grow much larger than others, requiring manual rebalancing.

Difficult rebalancing: When you add a new shard, you must split existing ranges, which requires migrating millions of records and coordinating the transition carefully.

Limited parallelism: A query whose keys fall entirely within one range is served by a single shard, so its work cannot be spread across otherwise idle shards.

Real-World Example

Google’s Bigtable partitions data by row key range. Tablets (shards) store a range of rows. When a tablet grows too large, it’s split into two tablets at a range boundary. This strategy works well for time-series data (where keys are often monotonically increasing timestamps) but requires careful monitoring to avoid hot tablets.

Hash-Based Sharding

Hash-based sharding applies a hash function to the partition key, then maps the hash value to a shard. This distributes data more uniformly than range-based sharding.

Mechanism

shard_id = hash(partition_key) % number_of_shards

For example:

user_id = 12345678
hash_value = SHA-256("12345678") // Large integer
shard_id = hash_value % 16  // Assuming 16 shards

The hash function produces uniformly distributed output, and the modulo maps it onto [0, number_of_shards), so shards receive roughly equal numbers of keys even when the key values themselves cluster in particular ranges.
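
Here is a minimal Python sketch of this routing, mirroring the pseudocode above; the shard count of 16 is the example's assumption. Note the use of a stable hash rather than a per-process one, so every client computes the same shard for the same key.

import hashlib

NUM_SHARDS = 16  # assumed shard count from the example

def determine_shard(partition_key: str) -> int:
    # Use a stable hash: Python's built-in hash() is salted per process,
    # so different clients would disagree on placement if they used it.
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_SHARDS

print(determine_shard("12345678"))   # deterministic value in [0, 16)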

Advantages

Even data distribution: Hash functions distribute keys uniformly across shards, avoiding hot spots caused by uneven key distributions.

Automatic balancing: Adding new data doesn’t require manual shard assignment decisions. The hash function handles distribution.

Simple to implement: Hash modulo is straightforward; no complex mapping tables required.

Resilient to clustered keys: Even if key values cluster in certain ranges (for example, sequential or time-ordered IDs), the hash function spreads them across shards. Note that a single extremely frequent key value still hashes to one shard, so it can remain a hot spot.

Disadvantages

Difficult rebalancing: When adding a shard, the shard count changes from N to N+1, which changes the shard assignment for almost every key:

Old: shard_id = hash(key) % 16   // Results: [0, 15]
New: shard_id = hash(key) % 17   // Results: [0, 16]

// Almost every key remaps to a different shard!

This mass remapping requires migrating the majority of the data, causing heavy I/O load and often downtime.
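
The scale of the problem is easy to see empirically. The sketch below, which assumes the SHA-256 routing shown earlier, counts how many of 100,000 synthetic keys change shards when the cluster grows from 16 to 17 shards.

import hashlib

def shard_for(key: str, num_shards: int) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards

keys = [str(i) for i in range(100_000)]
moved = sum(1 for k in keys if shard_for(k, 16) != shard_for(k, 17))
print(f"{moved / len(keys):.0%} of keys move")   # roughly 94%

With modulo placement, a key keeps its shard only when hash % 16 == hash % 17, which happens for roughly 1 in 17 keys; the rest must migrate.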

Poor range query performance: Queries like “fetch all users with ID between 100K and 200K” touch all shards, not just one. Parallelization is required but more complex.

Key ordering is lost: Hash functions destroy the natural ordering of keys, making cursor-based pagination harder.

Real-World Example

Cassandra uses hash-based sharding (consistent hashing, discussed next) to distribute data across nodes. YouTube partitions video metadata using hash-based sharding, accepting the trade-off that range queries are expensive but achieving excellent data balancing.

Consistent Hashing

Consistent hashing solves the rebalancing problem of hash-based sharding. Instead of mapping directly to shards, consistent hashing maps both data and shards onto a circular “hash ring,” reducing the data that needs to be remapped when shards are added or removed.

Mechanism

Imagine a circular ring where positions range from 0 to 2^32 - 1:

  1. Hash the partition key to a position on the ring.
  2. Hash each shard identifier to a position on the ring.
  3. A key belongs to the first shard encountered moving clockwise on the ring.
    Hash Ring [0, 2^32), positions listed clockwise:

        position 150   → Key A
        position 200   → Shard A   (first shard clockwise from Key A, so it owns Key A)
        position 800   → Shard C
        position 1200  → Shard B
        (after the highest position, the ring wraps back around to 0)

When a new shard is added to the ring, only keys between the new shard and the previous shard (going counterclockwise) need to be rebalanced.
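
Here is a minimal Python sketch of the ring, assuming a sorted list of positions plus binary search; it uses the first 32 bits of SHA-256 as the ring position and omits virtual nodes, replication, and failure handling.

import bisect
import hashlib

def ring_position(value: str) -> int:
    # Stable position on the ring [0, 2^32).
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

class HashRing:
    def __init__(self) -> None:
        self.positions = []   # sorted ring positions
        self.owners = {}      # position -> shard name

    def add_shard(self, name: str) -> None:
        pos = ring_position(name)
        bisect.insort(self.positions, pos)
        self.owners[pos] = name

    def lookup(self, key: str) -> str:
        # First shard position clockwise from the key, wrapping past 2^32.
        idx = bisect.bisect_right(self.positions, ring_position(key))
        return self.owners[self.positions[idx % len(self.positions)]]

ring = HashRing()
for shard in ("shard-a", "shard-b", "shard-c"):
    ring.add_shard(shard)

before = ring.lookup("user:12345678")
ring.add_shard("shard-d")             # only keys between shard-d and its
after = ring.lookup("user:12345678")  # counterclockwise neighbor are remapped

Adding shard-d changes ownership only for keys whose positions fall between shard-d and its counterclockwise neighbor; every other key keeps its shard.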

Advantages

Minimal rebalancing on node changes: Adding or removing a shard requires rebalancing only a fraction of keys (approximately 1/N of data for N total shards), not the entire dataset.

Distributed node addition: New nodes can be added incrementally without a complete system reconfiguration.

Better than raw hash-based sharding: For a cluster of 10–20 nodes, the overhead of adding a node drops from roughly 90% of data moved (with modulo hashing) to roughly 5–10%.

Natural load balancing: If the hash function distributes positions evenly, keys and replicas spread across the ring.

Disadvantages

Still requires rebalancing: Although far less data moves than with plain hashing, adding nodes still triggers data migrations, which are complex and slow to execute safely.

Uneven distribution without virtual nodes: Physical shards on the ring might not distribute uniformly. Consistent hashing mitigates this with virtual nodes (multiple hash positions per physical shard).

Complex implementation: Requires careful handling of ring arithmetic, replica selection, and failure scenarios.

No query-side improvement: The gains over plain hashing are operational (less data movement), not in query performance; range queries remain expensive.

Virtual Nodes

To improve distribution, consistent hashing typically uses virtual nodes: each physical shard is assigned multiple positions on the ring.

// Physical shard "A" has virtual nodes: A-1, A-2, A-3
hash("A-1") = 100
hash("A-2") = 5000
hash("A-3") = 18000

// Keys hashing to (100, 5000] → A-2; keys in (5000, 18000] → A-3;
// everything else (wrapping past 18000 back around to 100) → A-1.
// All three positions resolve to physical shard A; interleaving them with
// other shards' virtual nodes spreads keys and load more evenly.
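
Building on the ring sketch above, the snippet below shows how a shard's virtual-node positions can be generated; the count of 100 positions per physical shard is an arbitrary assumption. Inserting all of these positions into the ring, each resolving back to the same physical shard, interleaves shard A with the other shards many times over, which evens out the key ranges each shard owns.

import hashlib

VIRTUAL_NODES = 100  # assumed; more positions per shard smooths the spread

def vnode_positions(shard_name: str) -> list[int]:
    # Hash the shard under several derived names ("shard-a-0", "shard-a-1", ...).
    # Every one of these ring positions resolves to the same physical shard.
    positions = []
    for i in range(VIRTUAL_NODES):
        digest = hashlib.sha256(f"{shard_name}-{i}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:4], "big"))
    return sorted(positions)

print(len(vnode_positions("shard-a")))   # 100 positions for one physical shard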

Real-World Example

Dynamo (Amazon’s key-value store) pioneered consistent hashing with virtual nodes. Cassandra and Riak adopted the approach. When a node fails or a new node joins, consistent hashing ensures predictable, incremental rebalancing instead of a complete rebuild.

Directory-Based Sharding

Directory-based sharding maintains an explicit lookup table (directory) mapping partition key ranges or values to shards. The application queries this directory to determine which shard to use.

Mechanism

A shard directory is maintained (often in a dedicated service or metadata store):

Partition Key Range → Shard Location

1–1M → Shard A (db1.example.com:5432)
1M–2M → Shard B (db2.example.com:5432)
2M–3M → Shard C (db3.example.com:5432)

// Or using a hash-based directory:
hash(user_id) % 1000 = 42 → Shard 7 (db7.example.com:5432)

On lookup:

user_id = 12345678
directory_key = user_id / 1000000  // integer division → 12
shard_location = directory[12]     // e.g., the shard that owns IDs 12M–13M
query(shard_location, SELECT * FROM users WHERE id = 12345678)
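
Here is a minimal sketch of directory-based routing, assuming a hash directory with 1,000 buckets held in an in-memory dict; in production the directory would live in a replicated metadata store, and the hostnames are placeholders in the same style as the example above.

import hashlib

NUM_BUCKETS = 1_000  # assumed directory granularity (hash buckets)

# bucket -> shard address. A production directory lives in a replicated
# metadata store (e.g., a config service), not in a local dict.
directory = {b: f"db{b % 4 + 1}.example.com:5432" for b in range(NUM_BUCKETS)}

def bucket_for(user_id: int) -> int:
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_BUCKETS

def shard_for(user_id: int) -> str:
    return directory[bucket_for(user_id)]

# Rebalancing is a directory update, not a cluster-wide rehash:
# migrate one bucket's rows, then point the bucket at the new shard.
hot_bucket = bucket_for(12345678)
directory[hot_bucket] = "db9.example.com:5432"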

Advantages

Flexible rebalancing: The directory can be updated without changing client code. You can reassign keys to different shards without modifying hashing logic.

Supports any sharding scheme: You can use ranges, hashes, or custom logic. The directory is agnostic.

Human-readable mappings: You can inspect the directory to understand data distribution and manually debug issues.

Can accommodate heterogeneous shards: Different shards can have different capacities; the directory can assign more keys to larger shards.

Disadvantages

Directory service is a bottleneck: Every query must consult the directory. If the directory is slow or unavailable, the entire system stalls.

Operational overhead: Maintaining the directory as the system evolves requires careful coordination and testing.

Consistency challenges: If different clients see different versions of the directory, they route to different shards, causing data inconsistency.

Single point of failure: A misconfigured or corrupted directory can break the entire system.

Real-World Example

MongoDB’s sharded cluster uses a config server (metadata store) that holds shard key ranges and their mappings to replica sets. Clients query the config servers to route operations to the correct replica set. This flexibility allows MongoDB to support different sharding strategies, but the config server becomes a critical dependency.

Geographic Sharding

For globally distributed systems, geographic sharding partitions data by geographic location. All data for a region is stored in a shard (or set of shards) located in that region.

Mechanism

if user.region == "US":
    shard_id = hash(user_id) % shards_in_us
    location = "us-east-1"
elif user.region == "EU":
    shard_id = hash(user_id) % shards_in_eu
    location = "eu-west-1"
elif user.region == "APAC":
    shard_id = hash(user_id) % shards_in_apac
    location = "ap-southeast-1"

Advantages

Data residency and compliance: Data stays in the region required by law (GDPR, CCPA, data sovereignty).

Reduced latency: Users connect to the shard in their region, minimizing network hops.

Disaster isolation: Failure in one region doesn’t immediately impact users in other regions.

Natural load distribution: Global user base is distributed across multiple geographic zones.

Disadvantages

Cross-region queries are expensive: If a user queries data from another region, latency increases significantly.

Uneven regional growth: Regions grow at different rates. Over time, one region’s shards may become much larger than others.

Operational complexity: Managing infrastructure across multiple regions is challenging (replication, monitoring, failover).

Limited scalability within a region: Once a region’s shards are full, you must add more regions or rebalance, both expensive operations.

Real-World Example

Facebook’s TAO (The Associations and Objects) system partitions data by region. Relationships and objects are stored in the region where the user was created. Cross-region queries are routed through a secondary layer, accepting higher latency for correctness.

Comparison of Sharding Strategies

Strategy           | Data Distribution           | Rebalancing Cost                 | Range Query Performance              | Complexity | Best For
Range-Based        | Uneven (risk of hot shards) | High (requires splitting ranges) | Excellent (single shard)             | Low        | Time-series, ordered data with even access patterns
Hash-Based         | Very even                   | Very high (requires full rehash) | Poor (all shards)                    | Low        | Systems with good partition key distribution
Consistent Hashing | Even (with virtual nodes)   | Medium (incremental)             | Poor (all shards)                    | Medium     | Frequent node addition/removal (Dynamo, Cassandra)
Directory-Based    | Flexible (any scheme)       | Low (no rehashing)               | Depends on scheme                    | High       | Custom sharding logic or frequent rebalancing
Geographic         | Uneven (by region)          | High (cross-region movement)     | Poor (cross-region queries are slow) | Very high  | Compliance-driven, latency-sensitive global systems

Practical Considerations

Combining Strategies

Many real systems combine multiple strategies:

  • Two-level sharding: First level geographic (by region), second level hash-based (within region). Users in the US are distributed via hash across US shards (a minimal sketch follows this list).
  • Adaptive sharding: Start with range-based, migrate to directory-based if rebalancing becomes frequent.
  • Hierarchical sharding: Directory maps to primary shards, consistent hashing selects replicas within each primary shard.
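
As referenced in the first bullet, here is a minimal Python sketch of two-level routing: geography picks the shard pool, then a hash picks the shard within it. The region names, shard counts, and datacenter labels are assumptions for illustration.

import hashlib

# Assumed region layout: region -> (number of shards, datacenter location)
REGIONS = {
    "US":   (8, "us-east-1"),
    "EU":   (6, "eu-west-1"),
    "APAC": (4, "ap-southeast-1"),
}

def route(user_id: int, region: str) -> tuple[str, int]:
    # Level 1: pick the regional shard pool. Level 2: hash within the region.
    num_shards, location = REGIONS[region]
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return location, int.from_bytes(digest, "big") % num_shards

print(route(12345678, "EU"))   # ('eu-west-1', <shard index in [0, 6)>)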

Handling Skewed Partitions

Despite careful partition key selection, some shards inevitably become hot. Strategies to mitigate:

Shard splitting: A hot shard is split into two shards with finer-grained ranges. This is expensive but sometimes necessary.

Read replicas: Add read-only replicas of hot shards to distribute read traffic. Writes still go to one shard.

Caching: Keep hot data in a cache layer (Redis, Memcached) to reduce database load.

Micro-sharding: For some keys that are extremely hot (e.g., a celebrity on a social network), store multiple copies across different shards.
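
One way to implement this for a read-hot key is sketched below: the key's data is kept on several deterministically chosen shards, reads pick one copy at random, and writes must update them all. The copy count of 8 and the key-derivation scheme are assumptions for illustration, assuming hash-based routing underneath.

import hashlib
import random

HOT_KEY_COPIES = 8  # assumed number of copies kept for a known-hot key

def copy_shards(key: str, num_shards: int) -> list[int]:
    # Deterministic set of shards, each holding a copy of the hot key's data.
    shards = []
    for i in range(HOT_KEY_COPIES):
        digest = hashlib.sha256(f"{key}#copy{i}".encode("utf-8")).digest()
        shards.append(int.from_bytes(digest, "big") % num_shards)
    return shards

def read_shard(key: str, num_shards: int) -> int:
    # Spread reads for the hot key across all of its copies.
    return random.choice(copy_shards(key, num_shards))

def write_shards(key: str, num_shards: int) -> list[int]:
    # Writes must update (or invalidate) every copy to keep them consistent.
    return copy_shards(key, num_shards)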

Failure Modes and Edge Cases

Shard unavailability: If a shard goes offline, queries mapped to that shard fail. Replication and failover mechanisms are essential (Chapter 11).

Inconsistent routing: If clients use different hashing logic or stale directory data, requests for the same key land on different shards. Distributed transactions become impossible.

Cascading rebalancing: Adding one shard can trigger rebalancing across the entire cluster, creating a “rebalancing storm” that impacts latency.

Silent data corruption: If a shard’s replica set is accidentally recreated without data, old writes are lost. Backups and recovery procedures are critical.

Key Takeaways

  • Range-based sharding is simple and excellent for ordered range queries but risks hot shards if data distribution is uneven.
  • Hash-based sharding distributes data very evenly but makes rebalancing expensive; adding shards requires full data rehashing.
  • Consistent hashing reduces rebalancing overhead when nodes change, making it ideal for systems with frequent node addition/removal.
  • Directory-based sharding is maximally flexible but introduces an operational bottleneck: the directory service must be fast, consistent, and highly available.
  • Geographic sharding is necessary for compliance and latency optimization but adds significant operational complexity and cross-region inefficiency.
  • Most production systems combine multiple strategies to balance performance, operational simplicity, and flexibility.

Connection to Next Section

With a solid understanding of how to partition data across shards, the next critical concern is handling failures and ensuring availability. Chapter 11 focuses on replication strategies, failover mechanisms, and consistency models. When one shard becomes unavailable, replica sets must automatically promote a secondary, requiring careful coordination and monitoring. Understanding replication is essential for building resilient partitioned systems.