System Design Fundamentals

Capacity Estimation

When Getting the Numbers Wrong Costs You Millions

Imagine you’re an architect at a startup that just landed a partnership deal that will triple your daily active users overnight. You size the infrastructure based on what feels safe—you buy enough hardware to handle 50,000 concurrent users. Launch day arrives, and 8,000 concurrent users hit your system. Your servers barely break a sweat. For the next two years, you’re running at about 15% capacity while paying for infrastructure that sits idle. Somewhere in the finance department, someone discovers you’re spending $2 million annually on unnecessary hardware.

Now flip the scenario. Another architect skips capacity planning altogether. She estimates “roughly” that her new payment processing service can handle “probably tens of thousands” of transactions per second. Three weeks into production, a major customer runs a scheduled bulk upload of 50,000 transactions. The system tanks. Database connection pools are exhausted. Everything crashes. Revenue stops flowing for six hours.

Both of these situations are completely preventable. Capacity estimation—the practice of predicting how much compute, storage, and bandwidth your system needs—is one of the highest-leverage skills in system design. It sits at the intersection of business value and engineering reality. This chapter teaches you how to estimate capacity like a pro, turning vague requirements into concrete numbers that guide architecture decisions.

By the end of this chapter, you’ll understand how to work backward from business goals (the Scale and SLOs we defined in the previous section) to actual infrastructure needs. You’ll learn the “Jeff Dean numbers”—reference latencies that every engineer should have in their back pocket. You’ll see how to do napkin math that turns a fifteen-minute conversation with a product manager into a defensible hardware budget.

Napkin Math: The Engineer’s Secret Weapon

Capacity estimation is the art and science of predicting resource consumption before you build. It answers questions like: How much data will we store in year one? How many requests per second will our API need to handle? How much network bandwidth do we need? The beauty of capacity estimation is that you don’t need exact answers—you need answers that are in the right ballpark.

Here’s the key insight: you’re aiming for accuracy within an order of magnitude (a factor of 10). If your estimate is 100 GB and it turns out to be 80 GB, you’re in great shape. Even if it turns out to be 200 GB, that’s still useful for planning. The estimation process forces you to think about the system holistically and uncover hidden assumptions.

Back-of-the-envelope calculations are the tactical tool for this work. You start with a business metric (daily active users, transactions per day, videos uploaded per hour) and multiply your way down to infrastructure requirements. The math is deliberately rough and fast. You’re not building a spreadsheet model with precision down to the decimal point. You’re sketching, hypothesizing, and validating assumptions in real time.

To do this effectively, you need to internalize some reference numbers—constants that tell you how long different operations take. For instance, reading from L1 cache takes about 4 nanoseconds. Reading from main memory (RAM) takes about 100 nanoseconds. Reading from disk takes about 10 milliseconds. These numbers matter because they help you understand which architectural choices are feasible. If you’re designing a system that needs 100,000 operations per second, you can’t afford to hit disk for each operation—you need to cache in RAM.

You should also think in powers of 10 when dealing with large numbers. A million is 10^6. A billion is 10^9. A trillion is 10^12. When someone says your system will have 2 billion events per day, you should be able to instantly mentally convert that to about 23,000 events per second (2 × 10^9 ÷ 86,400 seconds). This habit of seeing orders of magnitude makes estimation conversational and fast.
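To make the habit concrete, here is a minimal Python sketch of that conversion; the figures are the ones from the paragraph above, and the helper name is just for illustration:

```python
SECONDS_PER_DAY = 86_400

def per_second(daily_count: float) -> float:
    """Convert a daily volume into an average per-second rate."""
    return daily_count / SECONDS_PER_DAY

# 2 billion events per day is roughly 23,000 events per second.
print(round(per_second(2_000_000_000)))  # 23148
```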

Pro tip: When estimating, always state your assumptions explicitly. “I’m assuming each user generates 10 requests per minute” or “I’m assuming average document size is 500 KB.” This makes your estimates auditable and lets teammates spot bad assumptions early.

Planning a Wedding Teaches You About Systems

Let me give you an everyday analogy. Imagine you’re planning a wedding for 500 guests. You need to estimate food requirements.

Start with the business metric: 500 guests. Now estimate per-guest consumption. Each guest typically eats 1.5 servings of the main course, 2 servings of sides, and 1 dessert. That’s 750 servings of main course, 1,000 servings of sides, 500 servings of dessert.

But you also need margin for error. What if 50 more guests RSVP? What if appetites are bigger than normal? Experienced caterers add a 15-20% buffer. So you order 900 servings of main course, not 750. This is over-provisioning, but it’s cheap insurance against running out of food.

Now translate this to systems. Your business metric is “daily active users.” Your per-user consumption is “average requests per user per day” and “average data generated per user.” You multiply these together to get total daily traffic. Then you add a buffer for peak hour traffic (systems don’t spread load evenly across a day—they spike). You add another buffer for failure (if you have two database replicas, one goes down, the other handles all traffic). By the time you’re done, you’ve built in headroom.

The wedding analogy also teaches you about different types of capacity. You need food in the right quantities (storage), delivered at the right time during the reception (throughput/bandwidth), and prepared quickly enough so guests don’t wait (latency). Systems have these same dimensions, and you must estimate all three.

The Numbers Every Engineer Should Know

Here’s a table of latency numbers—called the “Jeff Dean numbers” after the legendary Google engineer who popularized them. The absolute values drift as hardware improves, but the relative magnitudes have held steady for the past 15 years:

Operation                               Latency    Notes
L1 cache reference                      4 ns       Fastest, on-CPU memory
L2 cache reference                      10 ns      Still on-CPU, slightly slower
L3 cache reference                      40 ns      Shared across cores
RAM reference                           100 ns     Main memory, about 2.5x slower than L3
SSD random read                         150 μs     Solid-state drive, about 1,500x slower than RAM
Disk random read                        10 ms      Spinning disk, about 65x slower than SSD
Network round trip (same datacenter)    0.5 ms     Fast, within a few miles
Network round trip (intercontinental)   150 ms     Slow, physical distance matters
Database query (cached)                 1 ms       Typical for an index lookup
Database query (disk)                   10 ms      Requires disk I/O

Notice the pattern: each step down the hierarchy is slower by anywhere from a few times to a few orders of magnitude, with the steepest cliffs between RAM and storage and between the datacenter and the wide-area network. This fundamental fact shapes every architectural decision.
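To see how these reference numbers drive decisions, here is a small sketch (the constants and names are mine; the values come from the table above) that checks whether a target of 100,000 operations per second can afford to touch disk on every operation:

```python
# Approximate reference latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "ram": 100,
    "ssd_random_read": 150_000,
    "disk_random_read": 10_000_000,
}

def max_serial_ops_per_second(latency_ns: int) -> float:
    """Upper bound on operations per second if each operation waits on one such access."""
    return 1e9 / latency_ns

# A spinning disk tops out around 100 serial random reads per second, an SSD around 6,700,
# and RAM around 10 million; a 100,000 ops/s target cannot hit disk on every operation.
for name, ns in LATENCY_NS.items():
    print(f"{name}: {max_serial_ops_per_second(ns):,.0f} ops/s")
```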

Now let’s walk through the estimation methodology. You start with a business requirement and break it into smaller, estimable pieces:

Step 1: Clarify the scale. How many users? How long is the time window (day, month, year)? What’s the peak vs. average traffic pattern? Are some operations more frequent than others?

Step 2: Estimate daily operations. From your user count and behavior, calculate daily requests, transactions, or events. For example: 10 million daily active users × 5 requests per user per day = 50 million requests per day.

Step 3: Convert to peak QPS. Not all requests arrive evenly throughout the day. Peak hour might be 3x the average. And within an hour, there are traffic spikes. As a rough rule, assume peak QPS is 3-5x the average QPS. So 50 million requests per day ÷ 86,400 seconds ≈ 580 requests per second average. Peak might be 1,740 QPS. Add 20% headroom for unexpected spikes: 2,088 QPS.
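A quick sketch of that arithmetic, using the same 3x peak factor and 20% headroom as above (both are assumptions you would state explicitly):

```python
SECONDS_PER_DAY = 86_400

def peak_qps(daily_requests: float, peak_factor: float = 3.0, headroom: float = 0.20) -> float:
    """Average QPS scaled by a peak-hour factor plus headroom for spikes."""
    average = daily_requests / SECONDS_PER_DAY
    return average * peak_factor * (1 + headroom)

print(round(peak_qps(50_000_000)))  # ~2,083 QPS, matching the ~2,088 above within rounding
```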

Step 4: Estimate storage. How much data does each operation create? An average tweet might be 300 bytes (text, metadata, timestamps). If you have 100 million tweets per day, that’s 30 GB per day. Over one year, that’s roughly 11 TB raw. Add 50% for replication and backups, and you’re at roughly 16.5 TB for year one.
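The same step as a sketch; the 1.5x factor for replication and backups is the assumption stated above:

```python
def yearly_storage_tb(items_per_day: float, bytes_per_item: float, overhead: float = 1.5) -> float:
    """Year-one storage in terabytes, inflated by a replication-and-backup overhead factor."""
    return items_per_day * bytes_per_item * 365 * overhead / 1e12

print(round(yearly_storage_tb(100_000_000, 300), 1))  # ~16.4 TB
```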

Step 5: Estimate bandwidth. How much data flows in and out? If your API serves 1 MB responses at peak (2,088 QPS), you need 2,088 MB/s of outbound bandwidth, or roughly 17 Gbps. You rarely want to run a link anywhere near full utilization, so you’d contract for something in the 25-35 Gbps range to be safe.
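In code, treating 1 MB as 10^6 bytes for napkin purposes:

```python
def peak_bandwidth_gbps(peak_qps: float, response_bytes: float) -> float:
    """Outbound bandwidth at peak, converted from bytes per second to gigabits per second."""
    return peak_qps * response_bytes * 8 / 1e9

print(round(peak_bandwidth_gbps(2_088, 1_000_000), 1))  # ~16.7 Gbps
```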

Step 6: Estimate cache size. Not everything needs to live in the cache. The Pareto principle (80/20 rule) often applies: 80% of traffic hits 20% of data. If your total data is 16.5 TB and the working set is 20%, that’s about 3.3 TB that needs to be hot in cache. Sustaining a high hit rate (95%+) usually means holding the full working set plus headroom for churn, say 1.5-3x, so budget roughly 5-10 TB of RAM for caching.
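A sketch of the cache budget; the 2x headroom factor is an assumption taken from the middle of the range above:

```python
def cache_size_tb(total_data_tb: float, working_set_fraction: float = 0.2, headroom: float = 2.0) -> float:
    """Cache budget: the hot working set inflated by a headroom factor for churn and eviction."""
    return total_data_tb * working_set_fraction * headroom

print(round(cache_size_tb(16.5), 1))  # ~6.6 TB, within the 5-10 TB budget above
```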

Here’s a diagram showing how these pieces fit together:

graph LR
    A["Daily Active Users"] -->|per-user behavior| B["Daily Operations"]
    B -->|peak factor| C["Peak QPS"]
    A -->|data per user| D["Storage Needed"]
    C -->|response size| E["Bandwidth"]
    D -->|working set %| F["Cache Size"]
    C --> G["Compute Resources"]
    E --> H["Network Capacity"]
    F --> I["RAM Budget"]
    G --> J["Architecture Decision"]
    H --> J
    I --> J

A Real Estimation: Twitter-Like Social Network

Let’s estimate capacity for a Twitter-like system. We’ll work through a concrete example.

Assumptions:

  • 100 million daily active users
  • Average user tweets 2 times per day
  • Average user reads the feed 20 times per day
  • Average tweet is 300 bytes (text, links, metadata)
  • Average feed read retrieves 50 tweets
  • System must support 99.9% uptime (SLO from previous chapter)

Daily operations:

  • Tweets written: 100M users × 2 tweets/day = 200M tweets/day
  • Feed reads: 100M users × 20 reads/day = 2B feed reads/day
  • Retweets, likes, replies add another 30% traffic: 2.6B total read operations

Peak QPS:

  • Total reads: 2.6B ÷ 86,400 seconds = 30,093 RPS average
  • Peak hour is 3x average: 90,279 RPS at peak
  • Add 25% buffer: 112,849 RPS peak

Write operations:

  • Total writes: 200M tweets ÷ 86,400 = 2,315 TPS average
  • Peak: 6,945 TPS
  • Add buffer: 8,700 TPS peak

Storage:

  • Daily tweet generation: 200M tweets × 300 bytes = 60 GB/day
  • One year: 21.9 TB (raw)
  • Add replication (2x): 43.8 TB
  • Add backups and growth buffer: 60 TB year one

Bandwidth:

  • Peak read traffic: Each feed read retrieves 50 tweets × 300 bytes = 15 KB per read
  • Peak bandwidth: 112,849 RPS × 15 KB ≈ 1.7 GB/s, or about 13.6 Gbps outbound
  • This is why Twitter caches aggressively

Cache size:

  • Working set estimate: 20% of all tweets are “hot” (recently posted, popular accounts)
  • Hot tweets: 43.8 TB × 0.2 = 8.76 TB
  • Cache hit ratio target: 95% (so cache must hold most of the working set)
  • Required cache: 12-15 TB of Redis/Memcached

Compute resources:

  • 112,849 RPS divided across the fleet; assume each server handles 5,000 RPS
  • Servers needed: 112,849 ÷ 5,000 = 23 servers minimum
  • Add redundancy (2x for failover): 46 servers for read traffic
  • Add 20% headroom for maintenance: 56 compute servers

Now you have concrete numbers. You can have a real conversation with operations: “We need 56 commodity servers, 60 TB of storage, and 15 TB of cache.” This beats “we need enough to handle our scale.”
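Pulling the whole example together, here is a hedged sketch that re-derives the headline numbers from the assumptions above; the variable names and exact buffer factors are mine, chosen to mirror the arithmetic in this section:

```python
SECONDS_PER_DAY = 86_400

users = 100_000_000
reads_per_user, writes_per_user = 20, 2
tweet_bytes, tweets_per_read = 300, 50

daily_reads = users * reads_per_user * 1.3                  # +30% for retweets, likes, replies
daily_writes = users * writes_per_user

peak_read_qps = daily_reads / SECONDS_PER_DAY * 3 * 1.25    # 3x peak hour, +25% buffer
peak_write_tps = daily_writes / SECONDS_PER_DAY * 3 * 1.25

yearly_storage_tb = daily_writes * tweet_bytes * 365 * 2 / 1e12        # 2x for replication
peak_bandwidth_gbps = peak_read_qps * tweets_per_read * tweet_bytes * 8 / 1e9
read_servers = peak_read_qps / 5_000 * 2 * 1.2              # 2x failover, +20% headroom

print(f"peak reads:  {peak_read_qps:,.0f} RPS")       # ~112,847
print(f"peak writes: {peak_write_tps:,.0f} TPS")      # ~8,681
print(f"storage:     {yearly_storage_tb:.1f} TB")     # ~43.8 TB before backups and growth
print(f"bandwidth:   {peak_bandwidth_gbps:.1f} Gbps") # ~13.5 Gbps
print(f"servers:     {read_servers:.0f}")             # ~54, close to the 56 above
```

The small differences from the bullet-point figures (54 versus 56 servers, for example) come from rounding up at each intermediate step; either answer is fine at napkin precision.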

When Numbers Tell Stories

The estimation process also reveals architectural insights. Notice how our Twitter example’s read-side bandwidth (13.6 Gbps at peak) dwarfs the write throughput (8,700 TPS). This tells us: reads dominate, and caching is non-negotiable. An architect who ignored this insight would over-invest in write throughput and under-invest in cache, leading to poor performance.

Similarly, if we estimate a different system, say a payment processor handling 10,000 transactions per second with 1 KB transactions, the bottleneck is different. Storage is manageable (10,000 TPS × 1 KB = 10 MB/s, or about 864 GB/day). Bandwidth is fine. But latency becomes critical: a payment transaction must complete within 500 ms for a good user experience, yet it might involve multiple database round trips. The architectural lesson: database latency and consistency matter more than throughput.
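A quick sketch of that latency budget, using the datacenter round-trip and disk-backed query numbers from the reference table; the hop counts here are illustrative assumptions, not measurements:

```python
# Rough per-hop costs in milliseconds, taken from the reference table earlier.
DB_QUERY_DISK_MS = 10
DC_ROUND_TRIP_MS = 0.5

def transaction_latency_ms(db_queries: int, service_hops: int) -> float:
    """Serial latency of one payment if every query hits disk and every hop stays in the datacenter."""
    return db_queries * DB_QUERY_DISK_MS + service_hops * DC_ROUND_TRIP_MS

# Ten serial disk-backed queries plus twenty service hops still fit inside a 500 ms budget,
# but add one intercontinental round trip (150 ms) per query and the budget evaporates.
print(transaction_latency_ms(db_queries=10, service_hops=20))  # 110.0 ms
```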

These insights emerge from estimation work. They guide your high-level design.

When Estimates Matter Most (And When They Don’t)

Here’s the honest truth: your estimate accuracy should match the cost and risk of being wrong. If you’re building an internal tool for your team of 20 people, a rough estimate is fine. If you’re architecting a service that’ll cost $5 million annually in infrastructure and downtime, estimate carefully.

Over-provisioning is safe but expensive. Under-provisioning is dangerous but cheaper initially. The break-even point depends on your tolerance for risk. Startups often under-provision and upgrade as they grow (start lean, scale fast). Established companies often over-provision to avoid customer impact.

Pro tip: estimate in ranges, not points. Don’t say “we need 50 servers.” Say “we need 40-60 servers, with our best guess at 50.” This acknowledges uncertainty and gives stakeholders a range to plan around.

Key Takeaways

  • Capacity estimation predicts resource consumption by combining business metrics with per-unit resource costs
  • Order-of-magnitude accuracy is sufficient—you’re aiming within a factor of 10, not exact
  • Know the latency hierarchy: L1 cache (4 ns) → RAM (100 ns) → SSD (150 μs) → disk (10 ms). Each step down costs one or more orders of magnitude
  • Convert business metrics step by step: users → daily operations → peak QPS → storage → bandwidth → cache requirements
  • State assumptions explicitly so your estimates are auditable and reviewable
  • Use estimates to drive architecture decisions, not just to satisfy curiosity

Estimation Challenges

Try these scenarios to test your skills:

Scenario 1: Video Streaming Platform

  • 50 million daily active users
  • Average user watches 2 hours of video per day
  • Average video bitrate: 5 Mbps
  • Estimate: storage needed for one year of unique video content, peak bandwidth required, and cache size needed for popular videos (assume 10% of videos are “hot”)

Scenario 2: E-commerce Search Index

  • 1 billion products in catalog
  • Average product record: 10 KB
  • Peak search load: 100,000 searches per second
  • Estimate: total storage for the index, peak bandwidth for search responses (assume 1 KB per response), and number of search servers needed (assume each handles 5,000 QPS)

Scenario 3: Real-Time Analytics

  • 10 billion events per day from IoT devices
  • Average event: 500 bytes
  • Must keep data online and queryable for 90 days
  • Estimate: peak ingestion rate (events per second), total storage, and cache size for recent 7-day data (assume 95% of queries hit last 7 days)

What’s Next: From Numbers to Trade-offs

Capacity estimation gives you one critical piece of the puzzle: the scale you’re building for. But now comes the harder question: how do you actually build for that scale? What choices get you there efficiently? The next chapter explores Constraints and Trade-offs—the reality that you can’t optimize for everything simultaneously. You can’t have infinite speed, infinite storage, and infinite consistency. Armed with capacity numbers, you’ll learn how to choose which dimensions matter most for your specific system.