System Design Fundamentals

Capacity Estimation

When Getting the Numbers Wrong Costs You Millions

Imagine you’re an architect at a startup that just landed a partnership deal that will triple your daily active users overnight. You size the infrastructure based on what feels safe—you buy enough hardware to handle 50,000 concurrent users. Launch day arrives, and 8,000 concurrent users hit your system. Your servers barely break a sweat. For the next two years, you’re running at about 15% capacity while paying for infrastructure that sits idle. Somewhere in the finance department, someone discovers you’re spending $2 million annually on unnecessary hardware.

Now flip the scenario. Another architect skips capacity planning altogether. She estimates “roughly” that her new payment processing service can handle “probably tens of thousands” of transactions per second. Three weeks into production, a major customer runs a scheduled bulk upload of 50,000 transactions. The system tanks. Database connection pools are exhausted. Everything crashes. Revenue stops flowing for six hours.

Both of these situations are completely preventable. Capacity estimation—the practice of predicting how much compute, storage, and bandwidth your system needs—is one of the highest-leverage skills in system design. It sits at the intersection of business value and engineering reality. This chapter teaches you how to estimate capacity like a pro, turning vague requirements into concrete numbers that guide architecture decisions.

By the end of this chapter, you’ll understand how to work backward from business goals (the Scale and SLOs we defined in the previous section) to actual infrastructure needs. You’ll learn the “Jeff Dean numbers”—reference latencies that every engineer should have in their back pocket. You’ll see how to do napkin math that turns a fifteen-minute conversation with a product manager into a defensible hardware budget.

Napkin Math: The Engineer’s Secret Weapon

Capacity estimation is the art and science of predicting resource consumption before you build. It answers questions like: How much data will we store in year one? How many requests per second will our API need to handle? How much network bandwidth do we need? The beauty of capacity estimation is that you don’t need exact answers—you need answers that are in the right ballpark.

Here’s the key insight: you’re aiming for accuracy within an order of magnitude (a factor of 10). If your estimate is 100 GB and it turns out to be 80 GB, you’re in great shape. Even if it turns out to be 200 GB, that’s still useful for planning. The estimation process forces you to think about the system holistically and uncover hidden assumptions.

Back-of-the-envelope calculations are the tactical tool for this work. You start with a business metric (daily active users, transactions per day, videos uploaded per hour) and multiply your way down to infrastructure requirements. The math is deliberately rough and fast. You’re not building a spreadsheet model with precision down to the decimal point. You’re sketching, hypothesizing, and validating assumptions in real time.

To do this effectively, you need to internalize some reference numbers—constants that tell you how long different operations take. For instance, reading from L1 cache takes about 4 nanoseconds. Reading from main memory (RAM) takes about 100 nanoseconds. Reading from disk takes about 10 milliseconds. These numbers matter because they help you understand which architectural choices are feasible. If you’re designing a system that needs 100,000 operations per second, you can’t afford to hit disk for each operation—you need to cache in RAM.

You should also think in powers of 10 when dealing with large numbers. A million is 10^6. A billion is 10^9. A trillion is 10^12. When someone says your system will have 2 billion events per day, you should be able to instantly mentally convert that to about 23,000 events per second (2 × 10^9 ÷ 86,400 seconds). This habit of seeing orders of magnitude makes estimation conversational and fast.
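To make the habit concrete, here is a minimal Python sketch of that conversion; the figures are the ones from the paragraph above, and the helper name is just for illustration:

```python
SECONDS_PER_DAY = 86_400

def per_second(daily_count: float) -> float:
    """Convert a daily volume into an average per-second rate."""
    return daily_count / SECONDS_PER_DAY

# 2 billion events per day is roughly 23,000 events per second.
print(round(per_second(2_000_000_000)))  # 23148
```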

Pro tip: When estimating, always state your assumptions explicitly. “I’m assuming each user generates 10 requests per minute” or “I’m assuming average document size is 500 KB.” This makes your estimates auditable and lets teammates spot bad assumptions early.

Planning a Wedding Teaches You About Systems

Let me give you an everyday analogy. Imagine you’re planning a wedding for 500 guests. You need to estimate food requirements.

Start with the business metric: 500 guests. Now estimate per-guest consumption. Each guest typically eats 1.5 servings of the main course, 2 servings of sides, and 1 dessert. That’s 750 servings of main course, 1,000 servings of sides, 500 servings of dessert.

But you also need margin for error. What if 50 more guests RSVP? What if appetites are bigger than normal? Experienced caterers add a 15-20% buffer. So you order 900 servings of main course, not 750. This is over-provisioning, but it’s cheap insurance against running out of food.

Now translate this to systems. Your business metric is “daily active users.” Your per-user consumption is “average requests per user per day” and “average data generated per user.” You multiply these together to get total daily traffic. Then you add a buffer for peak hour traffic (systems don’t spread load evenly across a day—they spike). You add another buffer for failure (if you have two database replicas, one goes down, the other handles all traffic). By the time you’re done, you’ve built in headroom.

The wedding analogy also teaches you about different types of capacity. You need food in the right quantities (storage), delivered at the right time during the reception (throughput/bandwidth), and prepared quickly enough so guests don’t wait (latency). Systems have these same dimensions, and you must estimate all three.

The Numbers Every Engineer Should Know

Here’s a table of latency numbers—called the “Jeff Dean numbers” after the legendary Google engineer who popularized them. The absolute values drift as hardware improves, but the relative magnitudes have held steady for the past 15 years:

Operation                               Latency    Notes
L1 cache reference                      4 ns       Fastest, on-CPU memory
L2 cache reference                      10 ns      Still on-CPU, slightly slower
L3 cache reference                      40 ns      Shared across cores
RAM reference                           100 ns     Main memory, about 2.5x slower than L3
SSD random read                         150 μs     Solid-state drive, about 1,500x slower than RAM
Disk random read                        10 ms      Spinning disk, about 65x slower than SSD
Network round trip (same datacenter)    0.5 ms     Fast, within a few miles
Network round trip (intercontinental)   150 ms     Slow, physical distance matters
Database query (cached)                 1 ms       Typical for an index lookup
Database query (disk)                   10 ms      Requires disk I/O

Notice the pattern: each step down the hierarchy is slower by anywhere from a few times to a few orders of magnitude, with the steepest cliffs between RAM and storage and between the datacenter and the wide-area network. This fundamental fact shapes every architectural decision.
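To see how these reference numbers drive decisions, here is a small sketch (the constants and names are mine; the values come from the table above) that checks whether a target of 100,000 operations per second can afford to touch disk on every operation:

```python
# Approximate reference latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "ram": 100,
    "ssd_random_read": 150_000,
    "disk_random_read": 10_000_000,
}

def max_serial_ops_per_second(latency_ns: int) -> float:
    """Upper bound on operations per second if each operation waits on one such access."""
    return 1e9 / latency_ns

# A spinning disk tops out around 100 serial random reads per second, an SSD around 6,700,
# and RAM around 10 million; a 100,000 ops/s target cannot hit disk on every operation.
for name, ns in LATENCY_NS.items():
    print(f"{name}: {max_serial_ops_per_second(ns):,.0f} ops/s")
```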

Now let’s walk through the estimation methodology. You start with a business requirement and break it into smaller, estimable pieces:

Step 1: Clarify the scale. How many users? How long is the time window (day, month, year)? What’s the peak vs. average traffic pattern? Are some operations more frequent than others?

Step 2: Estimate daily operations. From your user count and behavior, calculate daily requests, transactions, or events. For example: 10 million daily active users × 5 requests per user per day = 50 million requests per day.

Step 3: Convert to peak QPS. Not all requests arrive evenly throughout the day. Peak hour might be 3x the average. And within an hour, there are traffic spikes. As a rough rule, assume peak QPS is 3-5x the average QPS. So 50 million requests per day ÷ 86,400 seconds ≈ 580 requests per second average. Peak might be 1,740 QPS. Add 20% headroom for unexpected spikes: 2,088 QPS.
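A quick sketch of that arithmetic, using the same 3x peak factor and 20% headroom as above (both are assumptions you would state explicitly):

```python
SECONDS_PER_DAY = 86_400

def peak_qps(daily_requests: float, peak_factor: float = 3.0, headroom: float = 0.20) -> float:
    """Average QPS scaled by a peak-hour factor plus headroom for spikes."""
    average = daily_requests / SECONDS_PER_DAY
    return average * peak_factor * (1 + headroom)

print(round(peak_qps(50_000_000)))  # ~2,083 QPS, matching the ~2,088 above within rounding
```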

Step 4: Estimate storage. How much data does each operation create? An average tweet might be 300 bytes (text, metadata, timestamps). If you have 100 million tweets per day, that’s 30 GB per day. Over one year, that’s roughly 11 TB raw. Add 50% for replication and backups, and you’re at roughly 16.5 TB for year one.
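The same step as a sketch; the 1.5x factor for replication and backups is the assumption stated above:

```python
def yearly_storage_tb(items_per_day: float, bytes_per_item: float, overhead: float = 1.5) -> float:
    """Year-one storage in terabytes, inflated by a replication-and-backup overhead factor."""
    return items_per_day * bytes_per_item * 365 * overhead / 1e12

print(round(yearly_storage_tb(100_000_000, 300), 1))  # ~16.4 TB
```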

Step 5: Estimate bandwidth. How much data flows in and out? If your API serves 1 MB responses at peak (2,088 QPS), you need 2,088 MB/s of outbound bandwidth, or roughly 17 Gbps. You rarely want to run a link anywhere near full utilization, so you’d contract for something in the 25-35 Gbps range to be safe.
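In code, treating 1 MB as 10^6 bytes for napkin purposes:

```python
def peak_bandwidth_gbps(peak_qps: float, response_bytes: float) -> float:
    """Outbound bandwidth at peak, converted from bytes per second to gigabits per second."""
    return peak_qps * response_bytes * 8 / 1e9

print(round(peak_bandwidth_gbps(2_088, 1_000_000), 1))  # ~16.7 Gbps
```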

Step 6: Estimate cache size. Not everything needs to live in the cache. The Pareto principle (80/20 rule) often applies: 80% of traffic hits 20% of data. If your total data is 16.5 TB and the working set is 20%, that’s about 3.3 TB that needs to be hot in cache. Sustaining a high hit rate (95%+) usually means holding the full working set plus headroom for churn, say 1.5-3x, so budget roughly 5-10 TB of RAM for caching.
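A sketch of the cache budget; the 2x headroom factor is an assumption taken from the middle of the range above:

```python
def cache_size_tb(total_data_tb: float, working_set_fraction: float = 0.2, headroom: float = 2.0) -> float:
    """Cache budget: the hot working set inflated by a headroom factor for churn and eviction."""
    return total_data_tb * working_set_fraction * headroom

print(round(cache_size_tb(16.5), 1))  # ~6.6 TB, within the 5-10 TB budget above
```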

Here’s a diagram showing how these pieces fit together:

graph LR
    A["Daily Active Users"] -->|per-user behavior| B["Daily Operations"]
    B -->|peak factor| C["Peak QPS"]
    A -->|data per user| D["Storage Needed"]
    C -->|response size| E["Bandwidth"]
    D -->|working set %| F["Cache Size"]
    C --> G["Compute Resources"]
    E --> H["Network Capacity"]
    F --> I["RAM Budget"]
    G --> J["Architecture Decision"]
    H --> J
    I --> J

A Real Estimation: Twitter-Like Social Network

Let’s estimate capacity for a Twitter-like system. We’ll work through a concrete example.

Assumptions:

  • 100 million daily active users
  • Average user tweets 2 times per day
  • Average user reads the feed 20 times per day
  • Average tweet is 300 bytes (text, links, metadata)
  • Average feed read retrieves 50 tweets
  • System must support 99.9% uptime (SLO from previous chapter)

Daily operations:

  • Tweets written: 100M users × 2 tweets/day = 200M tweets/day
  • Feed reads: 100M users × 20 reads/day = 2B feed reads/day
  • Retweets, likes, replies add another 30% traffic: 2.6B total read operations

Peak QPS:

  • Total reads: 2.6B ÷ 86,400 seconds = 30,093 RPS average
  • Peak hour is 3x average: 90,279 RPS at peak
  • Add 25% buffer: 112,849 RPS peak

Write operations:

  • Total writes: 200M tweets ÷ 86,400 = 2,315 TPS average
  • Peak: 6,945 TPS
  • Add buffer: 8,700 TPS peak

Storage:

  • Daily tweet generation: 200M tweets × 300 bytes = 60 GB/day
  • One year: 21.9 TB (raw)
  • Add replication (2x): 43.8 TB
  • Add backups and growth buffer: 60 TB year one

Bandwidth:

  • Peak read traffic: Each feed read retrieves 50 tweets × 300 bytes = 15 KB per read
  • Peak bandwidth: 112,849 RPS × 15 KB ≈ 1.7 GB/s, or about 13.6 Gbps outbound
  • This is why Twitter caches aggressively

Cache size:

  • Working set estimate: 20% of all tweets are “hot” (recently posted, popular accounts)
  • Hot tweets: 43.8 TB × 0.2 = 8.76 TB
  • Cache hit ratio target: 95% (so cache must hold most of the working set)
  • Required cache: 12-15 TB of Redis/Memcached

Compute resources:

  • 112,849 RPS divided across the fleet; assume each server handles 5,000 RPS
  • Servers needed: 112,849 ÷ 5,000 = 23 servers minimum
  • Add redundancy (2x for failover): 46 servers for read traffic
  • Add 20% headroom for maintenance: 56 compute servers

Now you have concrete numbers. You can have a real conversation with operations: “We need 56 commodity servers, 60 TB of storage, and 15 TB of cache.” This beats “we need enough to handle our scale.”
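Pulling the whole example together, here is a hedged sketch that re-derives the headline numbers from the assumptions above; the variable names and exact buffer factors are mine, chosen to mirror the arithmetic in this section:

```python
SECONDS_PER_DAY = 86_400

users = 100_000_000
reads_per_user, writes_per_user = 20, 2
tweet_bytes, tweets_per_read = 300, 50

daily_reads = users * reads_per_user * 1.3                  # +30% for retweets, likes, replies
daily_writes = users * writes_per_user

peak_read_qps = daily_reads / SECONDS_PER_DAY * 3 * 1.25    # 3x peak hour, +25% buffer
peak_write_tps = daily_writes / SECONDS_PER_DAY * 3 * 1.25

yearly_storage_tb = daily_writes * tweet_bytes * 365 * 2 / 1e12        # 2x for replication
peak_bandwidth_gbps = peak_read_qps * tweets_per_read * tweet_bytes * 8 / 1e9
read_servers = peak_read_qps / 5_000 * 2 * 1.2              # 2x failover, +20% headroom

print(f"peak reads:  {peak_read_qps:,.0f} RPS")       # ~112,847
print(f"peak writes: {peak_write_tps:,.0f} TPS")      # ~8,681
print(f"storage:     {yearly_storage_tb:.1f} TB")     # ~43.8 TB before backups and growth
print(f"bandwidth:   {peak_bandwidth_gbps:.1f} Gbps") # ~13.5 Gbps
print(f"servers:     {read_servers:.0f}")             # ~54, close to the 56 above
```

The small differences from the bullet-point figures (54 versus 56 servers, for example) come from rounding up at each intermediate step; either answer is fine at napkin precision.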

When Numbers Tell Stories

The estimation process also reveals architectural insights. Notice how our Twitter example’s read-side bandwidth (13.6 Gbps at peak) dwarfs the write throughput (8,700 TPS). This tells us: reads dominate, and caching is non-negotiable. An architect who ignored this insight would over-invest in write throughput and under-invest in cache, leading to poor performance.

Similarly, if we estimate a different system, say a payment processor handling 10,000 transactions per second with 1 KB transactions, the bottleneck is different. Storage is manageable (10,000 TPS × 1 KB = 10 MB/s, or about 864 GB/day). Bandwidth is fine. But latency becomes critical: a payment transaction must complete within 500 ms for a good user experience, yet it might involve multiple database round trips. The architectural lesson: database latency and consistency matter more than throughput.
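A quick sketch of that latency budget, using the datacenter round-trip and disk-backed query numbers from the reference table; the hop counts here are illustrative assumptions, not measurements:

```python
# Rough per-hop costs in milliseconds, taken from the reference table earlier.
DB_QUERY_DISK_MS = 10
DC_ROUND_TRIP_MS = 0.5

def transaction_latency_ms(db_queries: int, service_hops: int) -> float:
    """Serial latency of one payment if every query hits disk and every hop stays in the datacenter."""
    return db_queries * DB_QUERY_DISK_MS + service_hops * DC_ROUND_TRIP_MS

# Ten serial disk-backed queries plus twenty service hops still fit inside a 500 ms budget,
# but add one intercontinental round trip (150 ms) per query and the budget evaporates.
print(transaction_latency_ms(db_queries=10, service_hops=20))  # 110.0 ms
```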

These insights emerge from estimation work. They guide your high-level design.

When Estimates Matter Most (And When They Don’t)

Here’s the honest truth: your estimate accuracy should match the cost and risk of being wrong. If you’re building an internal tool for your team of 20 people, a rough estimate is fine. If you’re architecting a service that’ll cost $5 million annually in infrastructure and downtime, estimate carefully.

Over-provisioning is safe but expensive. Under-provisioning is dangerous but cheaper initially. The break-even point depends on your tolerance for risk. Startups often under-provision and upgrade as they grow (start lean, scale fast). Established companies often over-provision to avoid customer impact.

Pro tip: estimate in ranges, not points. Don’t say “we need 50 servers.” Say “we need 40-60 servers, with our best guess at 50.” This acknowledges uncertainty and gives stakeholders a range to plan around.

Key Takeaways

  • Capacity estimation predicts resource consumption by combining business metrics with per-unit resource costs
  • Order-of-magnitude accuracy is sufficient—you’re aiming within a factor of 10, not exact
  • Know the latency hierarchy: L1 cache (4 ns) → RAM (100 ns) → SSD (150 μs) → disk (10 ms). Each step down costs one or more orders of magnitude
  • Convert business metrics step by step: users → daily operations → peak QPS → storage → bandwidth → cache requirements
  • State assumptions explicitly so your estimates are auditable and reviewable
  • Use estimates to drive architecture decisions, not just to satisfy curiosity

Estimation Challenges

Try these scenarios to test your skills:

Scenario 1: Video Streaming Platform

  • 50 million daily active users
  • Average user watches 2 hours of video per day
  • Average video bitrate: 5 Mbps
  • Estimate: storage needed for one year of unique video content, peak bandwidth required, and cache size needed for popular videos (assume 10% of videos are “hot”)

Scenario 2: E-commerce Search Index

  • 1 billion products in catalog
  • Average product record: 10 KB
  • Peak search load: 100,000 searches per second
  • Estimate: total storage for the index, peak bandwidth for search responses (assume 1 KB per response), and number of search servers needed (assume each handles 5,000 QPS)

Scenario 3: Real-Time Analytics

  • 10 billion events per day from IoT devices
  • Average event: 500 bytes
  • Must keep data online and queryable for 90 days
  • Estimate: peak ingestion rate (events per second), total storage, and cache size for recent 7-day data (assume 95% of queries hit last 7 days)

What’s Next: From Numbers to Trade-offs

Capacity estimation gives you one critical piece of the puzzle: the scale you’re building for. But now comes the harder question: how do you actually build for that scale? What choices get you there efficiently? The next chapter explores Constraints and Trade-offs—the reality that you can’t optimize for everything simultaneously. You can’t have infinite speed, infinite storage, and infinite consistency. Armed with capacity numbers, you’ll learn how to choose which dimensions matter most for your specific system.