System Design Fundamentals

Understanding Scale

When Your Success Becomes Your Problem

Imagine you’ve just launched a photo-sharing app. After months of work, you convince your friend group to try it. They love it. They share it with their friends. Within 48 hours, you’re trending on social media. Your servers are drowning. Users see timeouts. Photos take minutes to load. Your database crashes. You’re frantically looking at your infrastructure, wondering what went wrong. Nothing went wrong with the code—you designed it well. What went wrong is that you built for 10,000 users but got 10 million.

This is the story of understanding scale, and it’s arguably the most important skill in system design. When we talk about scale, we’re not just talking about “making things faster.” We’re talking about something deeper: how your system fundamentally changes as it grows. A system that works perfectly at 100 concurrent users might completely break at 100,000 concurrent users—not because of a bug, but because the underlying assumptions of your architecture no longer hold.

In this chapter, we’ll explore three interconnected dimensions of scale: users (how many people are using your system), data (how much information you’re storing and processing), and traffic (how many requests your system receives). By the end, you’ll understand how to think about scale, estimate it for real systems, and make architectural decisions that let your system grow without collapsing.

The Three Dimensions of Scale: Users, Data, and Traffic

Let’s start with the fundamental question: what do we actually mean by “scale”?

User scale is about how many people are using your system. But this is trickier than it sounds. If you have 1 million registered users, that doesn’t tell you much about the load on your system. What matters is how many are active right now. We typically think about users in layers:

  • Daily Active Users (DAU): How many unique people use your system in a 24-hour period.
  • Monthly Active Users (MAU): How many unique people use it in a month. This is usually 2-4x your DAU.
  • Concurrent users: How many people are using your system at the exact same moment. This is the number that really determines your infrastructure needs.

If you have 1 million DAU but your peak concurrent users is only 1,000, your infrastructure needs are very different from what they would be if peak concurrent were 100,000.

Data scale is about the volume of information your system needs to store, retrieve, and process. This includes database storage, caches, files, logs—everything. As your system grows, data scale grows in at least two ways: the number of entities (you have more users, more posts, more photos) and the size of each entity (videos get longer, posts get longer). Beyond storage, data scale affects query performance. Searching through 1 million records is different from searching through 1 billion records.

Traffic scale is about the number of requests hitting your system per second—what we call Queries Per Second (QPS) or sometimes Requests Per Second (RPS). But like user scale, it’s more nuanced. Not all requests are equal. A request that reads cached data is cheaper than a request that queries a database. A request that reads is different from a request that writes. We care about the read-to-write ratio—how many reads happen compared to writes. We also care about peak traffic versus average traffic. If your average is 1,000 QPS but peak is 10,000 QPS, you need infrastructure for the peak.

Here’s the key insight: orders of magnitude matter. The difference between 1 million and 10 million isn’t just “10x more of the same thing.” At 10 million, different parts of your system start to matter. Bottlenecks you never noticed become critical. Problems you could previously sidestep suddenly must be solved.

A Highway System, Not a Parking Lot

Think about how highways work. A two-lane highway can handle a certain amount of traffic at steady state. But on Friday evenings, everyone leaves work at the same time. That two-lane highway becomes gridlocked. Add more lanes, and traffic flows. But at some point, merging becomes the bottleneck, not the lanes themselves. Build another highway parallel to the first, and now traffic can flow, but you need coordinated traffic lights at intersections.

Your system is like this highway. Your servers are lanes. Your database is a highway junction. Your cache is an express lane that bypasses traffic. When you have few users, one lane (one server) is fine. When you have thousands of users, a few lanes handle it. But at a million users, you’re not just adding more lanes—you’re rethinking the whole transportation network. You add bypass routes (caching). You split the highway into regions (sharding). You hire traffic controllers (load balancers and orchestrators).

The beauty of this analogy is that it highlights something crucial: at each order of magnitude, the nature of the problem changes. A traffic light strategy that works for 100 cars per hour is wrong for 10,000 cars per hour. Similarly, a database strategy that works at 10,000 QPS might be completely wrong at 1 million QPS.

Measuring Scale: From Theory to Real Numbers

Let’s get concrete. Here’s how you measure each dimension:

Measuring user scale:

To estimate concurrent users, you need to know your DAU and then estimate what fraction is active at peak time. If you have 1 million DAU and your service is a US-based social media app, peak time is probably evening (say, 8 PM ET). During peak time, maybe 10-15% of your DAU is online. That’s 100,000-150,000 concurrent users.
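To make that concrete, here is a minimal back-of-envelope sketch in Python; the 10-15% peak fraction is just the assumption from the paragraph above, not a universal constant:

```python
# Back-of-envelope: peak concurrent users from DAU.
# peak_fraction is an assumption (US social app, evening peak), not a law.

def peak_concurrent(dau: int, peak_fraction: float) -> int:
    """Estimate how many users are online at the busiest moment."""
    return int(dau * peak_fraction)

dau = 1_000_000
low, high = peak_concurrent(dau, 0.10), peak_concurrent(dau, 0.15)
print(f"Peak concurrent users: {low:,} to {high:,}")   # 100,000 to 150,000
```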

Did you know? At peak moments Twitter has handled on the order of 350,000 tweets per minute, with hundreds of millions of active users. Numbers like that are why Twitter’s infrastructure is incredibly complex.

Measuring data scale:

Start with the entities you store. How much data per entity? A user profile might be 1-5 KB. A tweet might be 1-2 KB of text plus metadata. A high-resolution photo might be 5 MB. Now multiply by the number of entities.

If you have 1 billion users, each with a 2 KB profile, that’s 2 TB just for profiles. But you also store posts, comments, likes, follow relationships, and more. Real data scale balloons quickly.
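As a quick sketch of that multiplication (the entity sizes are the ballpark figures quoted above):

```python
# Storage estimate: entity size x entity count, one term per entity type.

USERS = 1_000_000_000            # 1 billion users
PROFILE_BYTES = 2 * 1024         # ~2 KB per profile

profile_storage_tb = USERS * PROFILE_BYTES / 1e12
print(f"Profiles alone: ~{profile_storage_tb:.0f} TB")   # ~2 TB

# Posts, comments, likes, and follow edges each add their own
# count x size term, which is why real data scale balloons quickly.
```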

Here’s a diagram showing typical scale tiers and how the architecture evolves at each:

graph TD
    A["Scale Tier 1: Thousands of Users<br/>- Single Database<br/>- Simple Caching<br/>- One App Server or Few"] -->|10x growth| B["Scale Tier 2: Millions of Users<br/>- Replicated Database<br/>- Distributed Cache<br/>- Load Balancer + Multiple App Servers"]
    B -->|10x growth| C["Scale Tier 3: Tens of Millions<br/>- Database Sharding<br/>- CDN for Static Content<br/>- Async Processing Queue"]
    C -->|10x growth| D["Scale Tier 4: Hundreds of Millions to Billions<br/>- Complex Sharding Strategy<br/>- Multiple Data Centers<br/>- Advanced Caching, Search Systems<br/>- Real-time Stream Processing"]

Measuring traffic scale:

This is the easiest dimension to define but the hardest to predict. QPS simply means queries per second. To estimate peak QPS:

  • Know your peak concurrent users.
  • Estimate how many requests each user makes per second. A person scrolling a feed might make 1 request per 5 seconds (0.2 QPS). A person actively composing and posting might make 1 request per second.
  • Multiply: 100,000 concurrent users × 0.2 average requests per user = 20,000 QPS.

But that’s human traffic. You also have background traffic: services talking to other services, cron jobs, analytics pipelines. In practice, non-human traffic can exceed human traffic.
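Here is a small sketch of that estimate; the background-traffic multiplier is purely an illustrative assumption, since the real ratio depends on how many internal services sit behind each user request:

```python
# Peak QPS: concurrent users x requests per user per second,
# plus a term for non-human traffic (service-to-service calls, crons, analytics).

concurrent_users = 100_000
requests_per_user_per_sec = 0.2    # feed scrolling, from the example above
background_multiplier = 1.0        # assumption: internal traffic roughly equals human traffic

human_qps = concurrent_users * requests_per_user_per_sec     # 20,000 QPS
background_qps = human_qps * background_multiplier
total_qps = human_qps + background_qps
print(f"Human: {human_qps:,.0f} QPS, with background: {total_qps:,.0f} QPS")
```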

WhatsApp example: WhatsApp handles roughly 60 million messages per minute at peak. That’s 1 million messages per second. Each message is a request (or becomes multiple requests internally). That’s 1M QPS just for messaging, plus read receipts, typing indicators, and more. Their infrastructure is globally distributed with multiple data centers.

Here’s a comparison table:

| Dimension | Thousands | Millions | Tens of Millions | Billions |
| --- | --- | --- | --- | --- |
| DAU | 10K-100K | 1M-10M | 10M-100M | 100M+ |
| Concurrent Peak | 100-1K | 10K-100K | 100K-1M | 1M+ |
| Data Stored | Gigabytes | Terabytes | Terabytes-Petabytes | Petabytes+ |
| Peak QPS | Hundreds | Thousands | Tens of Thousands | Millions |
| Infrastructure | Single Server | Load Balancer + 2-5 Servers | Multiple Servers + Database Replication | Multi-Region, Sharded, Specialized Services |

Estimating Scale for a Real System: Photo-Sharing App

Let’s walk through a realistic example. You’re designing a photo-sharing app like Instagram. Here’s how you’d estimate scale for year one:

User scale:

  • Target: 1 million DAU by end of year one.
  • Estimate peak concurrent: 1 million DAU × 15% (peak fraction) = 150,000 concurrent users.

Data scale:

  • Each user has a profile (about 1 KB), an average of 50 photos (5 MB each), and associated metadata. That’s roughly 250 MB per user.
  • 1 million users × 250 MB = 250 TB.

Pro tip: Always estimate conservatively and add a 2x buffer for overhead, redundancy, and unexpected data. So plan for 500 TB.
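Putting those numbers, including the 2x buffer, into a short sketch:

```python
# Year-one storage estimate for the photo-sharing example.

USERS = 1_000_000
PHOTOS_PER_USER = 50
PHOTO_MB = 5
PROFILE_MB = 0.001      # ~1 KB, negligible next to the photos
BUFFER = 2              # 2x for redundancy, overhead, and surprises

per_user_mb = PROFILE_MB + PHOTOS_PER_USER * PHOTO_MB    # ~250 MB per user
raw_tb = USERS * per_user_mb / 1_000_000                 # ~250 TB
planned_tb = raw_tb * BUFFER                             # ~500 TB
print(f"Raw: ~{raw_tb:.0f} TB, plan for: ~{planned_tb:.0f} TB")
```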

Traffic scale:

  • Peak concurrent: 150,000 users.
  • Read-heavy: Assume 80% of traffic is reads (viewing photos/feed) and 20% is writes (uploading photos, liking, commenting).
  • Average requests per user during peak: 0.5 per second (viewing feed).
  • Peak QPS = 150,000 × 0.5 = 75,000 QPS.
  • With 80% reads, that’s 60,000 read QPS and 15,000 write QPS.
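The same arithmetic as a sketch, so you can see where each number comes from:

```python
# Peak traffic for the photo-sharing example, split into reads and writes.

concurrent = 150_000
requests_per_user_per_sec = 0.5
read_fraction = 0.8

peak_qps = concurrent * requests_per_user_per_sec   # 75,000 QPS
read_qps = peak_qps * read_fraction                  # 60,000 read QPS
write_qps = peak_qps - read_qps                      # 15,000 write QPS
print(f"Peak: {peak_qps:,.0f} QPS ({read_qps:,.0f} reads, {write_qps:,.0f} writes)")
```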

Now here’s what this tells us about infrastructure:

| Aspect | Decision |
| --- | --- |
| Database | Can’t fit on one server. Need replication (master for writes, replicas for reads). |
| Cache | Definitely need Redis or Memcached for the feed and hot user profiles. |
| Blob Storage | Photos go to S3 or similar object storage, not the database. |
| API Servers | One server can handle ~1,000 QPS. Need 75-100 app servers. |
| Load Balancer | Essential. Distributes requests across app servers. |
| CDN | Use CloudFront or similar to cache photos at edge locations. |

This is a complete architecture redesign compared to a system designed for 10,000 users. And you’re still not at “billions”—the architecture would be even more complex then.
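For instance, a rough server-count check under the table’s ~1,000-QPS-per-server figure might look like this; the 30% headroom factor is an added assumption, and real per-server capacity has to come from load testing:

```python
import math

# Rough app-server count from peak QPS.
# qps_per_server and headroom are assumptions to be validated by load tests.

peak_qps = 75_000
qps_per_server = 1_000
headroom = 1.3              # ~30% spare capacity for spikes and failed nodes

servers = math.ceil(peak_qps * headroom / qps_per_server)
print(f"App servers needed: ~{servers}")    # ~98, within the 75-100 range above
```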

The Cost and Risk of Scale

Here’s something they don’t always tell you: preparing for scale is expensive. Setting up distributed databases, caching layers, load balancers, monitoring—these cost money and engineering time.

The trap cuts both ways. Over-provision and you waste money. You build complex infrastructure for 10 million users, but you only get 1 million. You’ve spent resources on sharding when a single database would have worked fine.

Under-provision and you lose users. Your app works fine until the day it doesn’t. You go viral, traffic spikes, and your system collapses. Users try to access your app, get errors, and they switch to a competitor.

The best approach is designed flexibility. Build for 10x your current scale, not 100x. Make major architectural changes (like sharding) only when you actually need them. But bake in the possibility of those changes. Use abstractions that make it easy to add a cache layer or replicate your database without rewriting code.

Also consider: traffic is rarely smooth. You have peaks (evening hours, weekends) and valleys (middle of the night). You have unexpected spikes (someone posts a viral video). Your infrastructure needs to handle peak traffic, not average traffic. But you can use auto-scaling to add servers during peaks and remove them during valleys, reducing costs.
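As a toy illustration of why this matters for cost, compare provisioning for peak around the clock with scaling to the hourly demand curve; every number here is a made-up assumption, not a benchmark:

```python
# Toy comparison: fixed provisioning for peak vs. auto-scaling to demand.
# The traffic curve, per-server capacity, and price are illustrative assumptions.

hourly_qps = [2_000] * 8 + [10_000] * 4 + [20_000] * 4 + [60_000] * 6 + [10_000] * 2
qps_per_server = 1_000
cost_per_server_hour = 0.50

fixed_servers = max(hourly_qps) // qps_per_server
fixed_cost = fixed_servers * cost_per_server_hour * len(hourly_qps)

autoscaled_cost = sum(
    (qps // qps_per_server + 1) * cost_per_server_hour for qps in hourly_qps
)
print(f"Fixed for peak: ${fixed_cost:.0f}/day vs auto-scaled: ${autoscaled_cost:.0f}/day")
```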

Key Takeaways

  • Scale has three dimensions: users (how many people), data (how much information), and traffic (how many requests). Each dimension grows independently, and each affects your architecture differently.
  • Orders of magnitude matter. The architectural decisions for 1 million users are fundamentally different from 10 million or 1 billion. Small changes in the constraints can force large changes in the system.
  • Concurrent users, not total users, determine infrastructure needs. A system with 1 million DAU might see only 100,000-150,000 concurrent users at peak. Know the difference.
  • Know your read-to-write ratio. A system optimized for reads (like a social media feed) has a very different architecture from one optimized for writes (like a real-time analytics system).
  • Over-provisioning is expensive, but under-provisioning is worse. Plan for 10x current traffic. Add complexity only when needed. Design for flexibility.
  • Estimate conservatively. Add 2x buffers for growth, redundancy, and unexpected issues. It’s better to have capacity than to be surprised.

Practice Scenarios

  1. Scenario 1: Video Streaming Platform. You’re building a YouTube-like video platform. You expect 100 million DAU, with peak concurrent of 5 million. Average video is 500 MB. Estimate the data scale you need to support, and describe how your architecture would need to change at this scale.

  2. Scenario 2: Real-Time Chat App. You’re building a WhatsApp competitor. Target is 50 million DAU. Users send an average of 50 messages per day, each 1-2 KB. Peak concurrent is 3 million. Estimate peak QPS (assuming each message generates at least one request) and describe the infrastructure needed.

  3. Scenario 3: E-Commerce Platform. You have 10 million DAU, with 1 million peak concurrent users. Your database query time is 100ms on average. If each user makes 5 requests to your database per session, estimate how many database servers you need to handle peak traffic.

Looking Ahead: From Scale to Guarantees

You now understand the landscape of scale. But here’s the next crucial question: when your system does handle that scale, how do you ensure it’s reliable? That your users get fast responses not just some of the time, but consistently? That you know when things are going wrong before users complain?

That’s where our next section comes in: SLOs and SLAs. These are the contracts you make with your users about what they can expect from your system. Understanding scale is the foundation. But making promises about scale requires understanding reliability metrics. Let’s dive in.