System Design Process & Methodology
Why You Need a Process
Imagine you’re tasked with building a simple photo-sharing application. Your initial reaction might be to jump straight into coding—spin up a server, connect a database, and start writing features. But what happens when you go from 100 users to 100,000 users? Suddenly, your application crashes during peak hours, your database can’t handle the load, and your team is frantically patching issues instead of building features. This is where a systematic, methodical approach to system design becomes invaluable.
System design is the bridge between understanding individual concepts (like we covered in the previous sections) and building systems that actually work at scale. It’s not just about knowing that databases exist or how caching works—it’s about knowing when to use them, how much to use them, and how they fit together into a cohesive whole. This chapter introduces you to the structured process and methodology that professional engineers follow when designing systems, whether they’re building a startup’s first product or scaling Netflix to millions of concurrent users.
By the end of this section, you’ll have a repeatable five-step methodology in your toolkit. You’ll understand how to ask the right questions before writing a single line of code, how to estimate whether your solution will actually work, and how to think through the architectural implications of your decisions.
Requirements, Estimation, and Design
The Systematic Approach
System design isn’t a creative free-for-all. It’s a structured process where each step builds logically on the previous one. The reason we need this structure is that without it, we make assumptions that come back to haunt us. We optimize for the wrong metric. We design for the wrong scale. We choose components that don’t work well together.
Think of it this way: a functional requirement is something the system must do. If you’re building a URL shortener, you need the ability to create short links and to redirect anyone who visits them. These are functional requirements. Non-functional requirements, by contrast, are constraints on how the system does those things. How fast should it respond? How many users should it support? How reliable should it be? What happens if a database server dies?
Let’s look at some examples. For a URL shortener:
- Functional requirements: Users can create short links, users can visit short links and get redirected, users can view their link history
- Non-functional requirements: Average response time under 100ms, support 100,000 URLs created per day, 99.9% uptime, links should be unique and hard to guess
The non-functional requirements are what actually determine your architecture. A simple single-server solution might handle your functional requirements just fine, but if you need 99.9% uptime, you suddenly need redundancy, failover systems, and monitoring.
Back-of-Envelope Estimation
One of the most powerful (and underrated) skills in system design is the ability to quickly estimate whether an idea is feasible. This isn’t about being exact—it’s about being within the right order of magnitude. Can this fit in memory? Will the network be a bottleneck? How long will this query take?
Let’s say we’re designing a URL shortener and expecting 100,000 requests per second at peak load. A typical database can handle 1,000 to 10,000 queries per second. That’s a red flag—we clearly can’t use a single database. Maybe we need caching, sharding, or both. We don’t need to know exact numbers yet; we just need to know our simple approach won’t work.
When you estimate, think in powers of 10. Memory is measured in gigabytes (10^9 bytes), network bandwidth in gigabits per second. A rule of thumb: a typical server can handle 10,000 to 100,000 requests per second depending on the complexity of each request. Disk I/O is much slower than memory access, which is much slower than CPU operations.
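To make the arithmetic concrete, here is a minimal sketch of this kind of feasibility check. The capacity constants are the rough rules of thumb from the paragraphs above, not benchmarks:

```python
# Back-of-envelope feasibility check. The capacity figures are rough
# rules of thumb, not measurements; real numbers vary with hardware
# and workload.
SECONDS_PER_DAY = 86_400
DB_QPS = 5_000       # a typical database: ~1,000-10,000 queries/sec
SERVER_RPS = 50_000  # a typical app server: ~10,000-100,000 requests/sec

def load_estimate(requests_per_day: int, peak_factor: int = 10):
    """Return (average, estimated peak) requests per second."""
    avg = requests_per_day / SECONDS_PER_DAY
    return avg, avg * peak_factor

avg, peak = load_estimate(10_000_000)               # 10 million requests/day
print(f"avg ~{avg:.0f}/s, peak ~{peak:.0f}/s")      # avg ~116/s, peak ~1157/s
print("single database plausible?", peak < DB_QPS)  # True at this scale
```

The point isn’t the script itself; it’s that a five-line calculation tells you whether an architecture is even in the right order of magnitude before you invest in it.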
High-Level vs Deep-Dive Design
The design process isn’t monolithic. You start with a bird’s-eye view and progressively zoom in. The high-level design (sometimes called the architecture) shows the major components and how they talk to each other: databases, caches, load balancers, message queues. It’s intentionally simplified; we’re not worrying about exact database indexes or caching strategies yet.
The deep dive happens when you pick a component that you identified as important or risky and really think through its details. If you decided you need a distributed cache, you need to think about consistency, eviction policies, and key design. If you decided you need database sharding, you need to think about shard keys and how to handle queries that span shards.
Bottleneck Analysis and Iterative Refinement
A bottleneck is a component that limits your system’s performance. Often, the bottleneck isn’t where you expect it to be. A well-designed system identifies bottlenecks early and eliminates them systematically. You might think the database is your bottleneck, but after adding a cache, you discover it’s actually the load balancer or the network itself.
This is why system design is iterative. You propose a design, you analyze it for bottlenecks, you improve the most critical one, and you repeat. You don’t just hand off a design document; you’re constantly asking “what breaks first?” and “how do we fix that?”
The Architect’s Blueprint
Consider how a city architect designs a major building. They don’t just decide to build and hope for the best. First, they understand the requirements: How many people will use this building? What activities happen here? Do we need to support wheelchairs, fire escapes, and delivery trucks? These are like functional requirements.
Next, they estimate constraints: What’s the maximum occupancy? How much water and electricity will we need? Can our neighborhood’s infrastructure support it? They sketch a high-level design: “We’ll have an entrance here, offices here, a cafeteria there.” Then they zoom in on critical components. For the foundation, they actually do engineering—what kind of soil are we building on? What weight will it bear? For less critical areas, they might use standard designs.
Finally, they run into reality. “Our traffic estimate was wrong—we need more entrances.” “The water pressure isn’t sufficient for the twentieth floor.” So they iterate and refine. This is exactly how system design works. We start broad, zoom in on what matters, and adapt as we learn more.
The Five-Step Methodology
Professional system designers follow a consistent methodology. While different teams might label the steps differently, the underlying process remains similar. Here’s the framework we’ll use throughout this book:
Step 1: Understand the Requirements
Before you design anything, you must understand what you’re building. This seems obvious, but teams often skip this step or do it superficially. You need both types of requirements.
Functional requirements (what the system does):
- User actions and features
- Data that must be stored and accessed
- Expected workflows and operations
Non-functional requirements (constraints on the system):
- Scale: How many users? How many requests per second? How much data?
- Latency: What’s acceptable response time?
- Throughput: How many operations can happen simultaneously?
- Availability: What uptime is required?
- Consistency: Does data need to be exact everywhere immediately, or is eventual consistency okay?
- Durability: Can we lose data? Is data recovery required?
For our URL shortener example:
- Functional: Users create short links, visit short links, view their links
- Non-functional: 100,000 URLs created per day, 10 million requests per day (mix of creates and redirects), 99.9% uptime, most links are temporary, some might be used thousands of times
Pro tip: Always clarify scale with your interviewer or stakeholder. “What does ‘popular’ mean?” is a crucial question. Is this a startup expecting 1,000 users or a Google-scale system?
Step 2: Propose a High-Level Architecture
Now you sketch the major components and how they connect. You’re not deep in the details yet. For our URL shortener, a high-level design might look like this:
```mermaid
graph TB
    Client["Client Browser"]
    LB["Load Balancer"]
    API1["API Server 1"]
    API2["API Server 2"]
    Cache["Cache Layer"]
    DB["Primary DB"]
    DB_Replica["Replica DB"]

    Client -->|"shorturl.com/abc123"| LB
    LB -->|distribute| API1
    LB -->|distribute| API2
    API1 -->|check| Cache
    API2 -->|check| Cache
    Cache -->|miss| DB
    API1 -->|write| DB
    DB -->|replicate| DB_Replica
```
Key components and decisions:
- Load balancer (multiple API servers): Distribute traffic, handle single server failure
- Cache (Redis/Memcached): Reduce database load, improve response time for popular links
- Database: Store all links and metadata
- Replication: Keep a backup and reduce read load
This is intentionally simple. We’re saying “these are the major pieces.” We’re not yet saying “exactly how will cache eviction work?” or “what’s our sharding strategy?”
Step 3: Estimate Capacity and Bottlenecks
Now we do the math. Let’s estimate whether our design can handle the requirements:
Throughput analysis:
- 100,000 URLs created per day = 1.16 per second average
- 10 million requests per day = 116 per second average
- Peak load is typically ~10x average, so peak is roughly 1,000-1,200 requests per second; we’ll use 1,000 below
Storage estimation:
- Each URL entry: ~100 bytes (original URL, short code, metadata)
- 100,000 URLs per day × 365 days = 36.5 million URLs per year
- 36.5 million × 100 bytes = 3.65 GB per year
- This easily fits in modern databases
Database capacity:
- Typical database can handle 1,000-10,000 queries per second
- We’re at ~1,000 requests per second at peak, so a single primary database might work, and routing reads to a replica gives extra headroom
- Cache hit rate probably ~80% for popular links, so cache reduces database load by 80%, bringing it to ~200 queries per second for reads
Network bandwidth:
- Each redirect: ~50 bytes response (the redirected URL)
- 1,000 requests per second × 50 bytes = 50 KB per second = 0.4 Mbps
- Well within typical server bandwidth (Gbps)
Conclusion: This design handles our requirements comfortably. The bottleneck is probably the database writes, but even those are manageable.
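If you like, the same arithmetic can be scripted so the estimate is easy to redo when requirements change. Every constant below is an assumption carried over from the analysis above, not a measurement:

```python
# Reproduce the Step 3 estimates; each constant is an assumption
# from the requirements, not a measured value.
peak_rps = 1_000       # rounded peak from the throughput analysis
bytes_per_url = 100    # original URL + short code + metadata
bytes_per_response = 50  # rough size of a redirect response
cache_hit_rate = 0.80

storage_per_year_gb = 100_000 * 365 * bytes_per_url / 1e9
db_read_qps = peak_rps * (1 - cache_hit_rate)
bandwidth_mbps = peak_rps * bytes_per_response * 8 / 1e6

print(f"storage ~{storage_per_year_gb:.2f} GB/year")  # ~3.65 GB/year
print(f"DB reads after cache ~{db_read_qps:.0f}/s")   # ~200/s
print(f"bandwidth ~{bandwidth_mbps:.1f} Mbps")        # ~0.4 Mbps
```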
Step 4: Deep Dive into Critical Components
Once we know where the constraints are, we deep dive. In our URL shortener, let’s focus on a few areas:
Short URL Generation: How do we generate unique, short codes? A simple approach is to hash the original URL, but collisions are possible. A better approach is an auto-incrementing ID converted to base62 encoding (using 0-9, a-z, A-Z). At 36.5 million URLs per year over 10 years (365 million total), we need several hundred million possible codes; six base62 characters already give 62^6 ≈ 57 billion, far more than enough.
```python
# Simple example: convert an auto-incrementing ID to a base62 short code
def id_to_shortcode(url_id: int) -> str:
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    shortcode = ""
    while url_id > 0:
        shortcode = alphabet[url_id % 62] + shortcode  # prepend next base62 digit
        url_id //= 62
    return shortcode if shortcode else "0"  # ID 0 maps to "0"
```
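As a quick sanity check, id_to_shortcode(125) returns "21" (since 125 = 2 × 62 + 1) and id_to_shortcode(0) returns "0". In practice the redirect path looks the code up in the database or cache rather than decoding it back to an ID.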
Database Schema:
```sql
CREATE TABLE urls (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    original_url VARCHAR(2048) NOT NULL,
    short_code VARCHAR(10) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expiry_at TIMESTAMP,
    user_id INT,
    visit_count INT DEFAULT 0
);

-- The UNIQUE constraint on short_code already creates the index that
-- redirect lookups use, so we only add one for per-user history queries.
CREATE INDEX idx_user_id ON urls(user_id);
```
Cache Strategy:
- Cache key: the short code
- Cache value: the original URL
- TTL: 24 hours (links might change or be deleted)
- Eviction policy: LRU (least recently used)
Handling Updates: What if a link’s destination can be edited, or the link deleted, after creation? Then every write must invalidate the corresponding cache entry, so a stale redirect disappears immediately instead of lingering for the full TTL (see the sketch below).
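Here is a minimal cache-aside sketch of the read path and the update-time invalidation, assuming Redis via the redis-py client; db_lookup and db_update are hypothetical stand-ins for the real database layer:

```python
import redis

r = redis.Redis()          # assumes a Redis instance on localhost
CACHE_TTL = 24 * 60 * 60   # 24-hour TTL, matching the strategy above

_DB = {"abc123": "https://example.com/some/long/path"}  # stand-in for the urls table

def db_lookup(short_code):           # hypothetical database read
    return _DB.get(short_code)

def db_update(short_code, new_url):  # hypothetical database write
    _DB[short_code] = new_url

def resolve(short_code):
    """Cache-aside read: check the cache first, fall back to the database."""
    cached = r.get(short_code)
    if cached is not None:
        return cached.decode()
    url = db_lookup(short_code)
    if url is not None:
        r.setex(short_code, CACHE_TTL, url)  # populate the cache on a miss
    return url

def update_link(short_code, new_url):
    """Write the database first, then invalidate so no stale redirect lingers."""
    db_update(short_code, new_url)
    r.delete(short_code)  # the next resolve() repopulates with the fresh URL
```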
Step 5: Refine and Handle Edge Cases
Finally, we think about everything that could go wrong and what happens:
What if cache fails? We fall back to the database. Requests get slower, but the system still works.
What if the database fails? With replication, we promote a replica to primary. With careful design, this can be automatic.
What if we get traffic spikes? The load balancer distributes across multiple servers. We can scale horizontally by adding more API servers.
What about data consistency? Reads from cache might be slightly stale (up to 24 hours), which is acceptable for a URL shortener.
What about security? We need to sanitize URLs (verify they’re valid HTTP/HTTPS), rate limit to prevent abuse, and log suspicious patterns.
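For the URL-sanitization piece, here is a minimal standard-library sketch; the accepted schemes are the assumption stated above:

```python
from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    """Accept only well-formed HTTP/HTTPS URLs, per the security note above."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

assert is_valid_url("https://example.com/page")
assert not is_valid_url("javascript:alert(1)")  # scheme not allowed
assert not is_valid_url("https://")             # no host
```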
This step isn’t about solving every possible problem—it’s about identifying the most likely problems and having a plan.
Designing a URL Shortener End-to-End
Let’s walk through a complete, condensed example to see how these steps come together in practice.
Step 1: Requirements
- Functional: Create short links, visit links and redirect, track visit count, optional: custom short codes
- Non-functional: 1,000 requests per second peak, 99.95% uptime, respond in <100ms, support 1 million URLs
Step 2: High-Level Design

The architecture above (load balancer → API servers → cache → database with replica) works well. We don’t need message queues yet because we’re not doing heavy background processing. We might add asynchronous analytics collection, but that’s not critical.
Step 3: Capacity Check
| Metric | Value | Notes |
|---|---|---|
| Requests per second | 1,000 | At peak load |
| Database queries per second | ~200 | After caching |
| Storage needed | ~100 MB | For 1M URLs at ~100 bytes each |
| Bandwidth used | ~0.4 Mbps | 1,000 req/s × ~50-byte responses |
| Cache hit rate | ~80% | For popular links |
Everything is feasible. No major red flags.
Step 4: Deep Dive
We already designed the short URL generation and database schema above. Additionally:
- Sharding consideration: With 1 million URLs and every query keyed by short_code, we don’t need sharding yet. If we expected 100 million URLs, we could split the table into 62 shards keyed on a hash of the short code; hashing spreads load evenly, whereas sharding on the raw first character would cluster sequential base62 IDs onto a few hot shards (see the sketch after this list).
- Read replica strategy: Route all reads to replica, writes to primary. Cache layer reduces replica load significantly.
- Monitoring: Track response times, cache hit ratio, database query times, error rates.
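A minimal sketch of that hash-based shard selection, under the 62-shard assumption (md5 is used here only as a cheap, stable hash, not for security):

```python
import hashlib

NUM_SHARDS = 62  # assumption from the sharding consideration above

def shard_for(short_code: str) -> int:
    """Hash the short code so load spreads evenly across shards,
    even though sequential IDs yield clustered base62 prefixes."""
    digest = hashlib.md5(short_code.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

print(shard_for("abc123"))  # stable shard index in [0, 62)
```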
Step 5: Edge Cases
- Expired links: Run a background cleanup job that deletes links past their expiry_at; don’t make redirect requests wait on this housekeeping.
- Rate limiting: Allow each IP 1,000 creates per hour and 100,000 redirects per hour (a minimal limiter is sketched after this list).
- URL validation: Reject URLs that aren’t valid HTTP/HTTPS.
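Here is a minimal fixed-window limiter sketch. It keeps counters in process memory, so it’s illustrative only; a real deployment would typically share counters across servers in something like Redis:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Per-key fixed-window counter, e.g. 1,000 creates per IP per hour."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (key, window_id) -> request count

    def allow(self, key: str) -> bool:
        window_id = int(time.time()) // self.window
        bucket = (key, window_id)
        if self.counts[bucket] >= self.limit:
            return False  # over the limit for this window
        self.counts[bucket] += 1
        return True

create_limiter = FixedWindowLimiter(limit=1_000, window_seconds=3_600)
print(create_limiter.allow("203.0.113.7"))  # True until the hourly limit is hit
```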
Balancing Ambition and Pragmatism
Over-Design vs. Under-Design
The biggest mistake we see is teams either designing far ahead of their needs or being forced to do emergency redesigns when they hit limits. The reality is this: you should design for your known requirements plus a small safety margin (maybe 2-3x). If you’re expecting 1,000 requests per second, design for 2,000-3,000. But designing for 100,000 requests per second when you expect 1,000 is wasteful and adds complexity that you’ll struggle to maintain.
Here’s the principle: design for your requirements, architect for scalability. This doesn’t mean over-building everything; it means making choices that let you scale easily when you need to. Use a database that can be sharded. Make your API stateless so you can add servers. Store configuration externally so you can change behavior without redeploying.
When to Go Deeper
You should deep dive into components that are:
- Critical - failures cause system unavailability
- Constrained - likely to be bottlenecks
- Complex - lots of moving parts that interact
Don’t deep dive into components that are simple, replaceable, or have plenty of headroom. This is how you avoid over-designing.
Common Mistakes
Mistake 1: Forgetting about operations. You design a system but never think about how it’ll be monitored, alerted, and fixed when things break. Build this into your design from the start.
Mistake 2: Ignoring failure modes. Every component can fail. What’s the plan? Redundancy? Failover? Graceful degradation?
Mistake 3: Not validating assumptions. You estimated 1,000 requests per second peak, but did you actually measure this? Or did you guess? Real data is worth gold.
Mistake 4: Choosing technology before understanding the problem. “We’ll use Kafka” is not a system design. “We’ll use Kafka because we need to decouple our services and absorb traffic spikes” is.
Key Takeaways
- Understand before designing. Functional and non-functional requirements guide everything. Ask clarifying questions about scale, latency, consistency, and availability.
- Estimation is a superpower. Learn to quickly estimate whether your design will work. This saves you from pursuing impossible architectures.
- Start simple, add complexity only when needed. Begin with a high-level design that handles your requirements. Deep dive into critical components only.
- Architecture enables scaling. Choose components and patterns that let you add capacity (more servers, more replicas, more shards) without redesigning.
- Iterate and refine. Good system design isn’t a one-shot process. You propose, analyze, find bottlenecks, improve, and repeat.
- Know your trade-offs. Every architectural decision has trade-offs (consistency vs. availability, simplicity vs. features, cost vs. performance). Understand what you’re trading.
Put It Into Practice
Scenario 1: Design a Chat Application Your company is building a messaging app targeting college students. Estimate the key metrics (messages per second, concurrent users, data storage). Design a high-level architecture. Identify the main bottleneck. How would you shard users or messages? What consistency guarantees do you need, and how does that affect your design?
Scenario 2: Design a Metrics and Monitoring System You need to store millions of data points per second (server metrics, application logs, user analytics). Design a system that can ingest this data and query it. How do you handle the enormous write volume? What about data retention (keeping recent data hot, older data warm, very old data cold)? How would this architecture differ from a traditional OLTP database?
What Comes Next
Now that you understand how to approach system design systematically, the next section focuses on what pitfalls to avoid. You’ll learn the common mistakes that trip up even experienced engineers—the shortcuts that seem safe but create technical debt, the buzzwords that mask poor decisions, and the false assumptions that derail projects. Armed with the methodology in this section and awareness of these pitfalls in the next, you’ll be ready to design systems that work at scale and remain maintainable over time.