Latency & Bandwidth
Why Geography Matters for Your Users
Why does a user in Mumbai experience your US-hosted app differently than a user in New York? The answer lies in two fundamental constraints that govern all network communications: latency and bandwidth. Throughout the previous sections, we explored DNS lookups, TCP connections, and encryption protocols—but those discussions glossed over a critical reality: there’s a speed limit to how fast data can travel, and that limit is determined by physics, not just technology.
As system designers, we can’t change the laws of nature, but we can architect around them. This chapter ties together everything you’ve learned about networking and shows you why your architectural decisions in Chapters 2 and 3 matter so much for real-world user experience. A fraction-of-a-second improvement in latency can be the difference between a delightful user experience and one users abandon. Understanding latency and bandwidth isn’t just about optimization—it’s about making informed trade-offs when you design systems that span continents.
Defining the Two Pillars of Network Performance
Latency is the time it takes for data to travel from source to destination. But this simple definition masks a more complex reality: latency is actually the sum of several distinct delays.
Propagation delay is the time light (or electrical signals in copper cables) physically takes to traverse the distance between two points. This is the baseline we can’t beat. Light travels at roughly 200,000 kilometers per second through fiber optic cables—that’s a one-way trip from New York to London (about 5,600 km) in roughly 28 milliseconds, or around 56 milliseconds round trip. This delay exists whether you send one byte or one gigabyte.
Transmission delay is the time required to push all the bits of your data onto the network medium. If you’re sending 1 million bytes (1 MB) over a 1 Mbps connection, transmission delay alone is 8 seconds. This is where bandwidth (which we’ll define shortly) directly impacts latency.
Processing delay is the time spent at routers, switches, and servers handling, analyzing, and forwarding your data. Modern equipment is fast, but packet processing is never instantaneous.
Queuing delay occurs when data arrives faster than the network can handle it—packets line up waiting their turn. This is where congestion creates unpredictable latency.
Together, these delays mean a single network request can have hundreds of milliseconds of latency across the internet, not because equipment is slow, but because physics and congestion combine to create substantial delays.
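To make the arithmetic concrete, here’s a back-of-envelope sketch in Python. Every number in it is an illustrative assumption (a 6,000 km fiber path, a 100 Mbps bottleneck link, rough guesses for processing and queuing), not a measurement:

```python
# Back-of-envelope: one-way latency as the sum of its four components.
# All numbers below are illustrative assumptions, not measurements.
PAYLOAD_BITS = 1_000_000 * 8          # 1 MB payload
LINK_BPS = 100e6                      # 100 Mbps bottleneck link

propagation_ms = 6_000 / 200_000 * 1000           # ~6,000 km of fiber at ~200,000 km/s -> 30 ms
transmission_ms = PAYLOAD_BITS / LINK_BPS * 1000  # pushing 1 MB onto a 100 Mbps link -> 80 ms
processing_ms = 2                                 # routers/switches along the path (rough guess)
queuing_ms = 5                                    # congestion-dependent, highly variable

total_ms = propagation_ms + transmission_ms + processing_ms + queuing_ms
print(f"one-way latency ≈ {total_ms:.0f} ms")     # ≈ 117 ms
```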
Bandwidth is often confused with latency, but it answers a different question: it’s the maximum amount of data that can flow through a connection per unit time, measured in bits per second. Think of bandwidth as the total capacity of the pipe, not the speed at which any individual piece of data moves. A fiber link might support 10 Gbps, but each individual packet still experiences latency.
Actual throughput (what you get in practice) always falls below the theoretical bandwidth because of protocol overhead, retransmissions, and network inefficiency. You might have a 100 Mbps internet connection, but real-world downloads rarely hit that speed.
Round Trip Time (RTT) is latency measured both ways: the time to send a packet and receive a response. For a New York to California connection, RTT might be 40-50ms; from New York to London, 100-130ms. RTT is critical because many protocols (like TCP) wait for acknowledgments, so latency directly affects throughput.
Bandwidth-delay product is the amount of data “in flight” on a network path. If you have 10ms latency (0.01 seconds) and 1 Gbps bandwidth, then 0.01 × 10^9 bits = 10 million bits (1.25 MB) can be in transit simultaneously. This matters for buffer sizing and explains why high-latency satellite links need large buffers.
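As a quick sanity check, here’s that calculation as a tiny Python helper, using the 1 Gbps / 10 ms figures from the example above:

```python
# Bandwidth-delay product: how many bits can be "in flight" on a path at once.
def bandwidth_delay_product_bits(bandwidth_bps: float, one_way_delay_s: float) -> float:
    return bandwidth_bps * one_way_delay_s

in_flight = bandwidth_delay_product_bits(1e9, 0.01)   # 1 Gbps link, 10 ms delay
print(f"{in_flight:.0f} bits in flight = {in_flight / 8 / 1e6:.2f} MB")  # 10,000,000 bits = 1.25 MB
```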
Finally, we measure latency percentiles: P50 (median), P95 (95th percentile), and P99 (99th percentile). A system with P50 latency of 50ms but P99 of 2 seconds is unpredictably bad for 1% of users—and that 1% can be millions of people globally. We’ll discuss why P99 matters later.
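Here’s a small Python sketch showing how those percentiles are computed from raw samples; the distribution is synthetic, deliberately shaped as a healthy median with a slow 1% tail:

```python
# Percentiles from raw latency samples: the tail dominates even when the median looks fine.
import random
import statistics

random.seed(42)
samples = [random.gauss(50, 10) for _ in range(9_900)]          # 99% fast requests (~50 ms)
samples += [random.uniform(1_000, 2_000) for _ in range(100)]   # 1% slow outliers

cuts = statistics.quantiles(samples, n=100)   # 99 cut points dividing the data into 100 buckets
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"P50={p50:.0f} ms  P95={p95:.0f} ms  P99={p99:.0f} ms")
```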
The Highway Analogy
Imagine two highways connecting two cities 300 kilometers apart. Highway A has 10 lanes (high bandwidth) but a 30 km/h speed limit (high latency—vehicles move slowly). Highway B has 2 lanes (low bandwidth) but a 150 km/h speed limit (low latency—vehicles move fast).
During rush hour, Highway A can move many cars simultaneously, but each car crawls along, so it takes hours to get to the destination. Highway B moves fewer cars at once, but each one zooms across in two hours. Which is better depends on the scenario. If you’re moving one important package, Highway B is superior. If you’re evacuating 10,000 people from a city, Highway A’s throughput matters more than any single person’s speed.
Most system design problems involve both concerns, and they’re often in tension: improving one can cost you in the other.
The Anatomy of Latency in Web Requests
Let’s trace a real web request and see where latency accumulates. When you visit a website, the following happens sequentially:
- DNS lookup (50-100ms): Your browser queries a nameserver to resolve the domain to an IP address. This is a network round trip.
- TCP handshake (RTT, typically 10-100ms depending on distance): Three packets exchanged to establish a connection.
- TLS handshake (1-2 RTTs, roughly 20-200ms): Encryption negotiation and certificate exchange.
- HTTP request transmission (a few ms): Your browser sends the actual request.
- Network propagation (varies by distance): The request travels across the internet.
- Server processing (10-500ms or more): The server does work—database queries, rendering, etc.
- Response transmission (depends on size and bandwidth): Data comes back.
- Client processing (browser rendering, JavaScript, etc.): Your browser interprets the response.
Notice that steps 1-3 alone can easily total 100-300ms before your request even reaches the server. This is why Content Delivery Networks (CDNs) are so powerful: they cache content at edge locations closer to users, reducing the need for long-distance requests.
```mermaid
graph LR
    User["👤 User<br/>(Mumbai)"]
    DNS["🔍 DNS<br/>50ms"]
    TCP["🤝 TCP<br/>30ms"]
    TLS["🔐 TLS<br/>50ms"]
    Network["🌐 Network<br/>propagation<br/>100ms"]
    Server["🖥️ Server<br/>(US)<br/>100ms"]
    Back["↩️ Back<br/>100ms"]
    Total["⏱️ Total: 430ms"]
    User -->|Lookup| DNS
    DNS -->|Connect| TCP
    TCP -->|Encrypt| TLS
    TLS -->|Send Request| Network
    Network -->|Process| Server
    Server -->|Response| Back
    Back --> Total
```
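If you want to see where these milliseconds actually go, here’s a minimal sketch using only the Python standard library that times DNS resolution, the TCP handshake, the TLS handshake, and time-to-first-byte for a single HTTPS request. The example.com host is a placeholder; point it at any HTTPS endpoint you like, and expect the numbers to vary with your location and network:

```python
# Time the phases of one HTTPS request: DNS, TCP connect, TLS handshake, time-to-first-byte.
import socket
import ssl
import time

HOST = "example.com"   # placeholder target, substitute your own endpoint
PORT = 443

def ms(start: float) -> float:
    return (time.perf_counter() - start) * 1000

t0 = time.perf_counter()
ip = socket.getaddrinfo(HOST, PORT)[0][4][0]            # DNS lookup
dns_ms = ms(t0)

t1 = time.perf_counter()
raw = socket.create_connection((ip, PORT), timeout=10)  # TCP three-way handshake
tcp_ms = ms(t1)

t2 = time.perf_counter()
ctx = ssl.create_default_context()
tls = ctx.wrap_socket(raw, server_hostname=HOST)        # TLS handshake (SNI + cert check)
tls_ms = ms(t2)

t3 = time.perf_counter()
tls.sendall(f"GET / HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n".encode())
first_bytes = tls.recv(4096)                            # request + server processing + response start
ttfb_ms = ms(t3)
tls.close()

print(f"DNS: {dns_ms:.1f} ms, TCP: {tcp_ms:.1f} ms, TLS: {tls_ms:.1f} ms, TTFB: {ttfb_ms:.1f} ms")
```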
Typical Latency Values
Here’s a table of latencies you’ll encounter in the real world:
| Operation | Latency | Why |
|---|---|---|
| L1 CPU cache access | ~1 ns | On-die, fastest |
| L2 CPU cache access | ~5 ns | Still on-chip |
| RAM access | 100 ns | Further from processor |
| SSD read | 100 μs (0.1 ms) | Flash storage, no moving parts |
| HDD read | 10 ms | Spinning disk, seek time |
| Network across same city | 1-2 ms | Fiber speed + minimal hops |
| Network across continent (US to EU) | 80-130 ms | Propagation through 6,000 km of ocean |
| Network across globe (US to Australia) | 150-200 ms | Propagation through 16,000 km |
| DNS lookup (uncached) | 50-100 ms | Recursive query to authoritative server |
| TCP handshake (local) | 1 ms | 1 RTT, minimal distance |
| TCP handshake (intercontinental) | 100-200 ms | RTT is the bottleneck |
| TLS handshake | 1-2 RTTs | Depends on latency |
| Database query (local) | 1-10 ms | Query time + I/O |
| API call to another service | 5-50 ms | Network round trip + processing |
Notice the pattern: once you leave the machine (crossing the network), latency explodes by orders of magnitude. CPU cache access is nanoseconds; a network request is milliseconds—six orders of magnitude slower.
Why P99 Matters More Than Average
Imagine you have an e-commerce checkout service with a median (P50) latency of 50ms, but the 99th percentile is 3 seconds. On a typical day, you might have 100,000 checkouts. That means roughly 1,000 users experience delays of 3 seconds or more. If even 1% of those abandon their cart, you’ve lost revenue. More importantly, those 1,000 users will tell others about the slowness. Your brand perception is damaged by the worst experience, not the average.
Tail latency amplification is a related problem: if your request fans out to multiple backend services and must wait for all of them, its latency is determined by the slowest response. Even if each service hits its own P99 only 1% of the time, a request that touches many services has a much higher chance of hitting at least one slow response, so the overall tail is far worse than any single service’s numbers would suggest.
This is why modern systems use techniques like request queuing, deadline management, and hedging (sending duplicate requests and using the first to arrive) to fight tail latency.
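Here’s a minimal sketch of hedging using Python’s asyncio. The backend call is simulated, and a production implementation would usually delay the duplicate request until the first attempt exceeds something like the P95 latency rather than firing both at once:

```python
# Request hedging: send the same request to two replicas, keep the first answer,
# cancel the straggler. fetch_replica is a simulated stand-in for a real backend call.
import asyncio
import random

async def fetch_replica(name: str) -> str:
    await asyncio.sleep(random.uniform(0.01, 0.5))   # simulated backend latency
    return f"response from {name}"

async def hedged_request() -> str:
    tasks = [asyncio.create_task(fetch_replica(n)) for n in ("replica-a", "replica-b")]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                                # drop the slower attempt
    return done.pop().result()

print(asyncio.run(hedged_request()))
```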
Bandwidth Optimization
Bandwidth is precious and expensive. Here’s how systems reduce bandwidth consumption:
Compression reduces the size of transmitted data (gzip, brotli for text; modern codecs for video) but costs CPU time to compress and decompress. For large or highly compressible payloads, the bandwidth savings usually outweigh the CPU cost.
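As a rough illustration of that trade-off, the sketch below gzips a synthetic, highly repetitive payload and reports the size reduction and the CPU time spent (real-world ratios will be less dramatic):

```python
# Compression trade-off: bytes saved vs. CPU time spent. The payload is synthetic and
# repetitive, so it compresses far better than typical real-world responses.
import gzip
import time

payload = b'{"user": "example", "items": [1, 2, 3], "status": "ok"}' * 10_000

start = time.perf_counter()
compressed = gzip.compress(payload)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"original: {len(payload):,} bytes, compressed: {len(compressed):,} bytes "
      f"({len(compressed) / len(payload):.1%} of original), {elapsed_ms:.1f} ms to compress")
```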
Protocol efficiency matters: HTTP/2 multiplexing and HTTP/3’s QUIC protocol reduce overhead and improve bandwidth utilization.
Batching sends multiple requests as one (e.g., GraphQL batch queries) to amortize protocol overhead.
Caching reduces bandwidth by avoiding requests altogether. A response served from a local cache consumes no network bandwidth at all.
Real Systems in the Wild
Netflix uses a sophisticated CDN strategy. They’ve built their own CDN with caches inside ISPs’ networks, reducing the distance data travels and offloading traffic from expensive backbone links. They also adaptively adjust video quality based on available bandwidth, so users on slower connections get lower resolution rather than buffering.
Google Maps pre-caches map tiles and route data on your phone so that many queries never touch the network. When a request does have to go over the network, optimized protocols and compression keep bandwidth and latency to a minimum.
Cloudflare positions edge servers in 200+ cities worldwide, allowing them to serve cached content with latencies under 50ms for most users globally, even if the origin server is far away.
Back-of-Envelope Calculation: Video Streaming
Let’s estimate bandwidth for a video streaming service. A 1-hour HD movie (720p) is roughly 2.5 GB. If streamed over that 1 hour:
2.5 GB = 2.5 × 10^9 bytes = 2.5 × 10^9 × 8 bits = 20 billion bits
Time = 3600 seconds
Required bandwidth = 20 billion bits / 3600 seconds ≈ 5.5 Mbps
That’s why your internet provider says you need 5-10 Mbps for HD streaming. Add overhead, multiple streams, and poor network conditions, and 25 Mbps per user becomes a comfortable target for 4K. With millions of users, your CDN must have terabits per second of capacity.
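Here’s the same estimate in code, extended with an aggregate figure for a hypothetical number of concurrent viewers; both inputs are illustrative:

```python
# Streaming bandwidth estimate: per-stream rate for the 2.5 GB / 1-hour movie above,
# then the aggregate capacity needed for a hypothetical concurrent audience.
movie_bytes = 2.5e9
duration_s = 3600
per_stream_bps = movie_bytes * 8 / duration_s
print(f"per stream: {per_stream_bps / 1e6:.1f} Mbps")                      # ≈ 5.6 Mbps

concurrent_viewers = 1_000_000                                             # hypothetical
aggregate_tbps = per_stream_bps * concurrent_viewers / 1e12
print(f"aggregate for {concurrent_viewers:,} viewers: {aggregate_tbps:.1f} Tbps")  # ≈ 5.6 Tbps
```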
The Trade-offs You’ll Face
Latency vs. Cost: CDNs reduce latency but cost money. A small startup might serve all users from one region, accepting higher latency for some. A mature company invests in global infrastructure.
Bandwidth vs. Complexity: Compressing data saves bandwidth but adds CPU cost and complexity. Video streaming providers must balance compression ratio against quality loss and encoding cost.
Edge Computing vs. Consistency: Running computation at edge locations (closer to users) reduces latency but makes it harder to keep data consistent across locations. Is it worth the latency improvement?
Caching vs. Freshness: Aggressive caching reduces bandwidth and latency but serves stale data. News sites cache heavily; financial dashboards cache less.
Protocol Overhead vs. Latency: Some protocols add overhead (more packets, more processing) in exchange for features like reliability or ordering. Choosing the right protocol is a trade-off.
Know your constraints: if your users are in one geographic region, global CDN investment is wasteful. If latency is under 50ms and your users are happy, further optimization is probably not worth the cost.
Key Takeaways
- Latency is not just about distance—it compounds from propagation, transmission, processing, and queuing delays.
- Bandwidth is capacity; latency is speed. Both matter, and they’re often in tension.
- RTT (round trip time) directly affects throughput because protocols wait for acknowledgments.
- The bandwidth-delay product determines how much data can be “in flight” simultaneously.
- Tail latency (P99) matters more than average latency for user experience.
- CDNs and edge computing reduce latency by moving computation and content closer to users.
- DNS, TCP handshakes, and TLS negotiation can add 150+ ms of latency before your request even reaches your server.
- Understanding when optimization matters prevents you from over-engineering systems.
Practice Scenarios
Scenario 1: Global Expansion: You’ve built a web service that works well for US users (P50 latency 40ms). You’re now seeing 30% of traffic from India. Your P50 latency for Indian users is now 200ms, and they’re complaining. Discuss three approaches: (a) Accept the latency difference, (b) Build a CDN presence in India, (c) Move your primary servers to India. What are the trade-offs? Which would you choose and why?
Scenario 2: Streaming Service Bandwidth: You’re building a video streaming startup. Estimate the total bandwidth you’d need to serve 10 million concurrent users watching HD video at 5.5 Mbps each. Then discuss strategies to reduce that bandwidth (compression, adaptive quality, caching, etc.). What’s the cost and complexity of each?
Scenario 3: Tail Latency Bug: Your monitoring shows P50 checkout latency is 50ms, but P99 is 2.5 seconds. The average user is happy, but 1% of users during peak hours report slowness. Your checkout service calls three backend services in parallel, each with a P99 of 1 second. Diagnose the issue and propose solutions.
Connection to the Next Steps
Now that you understand how latency and bandwidth shape user experience, we’re ready to zoom in on the systems that serve requests: databases, caches, and message queues. In the chapters ahead, you’ll learn how these components introduce their own latency and bandwidth constraints—and how to architect around them. You’ll also discover that many of the optimization strategies we’ve discussed here (caching, batching, compression) are re-applied at every layer of your system. Latency and bandwidth are the lenses through which all system design decisions are made.