System Design Fundamentals

Network Optimization

The Hidden Cost of Networks

Imagine you’re calling an API endpoint that returns 5KB of JSON. Straightforward, right? But here’s what’s really happening on the wire: your 5KB payload travels with HTTP headers adding another 2KB, the TLS handshake adds several kilobytes of certificates and negotiation, DNS lookups introduce 50-100ms of latency, and TCP slow start means the connection doesn’t reach full throughput immediately. By the time the response reaches a user on a slow 3G mobile connection, the bytes on the wire have multiplied several times over and you’re looking at 2+ seconds of latency—just for the network.

In distributed systems, this compounds dramatically. When a request touches 20 microservices internally, each hop adding 10ms of network latency, and those calls happen sequentially, you’ve already burned 200ms just waiting for network round trips—before any computation happens. Network optimization is where you reclaim that time. It operates along three dimensions: reducing bytes on the wire, reducing the number of round trips, and reducing latency per trip. Optimize one dimension, and your system gets faster. Optimize all three, and you create something fundamentally more responsive.

This chapter builds on the networking fundamentals from Chapter 3, but now we’re applying them strategically to system design.


The Three Dimensions of Network Optimization

Network optimization isn’t a single technique—it’s a multi-dimensional strategy that addresses how data moves through your system.

1. Reducing Bytes

Every byte transmitted costs bandwidth, storage, and latency. Smaller payloads mean faster transmission, especially on constrained connections.

Compression is your first tool. HTTP compression (gzip, Brotli, Zstandard) can reduce JSON responses by 80-90%. A 500KB response becomes 50KB. For a service handling a million requests daily, that’s 450GB of bandwidth saved—and a proportional reduction in latency.
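
As a quick illustration, here is a minimal Node.js sketch (the sample payload is invented for illustration) that compares raw, gzip, and Brotli sizes for a repetitive JSON payload:

// Compare raw, gzip, and Brotli sizes for a repetitive JSON payload
const zlib = require('zlib');

const payload = JSON.stringify(
  Array.from({ length: 1000 }, (_, i) => ({ id: i, name: `user-${i}`, active: true }))
);

console.log('raw   :', Buffer.byteLength(payload), 'bytes');
console.log('gzip  :', zlib.gzipSync(payload).length, 'bytes');
console.log('brotli:', zlib.brotliCompressSync(payload).length, 'bytes');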

Efficient data formats matter too. JSON is human-readable but verbose—it includes field names with every record. Protocol Buffers compress the same data into a binary format, reducing size by 50-70% while also accelerating serialization and deserialization. gRPC uses Protocol Buffers by default, which is why it’s popular in performance-critical systems.
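
A rough sketch of the size difference, using the protobufjs package and a hypothetical user.proto schema (the schema, package name, and fields are assumptions made up for illustration):

// Encode the same record as JSON and as Protocol Buffers and compare sizes
const protobuf = require('protobufjs');

protobuf.load('user.proto').then((root) => {
  const User = root.lookupType('users.User'); // hypothetical message type
  const record = { name: 'Alice', email: 'alice@example.com', age: 30 };

  const jsonBytes = Buffer.byteLength(JSON.stringify(record));
  const protoBytes = User.encode(User.create(record)).finish().length;

  console.log({ jsonBytes, protoBytes }); // the binary form omits field names
});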

API response optimization means sending only what you need. GraphQL’s field selection lets clients request specific fields instead of entire resources. REST APIs can adopt similar patterns through partial responses (return only name and email instead of the full user object). Pagination prevents massive responses—return 50 items instead of 10,000.
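
For instance, partial responses and pagination might look like this (the fields, limit, and offset query parameters are hypothetical conventions; REST APIs expose them in different ways):

Client request: GET /api/users/123?fields=name,email
Server response:
  200 OK
  Content: { "name": "Alice", "email": "alice@example.com" }

Client request: GET /api/orders?limit=50&offset=0
Server response:
  200 OK
  Content: [ ...50 items, not the full history... ]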

Minification reduces code size by removing whitespace, shortening variable names, and eliminating comments. A 200KB JavaScript bundle becomes 50KB after minification and compression.
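
An illustrative before/after (the function is made up; real minifiers such as Terser or esbuild do this automatically as part of the build):

// Before minification
function calculateTotal(items) {
  let total = 0;
  for (const item of items) {
    total += item.price * item.quantity;
  }
  return total;
}

// After minification: one line, shortened identifiers, no comments
function c(t){let r=0;for(const e of t)r+=e.price*e.quantity;return r}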

2. Reducing Round Trips

Every round trip adds latency. Under HTTP/1.1, a client either opens extra TCP connections or sends requests sequentially over a single persistent connection, waiting for one response before sending the next request on that connection. Modern protocols change this fundamentally.

HTTP/2 multiplexing allows multiple requests over a single TCP connection simultaneously. Instead of waiting for response 1 before sending request 2, you send both at once. The header compression (HPACK) also reduces HTTP header overhead significantly.

HTTP/3 and QUIC take this further: QUIC combines the transport and TLS handshakes into fewer round trips than TCP plus TLS (with a 0-RTT resumption mechanism for repeated connections) and uses a more resilient connection model where a single lost packet doesn’t block all streams.

Connection reuse and pooling between services is essential. Instead of creating a new TCP connection for each API call, keep connections alive. HTTP keep-alive headers, gRPC persistent channels, and database connection pools all reduce the overhead of establishing connections.

Batching requests is an older but powerful technique. Instead of 10 separate API calls, make 1 call with a batch of 10 items. This reduces round trips from 10 to 1.
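
A sketch of what that can look like (the :batchGet endpoint and request shape are hypothetical; batch APIs vary widely):

Instead of 10 sequential calls:
  GET /api/products/1
  GET /api/products/2
  ... (10 round trips)

Client request: POST /api/products:batchGet
  Content: { "ids": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] }
Server response:
  200 OK
  Content: [ { "id": 1, ... }, { "id": 2, ... }, ... ]   (one round trip)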

3. Reducing Latency Per Trip

Even with fewer bytes and fewer round trips, the time for each round trip matters. Network latency is bounded by distance divided by the speed of light—you can’t change the laws of physics, but you can get closer to your users.

CDNs (Content Delivery Networks) cache static assets at edge locations closer to users. A request that would take 200ms to your origin takes 20ms to the nearest CDN edge. For static content, this is the highest-ROI optimization available.
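
A typical cache-friendly response from a CDN edge might look like this (the X-Cache header name varies by provider):

Client request: GET /static/app.js
CDN edge response:
  200 OK
  Cache-Control: public, max-age=31536000, immutable
  Age: 4120
  X-Cache: HIT   (served from the edge, no trip to the origin)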

Edge computing extends this further. Services like Cloudflare Workers or AWS Lambda@Edge let you run code at the edge, making decisions and transforming data without a round trip to your origin server.
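
A minimal sketch of a Cloudflare Worker (modules syntax) that answers a lightweight request entirely at the edge; the /api/geo route is hypothetical:

export default {
  async fetch(request) {
    const url = new URL(request.url);
    if (url.pathname === '/api/geo') {
      // request.cf is populated by Cloudflare at the edge location
      const country = request.cf && request.cf.country;
      return new Response(JSON.stringify({ country }), {
        headers: { 'Content-Type': 'application/json' },
      });
    }
    return fetch(request); // everything else still goes to the origin
  },
};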

DNS optimization reduces lookup latency. DNS TTLs (time-to-live) determine how long a DNS answer may be cached. With a TTL of 300 seconds, the cached answer expires every 5 minutes, and the first request after expiry pays another 50-100ms of lookup latency. Higher TTLs reduce lookup frequency but make IP address changes slower to propagate. DNS prefetching and preconnecting (using <link rel="dns-prefetch"> or <link rel="preconnect">) let the browser resolve DNS, and optionally complete the TCP and TLS handshakes, in parallel with other work.

TLS/SSL optimization also reduces round trips: TLS 1.3 needs one fewer round trip than TLS 1.2 to establish a connection, session resumption caches cryptographic state so reconnections are faster, and OCSP stapling eliminates the need for the client to verify certificate revocation online.


From HTTP/1.1 to HTTP/3: A Progression

Let’s see how these protocols evolved to address round-trip latency:

HTTP/1.1:  Request 1 → Response 1 → Request 2 → Response 2 (sequential)
           Each request may use a new TCP connection or wait in sequence

HTTP/2:    Request 1 ──┐
           Request 2 ──┼─→ Multiplexed over single TCP connection
           Request 3 ──┘

HTTP/3:    HTTP/2 benefits + QUIC transport
           - 0-RTT resumption
           - No head-of-line blocking across streams
           - Connection migration (WiFi to cellular without disconnecting)

HTTP/2 multiplexing is powerful, but head-of-line blocking can occur: if stream 1 loses a packet, streams 2 and 3 wait because TCP must maintain order. HTTP/3’s QUIC transport fixes this by allowing streams to advance independently.


Practical Optimization Strategies

DNS Prefetching and Preconnect

When a user lands on your page, they might click on external links. Start establishing connections in the background:

<!-- Resolve DNS for a domain we'll use later -->
<link rel="dns-prefetch" href="//api.example.com">

<!-- DNS + TCP + TLS handshake, fully ready for requests -->
<link rel="preconnect" href="//cdn.example.com">

<!-- If we're very confident about fetching a resource -->
<link rel="prefetch" href="//cdn.example.com/product-data.json">

Each of these reduces latency for subsequent requests or resource loads.

Connection Pooling Between Services

In a microservices architecture, service-to-service communication goes over HTTP/gRPC. Instead of creating a new connection for every request:

// Good: reuse connections via a keep-alive agent instead of
// opening a new TCP connection for every request
const http = require('http');
const agent = new http.Agent({ keepAlive: true, maxSockets: 50 });

function callService(url) {
  return new Promise((resolve, reject) => {
    http.get(url, { agent }, (res) => {
      let body = '';
      res.on('data', (chunk) => (body += chunk));
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

gRPC channels are persistent by default: a channel keeps a long-lived HTTP/2 connection open and multiplexes every call over it, so there is no per-request handshake.
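
A sketch with @grpc/grpc-js, assuming a hypothetical user.proto that defines users.UserService with a getUser method: create the client once at startup and reuse it for every call.

const grpc = require('@grpc/grpc-js');
const protoLoader = require('@grpc/proto-loader');

const definition = protoLoader.loadSync('user.proto');
const usersPkg = grpc.loadPackageDefinition(definition).users; // hypothetical package name

// One client, created once; its channel multiplexes all calls over HTTP/2
const client = new usersPkg.UserService('user-service:50051', grpc.credentials.createInsecure());

client.getUser({ id: 123 }, (err, user) => {
  if (err) console.error(err);
  else console.log(user);
});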

Conditional Requests with ETags

Reduce bytes by not sending unchanged data:

Client request: GET /api/user/123
Server response:
  200 OK
  ETag: "abc123"
  Content: { name: "Alice", age: 30 }

Client request (5 minutes later): GET /api/user/123
  If-None-Match: "abc123"
Server response:
  304 Not Modified
  (zero bytes of body, client uses cached copy)

This saves bandwidth when data hasn’t changed.
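
A minimal server-side sketch using Express (the route and the loadUser helper are hypothetical), hashing the response body to produce the ETag:

const express = require('express');
const crypto = require('crypto');
const app = express();

// Hypothetical data access
async function loadUser(id) {
  return { id, name: 'Alice', age: 30 };
}

app.get('/api/user/:id', async (req, res) => {
  const body = JSON.stringify(await loadUser(req.params.id));
  const etag = '"' + crypto.createHash('sha1').update(body).digest('hex') + '"';

  if (req.headers['if-none-match'] === etag) {
    return res.status(304).end(); // zero bytes of body, client uses its cached copy
  }
  res.set('ETag', etag).type('application/json').send(body);
});

app.listen(3000);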

Content Negotiation for Compression

Servers should respect client preferences:

Client request: GET /api/data
  Accept-Encoding: gzip, br, deflate
Server response:
  Content-Encoding: br
  Content: [brotli-compressed data]

Servers choose the most efficient supported compression algorithm.
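
A minimal sketch with Node’s built-in http and zlib modules, choosing Brotli when the client advertises it and falling back to gzip or no compression:

const http = require('http');
const zlib = require('zlib');

http.createServer((req, res) => {
  const body = JSON.stringify({ message: 'hello', items: [1, 2, 3] });
  const accepted = req.headers['accept-encoding'] || '';

  // (a production server would also set Vary: Accept-Encoding)
  if (accepted.includes('br')) {
    res.writeHead(200, { 'Content-Type': 'application/json', 'Content-Encoding': 'br' });
    res.end(zlib.brotliCompressSync(body));
  } else if (accepted.includes('gzip')) {
    res.writeHead(200, { 'Content-Type': 'application/json', 'Content-Encoding': 'gzip' });
    res.end(zlib.gzipSync(body));
  } else {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(body);
  }
}).listen(3000);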


A Reality Check: Service Mesh Latency

In service mesh architectures (like Istio), every request flows through sidecar proxies for security, observability, and traffic management. Each hop adds latency—typically 5-10ms per sidecar. With 10 services per request, that’s 50-100ms added just for the mesh.

Decision point: When optimizing network latency, measure where time is actually spent. Service mesh overhead might dwarf network optimization gains. Sometimes the best optimization is using direct service discovery (like Kubernetes DNS) when security and observability can be achieved through other means.


Trade-offs: Choosing Your Strategy

Technique           | Benefit                                       | Cost/Consideration
gzip compression    | Ubiquitous support, good ratio                | CPU overhead, slower than LZ4
Protocol Buffers    | 50-70% size reduction, fast                   | Not human-readable, schema coupling
HTTP/2              | Multiplexing, header compression              | Head-of-line blocking (TCP), slightly higher CPU
HTTP/3 / QUIC       | Best latency, no head-of-line blocking        | Still rolling out, less widespread support
CDN                 | Massive latency reduction for static content  | Cost, cache invalidation complexity
DNS prefetch        | Low cost, high payoff                         | Doesn’t prevent all DNS latency in first request
Connection pooling  | Eliminates handshake overhead                 | Memory overhead, connection timeouts must be managed

Key Takeaways

  1. Network optimization compounds: A 10ms saving per service hop across 20 services saves 200ms total—often the difference between an acceptable and an unacceptable user experience.

  2. Optimize all three dimensions: reducing bytes, reducing round trips, and reducing latency per trip create synergistic benefits. Optimizing just one leaves significant gains on the table.

  3. Protocol matters: HTTP/2 and HTTP/3 solve fundamental problems with HTTP/1.1. If you’re not using them, you’re leaving performance on the table. gRPC and Protocol Buffers are superior for service-to-service communication.

  4. Compression is high-ROI: gzip and Brotli provide 80-90% reduction for text data with minimal complexity. This is often the first optimization to implement.

  5. CDNs are your friend for static content: Geo-distributed caching near users dramatically reduces latency. The cost is usually worth it.

  6. Measure first, optimize second: Service mesh overhead, DNS lookup patterns, and TLS handshakes are common hidden latencies. Profile your system before optimizing.


Practice Scenarios

Scenario 1: Your internal service makes 15 HTTP/1.1 calls to fetch user data, product recommendations, and inventory status. Response time is 800ms. How would you optimize this? (Consider multiplexing, batching, and connection reuse.)

Scenario 2: Your API serves a 600KB JSON response to mobile clients. Network latency is your bottleneck. You can implement either gzip compression or Protocol Buffers. Which would you choose and why? What if your clients are JavaScript browsers (can’t easily parse Protocol Buffers)?


Up Next

Network optimization gets us most of the way, but we’re still sending data over the wire. In the next section, we explore compression strategies—diving deeper into algorithm selection, trade-offs, and real-world implementations that can reduce payload sizes even further.