Global Load Balancing
Taking Load Balancing Global
So far in this chapter, we’ve focused on balancing traffic across servers within a single data center. Your load balancer sits in front of a cluster of servers, distributes requests, and life is good. But here’s the reality: your users aren’t just in one city. You have customers in Tokyo, London, São Paulo, and Sydney. A user in Australia shouldn’t have their request travel halfway around the world to a server in Virginia when you have infrastructure nearby.
This is the challenge that Global Server Load Balancing (GSLB) solves. GSLB extends the load balancing concepts we’ve discussed to geographic scale, intelligently routing users to the closest or most optimal data center worldwide. It’s the difference between local optimization and global optimization, and it’s what separates regional applications from truly global ones.
Think of GSLB as the supervisory layer that decides which entire data center your request should go to first. Once you arrive at that data center, the local load balancers we discussed earlier take over. Together, they create a seamless experience regardless of where you are on the planet.
The Mechanics of Global Distribution
Global Server Load Balancing operates at the DNS layer, which makes it elegant and powerful. Remember from Chapter 3 when we discussed DNS? That system that translates domain names into IP addresses? GSLB hijacks that process in the best way possible.
GeoDNS is the foundation of most GSLB implementations. Instead of returning the same IP address for a domain regardless of who’s asking, GeoDNS returns different IP addresses based on the client’s geographic location. When a user in Tokyo queries your domain, they get the IP of your Tokyo data center. When someone in New York queries the same domain, they get the IP of your US East data center. This happens instantly, at DNS resolution time, before any HTTP request is even made.
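To make that decision concrete, here’s a minimal sketch of what a GeoDNS service effectively does at resolution time. Everything in it is illustrative: the prefix-to-continent table stands in for a real IP geolocation database, and the IPs reuse the documentation addresses from this chapter.

```python
# Toy GeoDNS decision: pick an A record based on where the client appears to be.
# The prefix table below stands in for a real geolocation database.
GEO_PREFIXES = {
    "203.0.": "AS",    # pretend these prefixes belong to clients in Asia
    "198.51.": "NA",   # ...North America
    "192.0.": "EU",    # ...Europe
}

REGIONAL_ANSWERS = {
    "AS": "203.0.113.1",    # Tokyo load balancer
    "NA": "198.51.100.1",   # US East load balancer
    "EU": "192.0.2.1",      # EU load balancer
}
DEFAULT_ANSWER = "198.51.100.1"  # fallback when the client can't be located


def geolocate_continent(client_ip: str) -> str | None:
    """Toy lookup: match the client IP against known prefixes."""
    for prefix, continent in GEO_PREFIXES.items():
        if client_ip.startswith(prefix):
            return continent
    return None


def resolve_a_record(client_ip: str) -> str:
    """Return the A record for api.example.com based on the client's location."""
    continent = geolocate_continent(client_ip)
    return REGIONAL_ANSWERS.get(continent, DEFAULT_ANSWER)


print(resolve_a_record("203.0.113.77"))  # Tokyo client -> 203.0.113.1
print(resolve_a_record("192.0.2.200"))   # EU client    -> 192.0.2.1
```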
Anycast routing is another critical technique. This is where the same IP address is advertised from multiple geographic locations using BGP (Border Gateway Protocol). When you have the same IP advertised from your Tokyo and New York data centers, internet routers naturally route packets to the geographically nearest location. It’s like having a mailbox on every corner of every street in the city—you go to the nearest one.
Active-active multi-region architectures represent the holy grail of GSLB. Every region is live, serving real traffic. If one region fails, traffic automatically shifts to others. This is more complex than active-passive setups where one region is hot and others are standby, but active-active gives you massive capacity and true redundancy.
Failover strategies add another layer. Beyond geography, GSLB needs to know if a data center is even healthy. If your Tokyo data center has a database failure, GSLB needs to detect this within seconds and start routing Tokyo users to your next-closest region instead. This requires continuous health checks across global infrastructure—a non-trivial engineering problem.
GSLB differs from local load balancing in one fundamental way: it operates at the DNS/network layer rather than the application layer. Your local load balancer understands HTTP and can make decisions based on request paths, cookies, or response times. GSLB can only see geographic location, health status, and basic network metrics. It’s less granular but operates at global scale.
The Airline Hub System
Imagine a global airline with hubs in Tokyo, London, New York, and São Paulo. When someone in Japan books a flight, the airline routes them through Tokyo—it’s closest. A passenger from Germany books through London. Someone from Brazil routes through São Paulo.
Now, suppose the Tokyo hub encounters a major typhoon and shuts down temporarily. The airline doesn’t tell Japanese passengers, “Sorry, fly to New York instead”—that would be insane. Instead, their routing system automatically redirects Japanese passengers through alternate hubs. Maybe some go through London (longer, but viable), others through New York. The airline’s distribution system detects the problem and reroutes intelligently.
GSLB works identically. Your data centers are hubs. Geography determines the primary routing. Health checks detect failures. And when a hub goes down, traffic automatically redistributes to nearby alternatives, all without users even knowing there was a problem.
How GSLB Works in Practice
GeoDNS Resolution
When a user in Tokyo performs a DNS query for api.example.com, the request hits one of your authoritative DNS servers (or more likely, a managed service like Route 53). The DNS server looks at the client’s IP address, geolocates it to Tokyo, and returns the A record pointing to your Tokyo data center’s load balancer (say, 203.0.113.1).
Meanwhile, a user in New York making the identical query gets a different response: the A record for your US East load balancer (198.51.100.1). Both users are accessing the same domain, but they receive different IPs. The next time they send a request, it goes to their respective regional data center.
User in Tokyo: api.example.com → 203.0.113.1 (Tokyo LB)
User in New York: api.example.com → 198.51.100.1 (US East LB)
User in London: api.example.com → 192.0.2.1 (EU LB)
This is simple but powerful. It attacks the latency problem from Chapter 3 at its source: by routing users to nearby infrastructure, you dramatically reduce response times.
Anycast Networks
Anycast takes a different approach. Instead of returning different IPs per geography, you advertise the same IP address from multiple points worldwide. Your authoritative DNS servers might be at the same IP (198.51.100.1) but physically located in New York, London, and Tokyo.
When someone in Australia queries that IP, their ISP’s routers see the announcement from all three locations but naturally forward packets to the closest one (Tokyo, in this case). From the user’s perspective, they’re hitting the same IP, but physically they’re connecting to the nearest server. BGP (the routing protocol underlying the internet) makes this work.
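If it helps to picture what the routers are doing, here’s a deliberately simplified model: the same prefix arrives from several sites, and each router prefers the advertisement with the shortest path. Real BGP best-path selection weighs many more attributes (local preference, MED, policy), and the path lengths below are invented for illustration.

```python
# Toy anycast model: one IP announced from three sites; a router in Australia
# prefers whichever announcement has the shortest AS path. Numbers are invented.
ADVERTISEMENTS = {
    "tokyo":    {"prefix": "198.51.100.1", "as_path_len": 3},
    "new_york": {"prefix": "198.51.100.1", "as_path_len": 5},
    "london":   {"prefix": "198.51.100.1", "as_path_len": 6},
}

def pick_anycast_site(ads: dict) -> str:
    """Pick the site whose announcement has the shortest AS path."""
    return min(ads, key=lambda site: ads[site]["as_path_len"])

print(pick_anycast_site(ADVERTISEMENTS))  # -> "tokyo"
```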
Real-World Implementations
AWS Route 53 offers several traffic policies:
| Policy Type | Use Case | Example |
|---|---|---|
| Geolocation routing | Route based on origin country/continent | All India traffic → Mumbai region |
| Latency-based routing | Route to lowest-latency endpoint | Measure latency, pick fastest |
| Weighted routing | Gradual traffic shifting (90% old, 10% new) | Blue-green deployments |
| Failover routing | Primary/secondary setup | Active-passive multi-region |
| Geoproximity routing | Route by distance to your resources, adjustable with a bias value | Pull extra traffic toward a newly opened region |
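As a concrete example of the weighted policy above, here’s a sketch of shifting api.example.com 90/10 between an existing fleet and a new one using boto3. The hosted zone ID, record name, and IPs are placeholders, and you’d normally drive this from your deployment tooling rather than a one-off script.

```python
# Sketch: weighted routing for a gradual blue-green shift (90% old, 10% new).
# Hosted zone ID, record name, and IPs are placeholders.
import boto3

route53 = boto3.client("route53")

def weighted_record(set_id: str, ip: str, weight: int) -> dict:
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "api.example.com",
            "Type": "A",
            "SetIdentifier": set_id,
            "Weight": weight,  # relative share of DNS responses
            "TTL": 60,         # short TTL so weight changes take effect quickly
            "ResourceRecords": [{"Value": ip}],
        },
    }

route53.change_resource_record_sets(
    HostedZoneId="Z1ABC123",  # placeholder hosted zone
    ChangeBatch={
        "Changes": [
            weighted_record("blue-current", "198.51.100.1", 90),
            weighted_record("green-new", "198.51.100.2", 10),
        ]
    },
)
```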
Cloudflare’s global network takes this further. They operate data centers in over 200 cities. When a request hits Cloudflare, they use anycast and GeoDNS to route you to the nearest facility, then use their private backbone to connect to your actual origin servers. This hybrid approach gives them performance benefits even if your origin is not globally distributed.
graph TB
subgraph "User Locations"
Tokyo["🌏 User in Tokyo"]
London["🌍 User in London"]
Sydney["🌎 User in Sydney"]
end
subgraph "DNS Layer"
GeoDNS["GeoDNS Service<br/>(Route 53)"]
end
subgraph "Regional Data Centers"
LBAP["Load Balancer<br/>Asia-Pacific"]
LBEU["Load Balancer<br/>Europe"]
LBAU["Load Balancer<br/>Australia"]
end
subgraph "Backend Services"
APDB["DB + Cache<br/>Asia-Pacific"]
EUDB["DB + Cache<br/>Europe"]
AUDB["DB + Cache<br/>Australia"]
end
Tokyo -->|Query: api.example.com| GeoDNS
London -->|Query: api.example.com| GeoDNS
Sydney -->|Query: api.example.com| GeoDNS
GeoDNS -->|Return 203.0.113.1| Tokyo
GeoDNS -->|Return 192.0.2.1| London
GeoDNS -->|Return 198.51.100.2| Sydney
Tokyo -->|HTTP Request| LBAP
London -->|HTTP Request| LBEU
Sydney -->|HTTP Request| LBAU
LBAP --> APDB
LBEU --> EUDB
LBAU --> AUDB
Failover Detection at Global Scale
Here’s where it gets interesting. GSLB needs to know when a region is unhealthy. You can’t just return an IP that’s offline.
Your GSLB service performs continuous health checks from multiple locations. Every 10-30 seconds, probes from geographically distributed locations query your regional endpoints. If three consecutive checks fail, the region is marked unhealthy. The DNS service then removes that region from rotation.
Second 0: Health check to Tokyo LB → Success ✓
Second 10: Health check to Tokyo LB → Success ✓
Second 20: Health check to Tokyo LB → FAIL ✗
Second 30: Health check to Tokyo LB → FAIL ✗
Second 40: Health check to Tokyo LB → FAIL ✗
→ Tokyo marked UNHEALTHY
→ Future DNS queries for Tokyo users now return London/Sydney IP
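Underneath, the timeline above is just a counter of consecutive failures per region. Here’s a minimal sketch of that loop; the endpoints are placeholders, and a real GSLB health checker probes from many vantage points and requires agreement between them before pulling a region.

```python
# Sketch: mark a region unhealthy after three consecutive failed probes.
# Endpoints are placeholders; real checkers probe from many locations.
import time
import urllib.request

ENDPOINTS = {
    "tokyo": "https://tokyo-lb.example.com/health",
    "eu":    "https://eu-lb.example.com/health",
    "us":    "https://us-lb.example.com/health",
}
CHECK_INTERVAL_S = 10
FAILURE_THRESHOLD = 3

consecutive_failures = {region: 0 for region in ENDPOINTS}
healthy = {region: True for region in ENDPOINTS}

def probe(url: str) -> bool:
    """Return True if the endpoint answers HTTP 200 within two seconds."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except Exception:
        return False

while True:
    for region, url in ENDPOINTS.items():
        if probe(url):
            consecutive_failures[region] = 0
            healthy[region] = True
        else:
            consecutive_failures[region] += 1
            if consecutive_failures[region] >= FAILURE_THRESHOLD:
                healthy[region] = False  # pull this region from DNS rotation
    time.sleep(CHECK_INTERVAL_S)
```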
But here’s the catch: DNS has TTL (Time To Live), which we discussed in Chapter 3. If you set a 5-minute TTL, some users who cached the Tokyo IP will continue hitting it for up to 5 minutes after it fails. This is a fundamental tension in GSLB design.
Data Replication Challenges
GSLB routes users globally, but your data doesn’t automatically follow. If you serve data from Tokyo but the user is in London, you’re not just solving latency—you’re potentially violating data residency laws or incurring massive inter-region bandwidth costs.
Active-active architectures replicate data across regions in near real-time, but this creates consistency challenges (see eventual consistency, Chapter X). Active-passive architectures keep a warm standby, but failover typically takes minutes, not seconds. Some teams use a hybrid: they replicate read-only data globally (for latency) but keep writes in a primary region (for consistency).
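Here’s a minimal sketch of that hybrid pattern, with in-memory dictionaries standing in for regional databases: every write lands in the primary region, while reads come from whatever region the user is in and may briefly lag behind.

```python
# Sketch: writes go to the primary region, reads come from the local replica.
# The dictionaries stand in for real regional databases.
PRIMARY_REGION = "us-east-1"
REGIONS = ["us-east-1", "eu-west-1", "ap-northeast-1"]

stores = {region: {} for region in REGIONS}  # one stand-in store per region

def write(key: str, value: str) -> None:
    """All writes land in the primary; a replication pipeline copies them out later."""
    stores[PRIMARY_REGION][key] = value

def read(key: str, local_region: str) -> str | None:
    """Reads hit the caller's local region and may be slightly stale."""
    return stores[local_region].get(key)

write("user:42:name", "Akira")
print(read("user:42:name", "us-east-1"))       # "Akira" immediately
print(read("user:42:name", "ap-northeast-1"))  # None until replication catches up
```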
Setting Up Global Routing: A Practical Example
Let’s say you’re building a global API. Here’s how you’d structure GSLB with AWS Route 53:
{
  "Name": "api.example.com",
  "Type": "A",
  "SetIdentifier": "Tokyo-Primary",
  "GeoLocation": {
    "ContinentCode": "AS"
  },
  "AliasTarget": {
    "HostedZoneId": "Z1ABC123",
    "DNSName": "elb-tokyo.example.com",
    "EvaluateTargetHealth": true
  }
}
This rule says: “For queries from Asia, return the Tokyo ELB, but only while it’s passing its health checks” (that’s what EvaluateTargetHealth does). Note that alias records don’t carry their own TTL; Route 53 manages caching for the aliased resource. If Tokyo is unhealthy, Route 53 falls back to the next matching rule (perhaps a global default).
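That global default is usually just another geolocation record whose location is the wildcard country code "*", which matches any query no more specific rule covers. A sketch of what that record might look like, expressed as the Python dict you’d pass in a boto3 ChangeBatch (zone ID and names are placeholders):

```python
# Sketch: catch-all geolocation record for clients matched by no other rule.
# Zone ID and DNS names are placeholders.
default_record = {
    "Action": "UPSERT",
    "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "A",
        "SetIdentifier": "Global-Default",
        "GeoLocation": {"CountryCode": "*"},  # wildcard: everyone not covered above
        "AliasTarget": {
            "HostedZoneId": "Z1ABC123",
            "DNSName": "elb-us-east.example.com",
            "EvaluateTargetHealth": True,
        },
    },
}
# Passed to route53.change_resource_record_sets(...) alongside the regional rules.
```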
How Netflix Does It
Netflix’s approach is instructive. They use a combination of GeoDNS steering (returning different IPs by location), Open Connect cache nodes (content pre-positioned inside ISP networks and at exchange points), and Open Connect provisioning (pushing popular content to those nodes ahead of demand). When you hit Netflix, DNS points you to a cache node in your ISP or region, while non-cached origin traffic is served from multiple availability zones.
Netflix’s failure mode is interesting: if a region goes down, traffic is simply served from other regions at higher latency rather than failing outright. They’ve accepted higher latency as better than no service.
Multi-Region Active-Active Design
Here’s a simplified architecture:
┌─────────────────────────────────────────────────┐
│ Route 53 (GeoDNS) │
│ TTL: 300s, Health checks every 10s │
└──────────┬──────────────────────────────────────┘
│
┌─────┴─────┬──────────────┬──────────────┐
│ │ │ │
┌───▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐
│Tokyo │ │EU │ │US │ │APAC │
│LB │ │LB │ │LB │ │LB │
└───┬──┘ └──┬──┘ └──┬──┘ └──┬──┘
│ │ │ │
┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐
│DB+ │ │DB+ │ │DB+ │ │DB+ │
│Cache│ │Cache│ │Cache│ │Cache│
└─────┘ └─────┘ └─────┘ └─────┘
Each region is independent. Writes can happen in any region (with eventual consistency), or you pick a primary region for writes. All regions have read replicas. Health checks across regions determine if a region should remain in DNS rotation.
The Trade-offs of Going Global
DNS Propagation and TTL
DNS TTL is your constant tension. Lower TTL (10 seconds) means faster failover—if a region dies, clients learn about it quickly. But lower TTL means more DNS queries, higher load on authoritative servers, and more cache misses. Higher TTL (1 hour) saves query volume but means slower failover. Most teams settle on 300 seconds (5 minutes) as a middle ground.
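A quick back-of-the-envelope makes the tension visible: worst-case failover is roughly detection time plus the TTL still sitting in resolver caches, while query volume scales with 1/TTL. The one-million-client figure below is an assumption purely for illustration, and real resolver caching shares queries across clients.

```python
# Back-of-the-envelope: failover time vs. DNS query volume for different TTLs.
# The client count is an illustrative assumption, not a measurement.
DETECTION_SECONDS = 30   # e.g. three failed 10-second health checks
CLIENTS = 1_000_000

for ttl in (10, 300, 3600):
    worst_case_failover = DETECTION_SECONDS + ttl   # stale caches persist up to one TTL
    queries_per_second = CLIENTS / ttl              # each client re-resolves ~once per TTL
    print(f"TTL {ttl:>4}s: failover up to ~{worst_case_failover}s, "
          f"~{queries_per_second:,.0f} queries/sec")
```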
Data Consistency Across Regions
If you’re active-active, writes happen in multiple regions simultaneously. How do you reconcile conflicting writes? Last-write-wins is simple but loses data. Conflict-free replicated data types (CRDTs) work for some data but not transactional data. You might designate a primary region for writes (sacrificing some performance). Or you accept eventual consistency and embrace it in your application design.
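To see concretely why last-write-wins loses data, here’s a minimal sketch of an LWW merge between two regions: whichever write carries the later timestamp survives, and the other is silently discarded, clock skew included.

```python
# Sketch: last-write-wins (LWW) reconciliation between two regions.
# The later timestamp survives; the other write is silently discarded.
from dataclasses import dataclass

@dataclass
class VersionedValue:
    value: str
    timestamp_ms: int   # wall-clock write time; cross-region clock skew is a real risk
    region: str

def lww_merge(a: VersionedValue, b: VersionedValue) -> VersionedValue:
    """Keep the newer write; break exact ties arbitrarily by region name."""
    if a.timestamp_ms != b.timestamp_ms:
        return a if a.timestamp_ms > b.timestamp_ms else b
    return a if a.region < b.region else b

us = VersionedValue("cart: 2 items", 1_700_000_000_500, "us-east-1")
eu = VersionedValue("cart: 3 items", 1_700_000_000_400, "eu-west-1")

print(lww_merge(us, eu))  # the EU write ("cart: 3 items") is lost
```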
Cost Explosion
Multi-region deployments are expensive. You’re running infrastructure in 4+ geographic regions, replicating data across them, and paying for inter-region bandwidth. A single-region setup with a CDN for static assets might cost 30% of a full multi-region active-active deployment. When is multi-region worth it? When you have global users with strict latency or availability SLAs, when regulatory requirements demand data residency, or when a single regional failure would be catastrophic.
Operational Complexity
Running one data center is hard. Running four is exponentially harder. You need monitoring, alerting, and runbooks for every potential failure mode. You need teams in different time zones or on-call rotations. Database replication fails in novel ways. Bugs manifest differently in different regions. The operational burden is real.
When Single-Region Is Enough
Not every application needs GSLB. If your users are geographically concentrated, if your SLAs are acceptable even with cross-ocean latency, and if you can live with occasional downtime while your one data center recovers, a single well-built data center with a good CDN might be the right call. Start simple. Add GSLB when you have concrete user complaints about latency or when your business justifies the cost.
Key Takeaways
- GSLB operates at DNS resolution time, routing users to geographically appropriate data centers before any application traffic flows
- GeoDNS and anycast are the two primary techniques, with GeoDNS returning different IPs by location and anycast using BGP to route traffic to the nearest instance of the same IP
- Health checks must operate at global scale, detecting regional failures within seconds so DNS can reroute traffic before users experience degradation
- Active-active regions maximize uptime but require solving data consistency challenges; active-passive is simpler but trades availability for consistency
- DNS TTL creates a fundamental tension: lower TTL enables faster failover but increases query volume and operational overhead
- Multi-region is expensive and complex; pursue it deliberately when the business case justifies the engineering effort
Practice Scenarios
Scenario 1: Regional Failover Design
You’re building an e-commerce platform serving the US and Europe. Your primary database is in us-east-1, and you have a read replica in eu-west-1. Design a GSLB setup that:
- Routes US users to us-east-1
- Routes EU users to eu-west-1
- Handles the failure of eu-west-1 (where do EU users go?)
- Detects failures within 30 seconds
What’s the trade-off between detection speed and DNS query volume?
Scenario 2: TTL and Failover
Your current setup uses a 3600-second (1-hour) TTL. A data center failure happens. How many users are still hitting the dead region after 5 minutes? After 30 minutes? Redesign your TTL strategy for a failover time under 2 minutes, considering the DNS query volume implications.
Scenario 3: Multi-Region Data Architecture
You want active-active across US, EU, and APAC. Your application has:
- User profile data (read-heavy, not heavily modified)
- Order history (mostly immutable)
- Shopping cart (frequently modified, inconsistency is temporary and acceptable)
For each data type, decide: replicate actively, replicate eventually, or keep in primary region only? Justify your choices.
Next: Caching and Content Delivery
Now that we’ve solved routing users globally and directing them to regional data centers, we face a new challenge: how do we actually serve content efficiently across those regions? Simply having a data center in Tokyo doesn’t help if the database is in Virginia.
In the next chapter, we’ll explore caching strategies and content delivery networks (CDNs) that cache content at the edge, reducing reliance on your origin servers and solving latency at scale. You’ll learn how to use GSLB alongside CDNs for maximum performance and how services like Cloudflare and Akamai sit at the intersection of these two worlds.