Edge Servers & PoPs
Now that we understand how CDNs work conceptually, it’s time to zoom in on the actual physical infrastructure that makes global content delivery possible. You might be wondering: where exactly does a CDN store all this content? Who manages it? How does a request from Tokyo get routed to the right server? The answer lies in a carefully orchestrated global network of edge servers and Points of Presence (PoPs).
Edge servers and PoPs form the backbone of every major CDN. They’re the reason why a user in São Paulo can stream video from Netflix as smoothly as someone in New York—despite being thousands of miles from Netflix’s origin servers. Without this distributed infrastructure, every single request would have to travel back to a centralized data center, causing latency spikes and congestion. Instead, PoPs bring content and computing closer to end users.
In this chapter, we’ll explore what edge servers actually are, how PoPs are strategically positioned across the globe, what infrastructure lives inside them, and crucially, how they’ve evolved beyond simple caching into full-fledged edge computing platforms where you can run arbitrary code.
Understanding Edge Servers and PoPs
An edge server is a regular web server positioned at the edge of the network—as close to end users as possible. It caches frequently requested content and serves it directly rather than forwarding every request upstream. When a user in Berlin requests an image, the Berlin edge server handles it instantly from cache. If the content isn’t cached, the edge server fetches it from a parent cache or the origin, stores a copy, and serves it.
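In JavaScript, the decision an edge server makes looks roughly like this. This is a minimal sketch: the `cache` parameter follows the service-worker Cache API shape, and the `ORIGIN` constant stands in for whatever upstream the CDN actually uses.

```js
const ORIGIN = "https://origin.example.com"; // hypothetical origin

// Minimal sketch of an edge server's cache-or-fetch path.
async function handleRequest(request, cache) {
  const cached = await cache.match(request);
  if (cached) {
    return cached; // cache hit: served locally, no upstream trip
  }
  // Cache miss: fetch from the parent cache or origin...
  const response = await fetch(ORIGIN + new URL(request.url).pathname);
  // ...store a copy for later requests from this region...
  await cache.put(request, response.clone());
  // ...and serve it to the waiting user.
  return response;
}
```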
A Point of Presence (PoP) is a facility—usually a data center—that houses one or more edge servers along with supporting infrastructure. Think of a PoP as a mini data center. A single PoP might contain dozens of edge servers, load balancers, storage arrays, and networking equipment, all working together to serve a geographic region. Cloudflare operates over 300 PoPs worldwide; Akamai has even more. These aren’t massive hyperscale facilities like Google’s data centers—they’re optimized for speed and cost-efficiency in specific regions.
PoPs are strategically located based on several factors. First, population density: PoPs cluster in major metropolitan areas where user demand is highest. You’ll find many PoPs in cities like London, Singapore, and São Paulo. Second, internet exchange points (IXPs): These are facilities where internet service providers (ISPs) directly interconnect their networks. Placing a PoP at an IXP reduces latency by eliminating intermediate hops. Third, ISP partnerships: CDNs negotiate with regional ISPs to co-locate servers inside their data centers. This gives the CDN access to that ISP’s customer base with minimal latency.
Inside each PoP, you’ll find a layered architecture. Load balancers at the edge direct incoming traffic to appropriate edge servers. Cache servers keep hot content in memory and colder content on disk (usually SSDs), typically running cache software purpose-built for HTTP delivery rather than general-purpose stores. Compute nodes run lightweight applications. Networking equipment handles routing and DDoS mitigation. Many PoPs also include origin pull servers that fetch content from upstream sources and refresh caches. The exact composition varies depending on the CDN and the PoP’s role in the hierarchy.
The shift from pure edge caching to edge computing represents a fundamental change in how we think about distributed infrastructure. Originally, edge servers simply cached content. Today, they run JavaScript, Rust, and Python code. Services like Cloudflare Workers, AWS Lambda@Edge, and Deno Deploy allow you to deploy functions to hundreds of edge locations instantly. This means you can run authentication checks, personalization logic, A/B testing, and image optimization at the edge—without ever touching your origin server. The edge has become a compute platform, not just a cache.
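To make this concrete, here is what a minimal Worker looks like, sketched in Cloudflare's modules syntax; deploying it once pushes the same function to every PoP.

```js
// A minimal Cloudflare Worker. Deploying it once pushes this code
// to every PoP; each request runs in whichever PoP is nearest.
export default {
  async fetch(request) {
    const url = new URL(request.url);
    // Logic that used to require an origin round-trip now runs
    // a few milliseconds from the user.
    if (url.pathname === "/hello") {
      return new Response("Hello from the edge!");
    }
    // Everything else falls through to the cache or origin.
    return fetch(request);
  },
};
```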
The implications are profound. Latency drops from 100ms+ (round-trip to origin) to 10-50ms (within the PoP). Compute happens where the data is. You gain the ability to implement sophisticated logic globally without managing your own infrastructure. For content creators and businesses, this changes how you architect applications.
Geographic Distribution: The ATM Network Analogy
Banks don’t build branches in every neighborhood. Instead, they place ATMs in high-traffic locations—shopping centers, transit hubs, airports. Each ATM handles most common operations (withdrawals, deposits, balance inquiries) locally. Only complex transactions require contacting the main branch. The branch can focus on sophisticated operations while ATMs handle the volume.
PoPs work the same way. Instead of every user traveling to the origin data center (the main branch), they interact with a nearby PoP (an ATM). The PoP handles the vast majority of requests from its local cache. When content isn’t cached, the PoP fetches it from upstream: the first requester pays that round-trip once, and every subsequent user in the region is served from the freshly warmed cache (with techniques like stale-while-revalidate, even refreshes can happen in the background). This geographic distribution dramatically improves perceived performance and reduces the load on the origin.
Inside a Point of Presence: Architecture and Routing
Let’s walk through what happens when a user requests content from a CDN-protected website.
```mermaid
graph TB
    subgraph User["User's ISP"]
        Browser["Browser / Client"]
    end
    subgraph PoP["PoP (Point of Presence)"]
        LB["Load Balancer"]
        ES1["Edge Server 1"]
        ES2["Edge Server 2"]
        Compute["Compute Node<br/>(Workers/Lambda@Edge)"]
        L1["L1 Cache<br/>(Memory)"]
        L2["L2 Cache<br/>(Disk)"]
        DDoS["DDoS Filter"]
    end
    subgraph Upstream["Parent PoP / Origin"]
        Origin["Origin Server"]
    end

    Browser -->|Request| DDoS
    DDoS -->|Routed| LB
    LB -->|Distributes| ES1
    LB -->|Distributes| ES2
    ES1 -->|Check| L1
    L1 -->|Cache Hit| ES1
    L1 -->|Cache Miss| L2
    L2 -->|On Disk| ES1
    L2 -->|Not Found| Origin
    ES2 -->|Compute Logic| Compute
    Compute -->|Response| Browser
    ES1 -->|Response| Browser
```
Request routing happens through several mechanisms. When you resolve a DNS name protected by a CDN (like www.example.com), the DNS response returns an anycast IP address: the same IP used by hundreds of PoPs worldwide. Anycast routing means your request is automatically routed to the nearest PoP based on BGP (Border Gateway Protocol) decisions made along the path. Routers see the same IP prefix advertised from many locations and choose the topologically closest one. This happens at the network layer, invisible to the user.
Some CDNs use GeoDNS instead. The DNS server detects your location (via IP geolocation) and returns the address of the PoP closest to you. This gives the CDN more explicit control over routing, but it adds DNS-level indirection, and its accuracy depends on your resolver’s location rather than your own (the EDNS Client Subnet extension mitigates this).
Inside the PoP, requests hit a load balancer that distributes traffic across edge servers. Modern load balancers often use consistent hashing: requests for the same content are always routed to the same server, which concentrates each object’s cache entries on one machine instead of duplicating them across all of them, and keeps most mappings stable when servers are added or removed.
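For illustration, here is a toy consistent-hash ring in JavaScript. This is an educational sketch, not any CDN's actual implementation; production systems add virtual nodes, weights, and health checks.

```js
import { createHash } from "node:crypto";

// Hash a string to a 32-bit point on the ring.
function hash(value) {
  const hex = createHash("md5").update(value).digest("hex");
  return parseInt(hex.slice(0, 8), 16);
}

class HashRing {
  constructor(servers) {
    // Each server sits at a fixed point on a 2^32 ring.
    this.ring = servers
      .map((s) => ({ server: s, point: hash(s) }))
      .sort((a, b) => a.point - b.point);
  }
  serverFor(key) {
    const h = hash(key);
    // Walk clockwise to the first server at or past the key's point,
    // wrapping around to the start of the ring if necessary.
    const entry = this.ring.find((e) => e.point >= h) ?? this.ring[0];
    return entry.server;
  }
}

const ring = new HashRing(["edge-1", "edge-2", "edge-3"]);
console.log(ring.serverFor("/images/logo.png")); // same server every time
```

Because each key's position on the ring is fixed, adding or removing a server only remaps the keys between that server and its neighbor, rather than reshuffling the whole cache.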
Most CDNs implement a tiered cache hierarchy. When an edge server misses its local L1 cache (in-memory), it checks the L2 cache (disk-based, usually SSD). If L2 misses, the request goes to a parent PoP or the origin. Combined with request coalescing (also called collapsed forwarding), this hierarchy prevents thundering herd problems, where thousands of users request the same uncached item simultaneously: concurrent misses are merged so only one request travels upstream, and the parent PoP caches the result for every edge server beneath it.
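Request coalescing is simple to sketch: keep a map of in-flight upstream fetches and have concurrent misses for the same key await the same promise. Here `fetchFromUpstream` is a hypothetical parent-PoP or origin call.

```js
// Sketch of request coalescing (collapsed forwarding): concurrent
// misses for the same key share one upstream fetch.
const inFlight = new Map();

async function coalescedFetch(key, fetchFromUpstream) {
  if (inFlight.has(key)) {
    // Another request already went upstream; await its result.
    return inFlight.get(key);
  }
  const promise = fetchFromUpstream(key).finally(() => {
    inFlight.delete(key); // allow future refreshes of this key
  });
  inFlight.set(key, promise);
  return promise;
}
```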
Edge computing platforms introduce a new component: compute nodes that run user code. When you deploy a Cloudflare Worker, your JavaScript runs on these nodes. Before serving a response from cache, the compute node can:
- Validate authentication tokens
- Perform A/B test routing (send 5% to version B, 95% to version A)
- Rewrite headers or URLs
- Optimize images on the fly
- Serve stale content if the origin is down
All of this happens in milliseconds, at the edge, geographically distributed; two of these patterns are sketched below.
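Here is a sketch combining header rewriting with serving stale content on origin failure, using the Workers Cache API. The `x-served-by` header is made up for illustration, and a production version would also check response status and cacheability before caching.

```js
// Sketch of a Worker that rewrites headers and falls back to a
// cached copy when the origin is unreachable.
export default {
  async fetch(request, env, ctx) {
    const cache = caches.default;
    try {
      const response = await fetch(request);
      // Rewrite headers on the way out (new Response makes them mutable).
      const rewritten = new Response(response.body, response);
      rewritten.headers.set("x-served-by", "edge");
      // Keep a copy to fall back on later.
      // (cache.put only accepts GET requests; a real Worker guards this.)
      ctx.waitUntil(cache.put(request, rewritten.clone()));
      return rewritten;
    } catch (err) {
      // Origin unreachable: serve the last cached copy if we have one.
      const stale = await cache.match(request);
      return stale ?? new Response("Origin unavailable", { status: 503 });
    }
  },
};
```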
Practical Edge Computing Deployments
Let’s see edge computing in action.
Cloudflare Workers for A/B Testing: A retail site wants to test a new checkout flow. They deploy a Worker that runs on Cloudflare’s 300+ PoPs. The Worker reads a cookie from the user and routes them to either the control group (old checkout) or test group (new checkout). This routing happens at the edge, before the request reaches the origin. Because the Worker runs on every PoP, the A/B test is consistent globally without any latency penalty. If the test is bad, you deploy a new version instantly across the entire network.
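A sketch of what such a Worker might look like; the cookie name, bucket split, and `/checkout-v2` route are all hypothetical.

```js
// Sketch of cookie-based A/B routing at the edge. Bucket assignment
// sticks via a cookie so a user always sees the same variant.
export default {
  async fetch(request) {
    const cookie = request.headers.get("Cookie") || "";
    let variant = cookie.match(/ab-variant=(control|test)/)?.[1];
    let setCookie = false;
    if (!variant) {
      // New user: 5% to the test group, 95% to control.
      variant = Math.random() < 0.05 ? "test" : "control";
      setCookie = true;
    }
    const url = new URL(request.url);
    if (variant === "test" && url.pathname === "/checkout") {
      url.pathname = "/checkout-v2"; // hypothetical test route
    }
    const response = await fetch(new Request(url, request));
    const out = new Response(response.body, response);
    if (setCookie) {
      out.headers.append("Set-Cookie", `ab-variant=${variant}; Path=/`);
    }
    return out;
  },
};
```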
Shopify’s Edge Rendering: Shopify merchants use edge compute to render storefronts closer to customers. Instead of waiting for a request to reach Shopify’s origin data center in Canada, a Worker in Sydney renders the page locally. The Worker fetches product data from Shopify’s APIs and renders HTML at the edge. Customers in Australia get sub-100ms responses instead of 300+ms.
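A simplified sketch of the pattern; the API endpoint and response shape are hypothetical, and real storefront rendering involves templating, sessions, and far more data.

```js
// Sketch of edge rendering: fetch data from an upstream API and
// build the HTML inside the PoP.
const API = "https://api.example-shop.com/products"; // hypothetical

export default {
  async fetch(request) {
    const res = await fetch(API, {
      // Let the PoP cache the API response briefly between renders.
      cf: { cacheTtl: 60, cacheEverything: true },
    });
    const products = await res.json(); // assumed: an array of products
    const items = products
      .map((p) => `<li>${p.name}: ${p.price}</li>`)
      .join("");
    return new Response(`<ul>${items}</ul>`, {
      headers: { "Content-Type": "text/html;charset=utf-8" },
    });
  },
};
```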
Image Optimization: When a user requests an image, an edge compute node can resize, crop, convert formats, and apply compression—all on the fly. A Worker detects that the user’s browser supports WebP, resizes the image to their viewport width, and serves the optimized version. The original high-quality image never travels the network; instead, optimized bytes flow to the user.
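A sketch of the format-negotiation half of this. It assumes pre-generated `.webp` variants exist alongside the originals; platforms like Cloudflare can also transcode on the fly.

```js
// Sketch of format negotiation at the edge: if the browser
// advertises WebP support, serve a pre-built .webp variant.
export default {
  async fetch(request) {
    const url = new URL(request.url);
    const accept = request.headers.get("Accept") || "";
    if (url.pathname.endsWith(".jpg") && accept.includes("image/webp")) {
      // Swap the extension for a hypothetical pre-built variant.
      url.pathname = url.pathname.replace(/\.jpg$/, ".webp");
    }
    const response = await fetch(new Request(url, request));
    const out = new Response(response.body, response);
    // Caches must key on Accept, or WebP could be served to
    // browsers that did not ask for it.
    out.headers.set("Vary", "Accept");
    return out;
  },
};
```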
Trade-offs and Limitations
Edge computing isn’t a silver bullet. Memory and execution time are constrained. Cloudflare Workers, for example, cap CPU time per request (on the order of tens of milliseconds on typical plans) and memory at roughly 128 MB; AWS Lambda@Edge has comparable limits. You can’t run heavy machine learning models or long-running processes at the edge. Complex computations must stay at the origin.
Debugging distributed code becomes harder. Your Worker runs on 300+ PoPs simultaneously. If something breaks, which PoP is misbehaving? CDNs provide logging and analytics, but debugging edge code requires a different mindset than traditional server debugging. You lose the ability to SSH into a single machine and inspect state.
Cost per PoP can add up. Maintaining presence in every geographic region is expensive. CDNs achieve this by negotiating co-location deals with ISPs, but you (as a user of edge compute) pay for these services. A simple cache tier is cheaper than deploying compute code everywhere.
Finally, some operations must remain centralized. Database operations, state management, and persistent storage require consistency that’s difficult to achieve at the edge. Edge compute works best for stateless operations: routing, transformation, caching logic. For stateful operations, you’ll still need an origin or a distributed data store.
Key Takeaways
- Edge servers cache content locally; PoPs are facilities housing multiple edge servers, compute nodes, and networking infrastructure distributed globally.
- PoPs are strategically placed at internet exchange points, ISP data centers, and high-population-density regions to minimize latency.
- Anycast routing automatically directs users to their nearest PoP using BGP; GeoDNS provides an alternative routing mechanism.
- Edge computing evolved beyond caching to run arbitrary code (JavaScript, Rust, Python) on hundreds of PoPs, enabling A/B testing, personalization, and optimization without origin latency.
- Tiered cache hierarchies (L1 memory, L2 disk, parent PoP, origin) prevent redundant upstream requests when content is uncached.
- Edge compute has real constraints: limited memory, strict execution time limits, and stateless operations only; complex logic must remain centralized.
Practice Scenarios
Scenario 1: Global User Analytics
You’re building a SaaS analytics dashboard that needs to detect bot traffic at the edge. You want to drop requests from known bot IP ranges without letting them reach your origin. How would you implement this using edge compute? What are the challenges of maintaining an updated IP blocklist across 300+ PoPs?
Scenario 2: Personalized Content Rendering
An e-commerce platform wants to show different product recommendations to users based on their geographic location and browsing history. Recommendations for users in Japan should differ from those for users in Germany. Where would you run this logic, and why? What data would you need to cache at the edge versus fetch from origin?
Now that we understand how PoPs and edge servers form the physical and compute backbone of CDNs, let’s explore the strategies that maximize their effectiveness: caching strategies, cache invalidation, and how to design policies that work across this distributed infrastructure.