Cache Hierarchies
Caching isn’t one thing—it’s a layered defense against latency that spans the entire journey from a user’s browser to your database. In the previous section, we explored the fundamentals of caching: what it is, why it matters, and basic strategies. Now, we build on that foundation by looking at where caching happens and how different layers work together. Each layer serves a specific purpose, handles different types of data, and comes with its own trade-offs. Understanding this hierarchy is essential because the wrong caching strategy at the wrong layer can waste resources, while the right strategy can cut response times dramatically.
Think about your typical web request: a user types a URL, the browser checks its local cache, your CDN’s edge servers check theirs, your application servers consult Redis, and finally the database gets queried. Every layer that prevents a downstream request saves milliseconds and reduces load. Building a scalable system means strategically placing caches at every point where they provide value.
The Four Layers of the Cache Hierarchy
We can organize caching into four primary layers, each with different characteristics, capabilities, and use cases:
Client-side caching happens closest to the user. This includes browser caches (where static assets live locally), HTTP-level caching directives, and service workers (which can intercept requests and serve cached content). When your browser asks for the same image twice in five minutes, HTTP cache headers tell it to use the stored version instead of fetching again. Service workers go further, allowing offline functionality and more fine-grained control over what gets cached and when.
CDN caching sits at the edge of the internet, geographically distributed servers that cache your content close to users. When a user in Tokyo requests your image, a CDN edge node in that region serves it instead of your US-based origin server. This reduces both latency (networks move information at fixed speeds; shorter distances win) and backhaul bandwidth costs. CDNs are perfect for static assets—images, CSS, JavaScript, fonts—anything that doesn’t need real-time personalization.
Application-level caching lives in your backend infrastructure, typically using in-memory stores like Redis or Memcached. This layer caches expensive computation results, database query results, and session data. Unlike client caches, which are isolated, application caches are shared across all users, making them incredibly efficient for frequently accessed data. A database query result cached here serves hundreds of users without hitting the database once.
Database-level caching is built into the database itself. Modern databases maintain query caches, buffer pools (keeping frequently accessed pages in memory), and support materialized views (pre-computed query results). These caches operate at the storage layer and reduce both CPU and disk I/O.
Together, these layers create a filtering system: most requests stop at the client, some reach the CDN, a few hit your application cache, and very few make it to the database.
A Library Analogy
Imagine you’re a researcher writing a paper. Your desk holds the three books you’re actively using; that’s your browser cache. Your office bookshelf holds another twenty books you might need soon, like a CDN edge. The office library, a five-minute walk away, holds thousands of books shared with your colleagues; that’s your application cache. The city library, hours away, has everything: your database.
You could theoretically get everything from the city library, but you’d spend all your time traveling. Instead, you keep your current books nearby. Your office bookshelf serves most requests before you need to leave. The office library handles the rest. Only rarely do you venture to the city library. Each layer trades capacity for speed: your desk is small but instant, the city library is huge but slow.
In systems, the same principle applies. Client caches have limited storage but extremely fast response times. CDNs have more capacity and serve from a nearby location. Application caches are shared and deep. The database is authoritative but slow. Good system design fills these layers strategically.
How the Layers Stack and Interact
The cache hierarchy works as a series of checks before hitting the database. Here’s the flow:
User Request
↓
1. Browser Cache Check (HTTP headers, service worker)
↓ (miss)
2. CDN Edge Cache Check
↓ (miss)
3. Origin Server + Application Cache (Redis, Memcached)
↓ (miss)
4. Database Query
↓
Cache Results (all layers fill on the way back)
↓
Response to User
When a cache hits, the request stops immediately. When it misses, the request continues downstream, and results are stored in every layer on the return path. This “fill on the way back” approach means popular content gradually fills upper layers without explicit management.
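To make the flow concrete, here is a minimal sketch of the pattern in Python, with two plain dicts standing in for the upper layers and slow_fetch standing in for the database query (the names and layer count are illustrative, not a real implementation):

l1 = {}  # fastest, smallest layer (think: browser cache)
l2 = {}  # larger, slower layer (think: CDN or application cache)

def slow_fetch(key):
    return f"value-for-{key}"  # placeholder for the real database query

def get(key):
    if key in l1:              # hit: stop immediately
        return l1[key]
    if key in l2:              # hit one layer down
        l1[key] = l2[key]      # fill the upper layer on the way back
        return l1[key]
    value = slow_fetch(key)    # full miss: go to the source of truth
    l2[key] = value            # fill every layer on the return path
    l1[key] = value
    return value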
However, this layering creates challenges. If data changes in the database, you need to invalidate caches at every layer, or users see stale information. Cache coherence—keeping all layers consistent—is harder than it sounds.
HTTP Caching Headers
The client and CDN layers understand HTTP cache headers. Your server sends these headers with responses:
Cache-Control: public, max-age=31536000, immutable
This tells the browser (and CDN) that this content is public (can be cached by anyone), valid for one year, and will never change (immutable). For dynamic content:
Cache-Control: private, max-age=3600, must-revalidate
This says the content is private (cacheable only by the user’s browser, not by shared caches like CDNs) and fresh for one hour; once that hour passes, the stored copy must be revalidated with the origin before it can be reused.
ETag and Last-Modified headers enable validation: the client asks the origin whether its stored copy is still current and re-downloads only if it changed. This saves bandwidth even after the cached entry has expired, because a 304 Not Modified response carries no body.
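To make validation concrete, here is a minimal sketch of the server side of a conditional GET, using only the Python standard library (the asset bytes and ETag value are placeholders):

from http.server import BaseHTTPRequestHandler, HTTPServer

ASSET = b"bytes of a rarely-changing asset"
ETAG = '"abc123"'  # in practice, derived from a hash of the content

class ConditionalGetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # If the client's stored copy matches, skip the body entirely.
        if self.headers.get("If-None-Match") == ETAG:
            self.send_response(304)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Cache-Control", "private, max-age=3600, must-revalidate")
        self.send_header("ETag", ETAG)
        self.send_header("Content-Length", str(len(ASSET)))
        self.end_headers()
        self.wfile.write(ASSET)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), ConditionalGetHandler).serve_forever()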
CDN Architecture
CDNs like Cloudflare or AWS CloudFront operate thousands of edge locations. When a user requests content, DNS routes them to the nearest edge server. If that server has the content cached, it responds immediately. If not, it fetches from your origin server, caches the result, and serves the user. The next user in that geographic region gets a cache hit.
CDN caching works best for:
- Static assets (images, CSS, JavaScript)
- API responses with long TTLs
- User-generated content that doesn’t change frequently
It’s less useful for:
- Personalized content (responses vary per user, so edge hit rates collapse)
- Real-time data (low TTLs defeat the purpose)
- Anything requiring authentication (security concerns)
Application Caching Patterns
Application caches hold computed results. Examples:
# Cache-aside lookup with redis-py: check Redis first, fall back to the
# database on a miss, then populate the cache for the next reader.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_user(user_id, db):
    cache_key = f"user:{user_id}"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit: the database is never touched
    # Cache miss: query the source of truth (db is your database client).
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    r.setex(cache_key, 3600, json.dumps(user))  # keep for one hour
    return user
When this user is requested frequently, Redis serves it for the full hour without touching the database. This can reduce database load by orders of magnitude.
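That one-hour window is enforced by nothing more than a timestamp comparison. A minimal sketch of the mechanism, with an in-process dict standing in for Redis or a CDN edge:

import time

cache = {}  # key -> (expires_at, value)

def put(key, value, ttl_seconds):
    cache[key] = (time.monotonic() + ttl_seconds, value)

def get(key):
    entry = cache.get(key)
    if entry is None:
        return None  # never cached
    expires_at, value = entry
    if time.monotonic() >= expires_at:
        del cache[key]  # TTL elapsed: treat as a miss
        return None
    return value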
Database-Level Caching
Databases cache automatically. The buffer pool keeps frequently accessed pages in memory, some engines cache exact query results, and index pages stay resident as well. You can’t turn this off; it’s fundamental to how databases work. What you can do is tune these caches: increase the buffer pool size, warm up frequently accessed data, and use materialized views for expensive aggregations.
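To illustrate the last point, a materialized view pre-computes an expensive aggregation once and stores the result. A minimal sketch in PostgreSQL syntax (the orders table and its columns are hypothetical):

-- Pre-compute revenue per product per day instead of aggregating on every read.
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT product_id,
       date_trunc('day', sold_at) AS day,
       SUM(amount) AS revenue
FROM orders
GROUP BY product_id, day;

-- Re-run the underlying query whenever the stored results should be refreshed.
REFRESH MATERIALIZED VIEW daily_revenue;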
The Full Cache Hierarchy Diagram
Here’s how data flows through all layers:
graph TD
User["User Browser<br/>(Client Cache)"]
CDN["CDN Edge Nodes<br/>(Geographic Cache)"]
AppServer["Application Server<br/>(Redis/Memcached)"]
DB["Database<br/>(Buffer Pool/Query Cache)"]
User -->|HTTP Request| CDN
CDN -->|Cache Miss| AppServer
AppServer -->|Application Miss| DB
DB -->|Query Result| AppServer
AppServer -->|Store + Forward| CDN
CDN -->|Store + Forward| User
style User fill:#e1f5ff
style CDN fill:#fff3e0
style AppServer fill:#f3e5f5
style DB fill:#e8f5e9
Each layer is a checkpoint. A hit at any layer prevents all downstream requests.
Comparing the Layers
| Layer | Location | Speed | Capacity | TTL | Best For |
|---|---|---|---|---|---|
| Client | Browser | Instant | Small (MB) | Set by response headers | Static assets, rarely-changing resources |
| CDN | Edge servers | ~50ms | Medium (GB) | Hours to days | Static content, API responses |
| Application | RAM (Redis) | ~1ms | Medium (GB) | Minutes to hours | Database queries, computed results |
| Database | Storage + RAM | 10-100ms | Large (TB) | N/A (authoritative) | Source of truth |
Cache hits at upper layers save orders of magnitude in latency: a database query that takes 50ms becomes about 1ms from the application cache and microseconds from the browser cache.
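To see the filtering effect in numbers, here is a back-of-the-envelope calculation. The hit rates and timings below are illustrative assumptions, not measurements, and per-hop network overhead is ignored:

# Each tuple: (layer, share of *remaining* traffic served here, latency in ms).
layers = [
    ("client", 0.60, 0.1),
    ("cdn",    0.50, 50.0),
    ("app",    0.80, 1.0),
    ("db",     1.00, 50.0),  # whatever is left hits the database
]

remaining, expected_ms = 1.0, 0.0
for name, hit_rate, latency_ms in layers:
    served = remaining * hit_rate
    expected_ms += served * latency_ms
    remaining -= served

print(f"expected latency = {expected_ms:.2f} ms")  # 12.22 ms with these numbers

With these assumed rates, only 4% of requests ever reach the database.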
Practical Examples: Setting It Up
Browser and CDN caching with HTTP headers:
# Static assets: cache for one year
/assets/images/*.jpg
Cache-Control: public, max-age=31536000, immutable
ETag: "abc123"
# API responses: cache for five minutes
/api/products/*
Cache-Control: public, max-age=300
ETag: "v2-xyz789"
# HTML pages: don't cache, always revalidate
/index.html
Cache-Control: public, max-age=0, must-revalidate
ETag: "page-version-123"
CloudFront configuration (pseudocode):
Distribution:
  Origins:
    - Domain: api.example.com
  Behaviors:
    - Path: /assets/*
      Cache: 1 year
      Compress: yes
    - Path: /api/*
      Cache: 5 minutes
      Compress: no
    - Path: /
      Cache: 0 (no caching)
Application caching with Redis:
# Cache-aside again, for a product catalog (redis-py; db.get_product
# stands in for your database access layer).
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_cached_product(product_id, db):
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    product = db.get_product(product_id)
    r.setex(key, 3600, json.dumps(product))  # cache for one hour
    return product
Real-world example: Netflix’s caching hierarchy
Netflix uses all four layers brilliantly:
- Browser caching: Static assets (CSS, JS, images) cached with long TTLs.
- CDN caching: Netflix runs its own CDN, Open Connect, to cache video streams globally. Encoded videos are cached at edge locations near users.
- Application caching: EVCache, Netflix’s Memcached-based store, holds session data, user preferences, and recommendation results. This prevents database hammering from millions of simultaneous users.
- Database caching: Cassandra (their primary datastore) keeps hot data in its key and row caches and supports materialized views for common queries like “recent watches.”
Together, this means a user opening the app gets instant recommendations from Memcached, streams video from a nearby CDN edge, and rarely hits the authoritative database.
Trade-Offs and Complexity
More cache layers mean more complexity, and each layer adds a potential point of failure. If a cache holds outdated data, users see stale results. If layers disagree, each holding a different value for the same key, debugging is a nightmare.
The biggest challenge is cache coherence: keeping all layers in sync when data changes. If a user updates their profile, the change must invalidate caches at all four levels, or some requests will return old data. Strategies include:
- TTL-based expiration: Simple but potentially stale. Set short TTLs (minutes) for volatile data.
- Explicit invalidation: When data changes, actively remove it from all caches; this requires coordination (see the sketch after this list).
- Event-driven updates: When data changes, publish an event that updates all caches. Good for distributed systems.
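A minimal sketch of explicit invalidation on write, using redis-py. Here cdn_purge is a hypothetical stand-in for whatever purge API your CDN provider exposes, and db is a placeholder for your database client:

import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cdn_purge(path):
    """Hypothetical stand-in for a CDN provider's purge/invalidate call."""
    ...

def update_user_profile(db, user_id, new_profile):
    # 1. Write to the source of truth first.
    db.execute("UPDATE users SET profile = ? WHERE id = ?",
               (json.dumps(new_profile), user_id))
    # 2. Drop the application-cache entry so the next read refills it.
    r.delete(f"user:{user_id}")
    # 3. Purge the CDN copy so edges refetch on their next request.
    cdn_purge(f"/api/users/{user_id}")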
Cost considerations matter too. CDN bandwidth costs real money. Application caching requires infrastructure (Redis servers). Database caching consumes RAM. The more you cache, the more you pay—but the better performance becomes. The sweet spot depends on your bottleneck.
When to cache at each layer:
- Client: Always cache static assets with long TTLs. Caching at this layer is almost free.
- CDN: Cache anything that’s identical for all users and doesn’t change frequently.
- Application: Cache expensive computations and frequently-accessed data. This is where you get the biggest per-request savings.
- Database: Tune the buffer pool and query cache, but don’t rely on them alone for performance.
Most systems see the biggest wins from application-level caching. A well-tuned Redis instance can eliminate 90% of database queries.
Key Takeaways
- Cache hierarchies filter requests: Each layer prevents downstream requests, saving latency and resources.
- Four layers: Client (browser), CDN (edge), Application (Redis), Database (built-in).
- Each layer trades speed for capacity: Closer layers are faster but smaller.
- HTTP headers control client and CDN caching: Cache-Control, ETag, Last-Modified are your tools.
- Cache coherence is the hard part: Keeping layers in sync requires strategy (TTLs, invalidation, or event-driven updates).
- Application caching delivers the biggest wins: Memcached and Redis typically prevent the most database hits.
Practice Scenarios
Scenario 1: E-commerce Product Catalog
You run an e-commerce site with millions of products. Most users browse without buying. Where would you cache product data, and what TTLs would you use? Consider that product prices change hourly and inventory changes constantly, but descriptions are static.
Consider: Would you cache at the CDN? What about the application layer? How would you handle price updates without serving stale data?
Scenario 2: Social Media Feed
A user’s personalized feed requires database queries based on their user ID. It changes constantly (new posts from friends). How would you cache this? Which layers would help?
Consider: Why is CDN caching useless here? Where should you cache? What TTL makes sense?
Connection to Cache Eviction Policies
We’ve placed caches strategically across your system and discussed how they work together. But what happens when a cache fills up? When Redis runs out of memory, which entries should it remove? When a CDN’s storage is full, what gets deleted?
That’s where eviction policies come in. We’ll explore LRU, LFU, and other strategies that decide what stays and what goes. These policies directly impact your cache hit rates and are the final piece of making layered caches work efficiently in practice.