System Design Fundamentals

Lazy Loading & Prefetching

The Paradox of Eager Loading

Your e-commerce homepage loads 200 product images, a recommendations widget (which queries a machine learning model), user reviews from a separate service, a real-time chat widget, and analytics trackers. The initial page load takes 8 seconds before anything interactive appears.

But here’s what users actually see: the fold—roughly the first 10 products on the screen. The remaining 190 products are below the fold. They don’t need to load for the user to see and interact with the page. Why load everything upfront?

This is lazy loading: deferring the loading of non-critical resources until they’re actually needed. Conversely, intelligent prefetching loads resources the user is likely to need next—so when they click “Buy Now,” the checkout page is already waiting in the background.

Together, lazy loading and prefetching are two sides of the same coin: controlling when resources load rather than loading everything eagerly or hoping the network cooperates. This section builds on network and compression optimization to tackle the “what to load when” problem.


Lazy Loading: Deferring the Non-Essential

Lazy loading defers loading resources until:

  1. They enter the viewport (images, infinite scroll content)
  2. The user needs them (code modules for a feature not immediately used)
  3. There’s spare capacity (prefetching during idle time or low network usage)

The effect is dramatic on initial page load. Instead of loading 8MB of resources upfront, you load 1MB and stream the rest as needed.

Image Lazy Loading: The Native Browser Approach

The simplest form of lazy loading is built into modern browsers:

<img src="placeholder.jpg" loading="lazy" alt="Product">

The browser defers loading the image until it is within a distance threshold of the viewport (in Chrome, the threshold ranges from a few hundred to a few thousand pixels, depending on connection speed). For a typical homepage with 100 images, this means loading the 10-15 visible images immediately and deferring the other 85-90 until they're needed.

Support: roughly 95% of browsers in use. For older browsers, a polyfill built on the Intersection Observer API provides a fallback.

What you save: Each deferred 300KB image is 300KB of bandwidth the first load never spends. On a page with 100 such images, lazy loading can trim 25-30MB from the initial load.

The Intersection Observer API: Programmatic Lazy Loading

Intersection Observer lets you detect when elements enter the viewport, triggering custom logic.

const observer = new IntersectionObserver((entries) => {
  entries.forEach((entry) => {
    if (entry.isIntersecting) {
      const img = entry.target;
      img.src = img.dataset.src;  // Load the real image
      observer.unobserve(img);
    }
  });
});

document.querySelectorAll('[data-src]').forEach(img => {
  observer.observe(img);
});

This is how image lazy loading was implemented before native loading="lazy" support, and the same pattern is used for:

  • Infinite scroll (load more items when user scrolls to the bottom)
  • Component lazy loading (load charts, maps, or widgets when visible)
  • Analytics event triggering (track when content actually appears)
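The infinite-scroll case reduces to a small piece of state logic that is independent of the DOM. The sketch below (class and method names are illustrative, not from any library) captures it: in the browser, an Intersection Observer watching a sentinel element at the bottom of the list would call loadMore() each time the sentinel scrolls into view.

```javascript
// Sketch: the state machine behind infinite scroll (names are illustrative).
// In the browser, an IntersectionObserver on a bottom "sentinel" element
// calls loadMore() each time the sentinel enters the viewport.
class InfiniteFeed {
  constructor(items, pageSize) {
    this.items = items;        // stand-in for a paginated API
    this.pageSize = pageSize;
    this.loaded = [];          // what's currently rendered
    this.loading = false;      // guard against duplicate loads
  }

  get hasMore() {
    return this.loaded.length < this.items.length;
  }

  loadMore() {
    if (this.loading || !this.hasMore) return [];  // ignore re-entrant calls
    this.loading = true;
    const next = this.items.slice(
      this.loaded.length,
      this.loaded.length + this.pageSize
    );
    this.loaded.push(...next);
    this.loading = false;
    return next;
  }
}

const feed = new InfiniteFeed(['a', 'b', 'c', 'd', 'e'], 2);
feed.loadMore();  // returns ['a', 'b']
feed.loadMore();  // returns ['c', 'd']
```

The `loading` guard matters in practice: fast scrolling can fire the observer several times before the first page of results arrives, and without the guard you'd request the same page repeatedly.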

Code Splitting: Load JavaScript On Demand

Modern web applications often bundle everything into one massive JavaScript file. Instead, split it into smaller chunks:

// Instead of importing upfront
// import { CheckoutPage } from './checkout';

// Load dynamically when needed
import React, { Suspense } from 'react';
const CheckoutPage = React.lazy(() => import('./checkout'));

// In your router, wrap the lazy component in a Suspense boundary
<Route
  path="/checkout"
  element={
    <Suspense fallback={<p>Loading…</p>}>
      <CheckoutPage />
    </Suspense>
  }
/>

The router loads the checkout code only when the user navigates to /checkout. For a 500KB bundle where checkout is 100KB, you defer that 100KB until the user needs it.

Framework support: React (React.lazy), Vue (defineAsyncComponent), Angular (lazy routes) all have built-in support.

Skeleton Screens: The Perception of Speed

When you lazy load content, show a placeholder so the page doesn’t feel broken.

<div class="image-container">
  <div class="skeleton-loader"></div>
  <img src="product.jpg" loading="lazy" alt="Product">
</div>

<style>
.skeleton-loader {
  background: linear-gradient(90deg, #f0f0f0 25%, #e0e0e0 50%, #f0f0f0 75%);
  background-size: 200% 100%;
  animation: loading 1.5s infinite;
}

@keyframes loading {
  0%   { background-position: 200% 0; }
  100% { background-position: -200% 0; }
}
</style>

Skeleton screens reduce perceived latency. Instead of a blank space, users see something filling in, creating the impression that loading is happening.

Backend Lazy Loading: Relationships and Pagination

Frontend lazy loading is only half the story. Databases can use lazy loading patterns:

// JavaScript ORM example (Prisma)
const user = await prisma.user.findUnique({
  where: { id: 1 },
  // Note: We're NOT including the 'posts' relation here
});

// Later, when we need posts, load them with a separate query
// (Prisma exposes relations through its fluent API on the query)
const posts = await prisma.user.findUnique({ where: { id: 1 } }).posts();

Lazy loading relationships one record at a time, however, is exactly how the classic N+1 query problem arises:

// N+1: Bad - one query for users, then one more for every user's posts
const users = await User.findAll();          // 1 query
for (const user of users) {
  console.log(await user.getPosts());        // N queries (one per user)
}
// Total: 1 + N queries

// Better: Eager load posts for all users in one round trip
const usersWithPosts = await User.findAll({ include: ['posts'] });

Lazy loading says: don’t fetch posts unless the code actually accesses them. Decide based on your data patterns: if every request accesses user.posts, eager load. If 90% of requests don’t access posts, lazy load.
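The difference shows up directly in query counts. This toy in-memory "database" (everything here is illustrative; each call to `query` stands for one round trip) contrasts the two strategies:

```javascript
// Toy illustration of N+1 vs. eager loading (all names are made up).
// Each call to db.query stands for one round trip to the database.
let queryCount = 0;
const db = {
  users: [{ id: 1 }, { id: 2 }, { id: 3 }],
  posts: [{ userId: 1, title: 'a' }, { userId: 2, title: 'b' }],
  query(fn) { queryCount++; return fn(); },
};

// Lazy: 1 query for users, then 1 query per user for posts → 1 + N
function lazyLoad() {
  const users = db.query(() => db.users);
  return users.map((u) =>
    db.query(() => db.posts.filter((p) => p.userId === u.id))
  );
}

// Eager: 1 query for users, 1 batched query for all their posts → 2
function eagerLoad() {
  const users = db.query(() => db.users);
  const ids = users.map((u) => u.id);
  const posts = db.query(() => db.posts.filter((p) => ids.includes(p.userId)));
  return users.map((u) => posts.filter((p) => p.userId === u.id));
}

queryCount = 0;
lazyLoad();
console.log(queryCount);   // prints 4 (1 + 3 users)

queryCount = 0;
eagerLoad();
console.log(queryCount);   // prints 2
```

With 3 users the gap is small; with 10,000 users the lazy version issues 10,001 queries for the same result.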

API Pagination: Prevent the Megaresponse

API pagination is lazy loading's cousin: instead of an endpoint returning all 10,000 items in one megaresponse, clients fetch a page at a time:

GET /api/products?page=1&limit=50
Response: { data: [product1...product50], total: 10000, page: 1 }

Cursor-based pagination is better for large datasets:

GET /api/products?cursor=abc123&limit=50
Response: { data: [product1...product50], next_cursor: def456 }

Cursor-based pagination survives data mutations better than offset-based pagination: if rows are inserted or deleted between requests, offsets shift and items get skipped or repeated, while a cursor keeps pointing at a stable position in the ordering.
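This claim is easy to demonstrate with an in-memory list (helper names are illustrative, not from any library): delete an item between two offset-paginated requests and the next page silently skips a record, while a cursor pinned to the last-seen id does not.

```javascript
// Why cursors survive mutations: a minimal in-memory comparison.
// (Helper names are illustrative, not from any library.)
function offsetPage(items, offset, limit) {
  return items.slice(offset, offset + limit);
}

function cursorPage(items, afterId, limit) {
  // Resume strictly after the last id the client saw.
  const start = afterId == null
    ? 0
    : items.findIndex((it) => it.id === afterId) + 1;
  return items.slice(start, start + limit);
}

let products = [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }];

// Page 1 (identical under either scheme): ids 1 and 2
const page1 = offsetPage(products, 0, 2);

// Someone deletes id 1 between the two requests...
products = products.filter((p) => p.id !== 1);

// Offset page 2 starts at index 2 of the *new* list: only id 4.
// Id 3 is silently skipped.
offsetPage(products, 2, 2);   // [{ id: 4 }]

// Cursor page 2 resumes after id 2: ids 3 and 4, nothing lost.
cursorPage(products, 2, 2);   // [{ id: 3 }, { id: 4 }]
```

A real implementation would encode the cursor opaquely (e.g., base64 of the id plus sort key) rather than exposing raw ids, but the stability argument is the same.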


Prefetching: Loading the Future

Lazy loading defers non-essential resources. Prefetching is the opposite: proactively load resources the user is likely to need next, so they’re instant when requested.

When you know the user might click a link, prefetch its resources:

<!-- On the homepage, the user might click "About Us" -->
<link rel="prefetch" href="/about">

<!-- Or prefetch a resource we know they'll need -->
<link rel="prefetch" href="https://cdn.example.com/checkout.js">

The browser loads these in the background with low priority. By the time the user clicks the link, the resources are cached.

DNS Prefetch and Preconnect

Before fetching from a domain, the browser needs to resolve its IP (DNS), establish a TCP connection, and negotiate TLS. These take 100-500ms combined.

<!-- Resolve DNS in parallel with page load -->
<link rel="dns-prefetch" href="//api.example.com">

<!-- Go further: DNS + TCP + TLS, fully ready -->
<link rel="preconnect" href="//cdn.example.com">

For cross-origin APIs or CDNs, preconnect shaves 100-300ms off the first request.

Hover-Based Prefetching

The instant.page library observes when users hover over links and prefetches them:

// instant.page has no exports; importing it for its side effects
// activates hover-based prefetching on all links
import 'instant.page';

This is elegant: if the user hovers, they're probably about to click, so start loading. If they move away, the prefetch was wasted, but it ran at low priority. If they click, the page is instant.
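The trigger behind this can be reduced to a small piece of hover-intent logic. In this sketch (all names are illustrative; timestamps are passed in explicitly so the logic stays testable), a prefetch fires only once the pointer has rested on a link past a short delay:

```javascript
// Hover-intent sketch: prefetch only if the pointer rests on a link
// for at least delayMs. Timestamps are passed in explicitly so the
// logic is easy to test; in the browser you'd wire hoverStart/hoverEnd
// to mouseover/mouseout and call shouldPrefetch from a short timer.
class HoverIntent {
  constructor(delayMs = 65) {   // instant.page uses a similar ~65ms delay
    this.delayMs = delayMs;
    this.hoverStartedAt = null;
  }

  hoverStart(now) { this.hoverStartedAt = now; }
  hoverEnd()      { this.hoverStartedAt = null; }

  // Has the pointer rested long enough to justify a prefetch?
  shouldPrefetch(now) {
    return this.hoverStartedAt !== null &&
           now - this.hoverStartedAt >= this.delayMs;
  }
}

const intent = new HoverIntent(65);
intent.hoverStart(1000);
intent.shouldPrefetch(1030);  // false: 30ms in, might be a drive-by hover
intent.shouldPrefetch(1100);  // true: 100ms of hover signals real intent
intent.hoverEnd();
intent.shouldPrefetch(1200);  // false: pointer left before we fired
```

The delay is the key design choice: too short and drive-by mouse movements trigger wasted prefetches; too long and the head start shrinks to nothing.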

Service Workers: Predictable Precaching

Service Workers are JavaScript running in the background, separate from your main app thread. You can use them to cache critical resources:

// During service worker installation
self.addEventListener('install', (event) => {
  event.waitUntil(
    caches.open('v1').then((cache) => {
      return cache.addAll([
        '/',
        '/styles.css',
        '/app.js',
        '/icons/logo.png',
      ]);
    })
  );
});

// On future requests, serve from cache
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request)
      .then((response) => response || fetch(event.request))
  );
});

This precaches critical assets on the first visit. On subsequent visits, they load instantly from the local cache.

Predictive Prefetching: ML-Driven Load Decisions

The most sophisticated approach: use machine learning to predict what the user will do next.

// Hypothetical: ML model predicts user will view product details
// Train model on navigation patterns: product list → product detail → checkout

const nextResource = await predictNextNavigation(currentPage);
prefetch(nextResource);  // Preload predicted page

Large e-commerce sites apply variations of this idea: based on your browsing history and similar users' patterns, they can prefetch the likely next step, such as the checkout page or payment methods.
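The predictNextNavigation above is hypothetical, but a first-order version of the idea is simple to build: count page-to-page transitions in historical sessions and predict the most frequent successor of the current page. A sketch (all names illustrative):

```javascript
// First-order Markov predictor: count observed page→page transitions,
// then predict the most frequent successor of the current page.
// (A toy stand-in for the hypothetical predictNextNavigation above.)
function trainPredictor(sessions) {
  const counts = new Map();  // from → Map(to → count)
  for (const pages of sessions) {
    for (let i = 0; i < pages.length - 1; i++) {
      const from = pages[i];
      const to = pages[i + 1];
      if (!counts.has(from)) counts.set(from, new Map());
      const next = counts.get(from);
      next.set(to, (next.get(to) || 0) + 1);
    }
  }
  return function predict(currentPage) {
    const next = counts.get(currentPage);
    if (!next) return null;          // never seen this page: don't prefetch
    let best = null;
    let bestCount = 0;
    for (const [page, count] of next) {
      if (count > bestCount) { best = page; bestCount = count; }
    }
    return best;
  };
}

const predict = trainPredictor([
  ['/products', '/product/1', '/cart', '/checkout'],
  ['/products', '/product/2', '/cart', '/checkout'],
  ['/products', '/search'],
]);

predict('/cart');     // '/checkout': the dominant observed transition
predict('/unknown');  // null: no data, so don't prefetch anything
```

A production system would add confidence thresholds (only prefetch when the top transition clearly dominates) so that weak predictions don't burn bandwidth.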


The Perception vs. Reality Split

Here’s the key insight: perceived performance and actual performance are different.

A skeleton screen doesn’t reduce actual load time, but it reduces perceived load time because users see something happening instead of a blank page.

Prefetching doesn’t reduce time-to-first-byte if the user doesn’t navigate to the prefetched resource, but if they do, they perceive zero latency.

Your optimization strategy should address both:

  1. Actual performance: Lazy loading, compression, network optimization reduce real load time
  2. Perceived performance: Skeleton screens, preconnect hints, prefetching make the app feel instant

Trade-offs: The Downside of Deferral

Lazy loading and prefetching aren’t free. Understand the costs.

Lazy Loading Costs

Layout shift: If an image is lazy-loaded and hasn’t loaded yet, the space is empty. When it loads, the page layout shifts. This is jarring.

<!-- Bad: image height unknown -->
<img src="product.jpg" loading="lazy">

<!-- Good: reserve space so layout doesn't shift -->
<img src="product.jpg" loading="lazy" width="200" height="200">

Modern best practice: include width and height or use the aspect-ratio CSS property.

SEO impact: Crawlers that don't execute JavaScript never trigger script-based lazy loading, so those images may go unindexed. Native loading="lazy" with a real src attribute is generally crawlable, but if search visibility matters for below-the-fold images, verify how your lazy-loading approach interacts with indexing.

JavaScript dependency: Lazy loading via Intersection Observer requires JavaScript. If the script fails to load, the images never load at all. Native loading="lazy" doesn't have this problem.

Prefetching Costs

Wasted bandwidth: If you prefetch a resource the user never accesses, that’s bandwidth burned. On mobile plans with limited data, this can frustrate users.

Cache pollution: Prefetched resources take up browser cache space. Under cache pressure they may evict resources the user actually needs, or be evicted themselves before they're ever used.

Network contention: Prefetching competes with user-initiated requests. A prefetch at low priority will pause if the user clicks to navigate, but it still consumes bandwidth.

The Prediction Problem

Predictive prefetching is effective but brittle. Users are unpredictable, and a model that learns skewed or stale navigation patterns will prefetch the wrong resources at scale, wasting bandwidth exactly where the stakes are highest.


Server-Side Rendering and Streaming

An alternative to lazy loading: render HTML on the server and stream it to the browser.

// Express.js with streaming
app.get('/', async (req, res) => {
  // The page shell is already on its way to the browser...
  res.write('<html><head>...</head><body>');
  res.write('<h1>Products</h1>');
  res.write('<ul>');

  // ...while we wait for the data
  const products = await fetchProducts();
  for (const p of products) {
    res.write(`<li>${p.name}</li>`);
  }
  res.write('</ul></body></html>');
  res.end();
});

The browser receives HTML progressively. It can start parsing and rendering before all data is available. This is particularly powerful for pages with data from multiple sources—fetch fast data first, slow data later.

Next.js and Remix use streaming SSR extensively:

// Next.js with a Suspense boundary
import { Suspense } from 'react';

export default function Page() {
  return (
    <>
      <Header />
      <Suspense fallback={<Loading />}>
        <Products />  {/* Waits for data; its HTML streams once ready */}
      </Suspense>
    </>
  );
}

A Decision Framework

For each technique, when to use it and when to avoid it:

  • Image lazy loading: use by default on all images over 50KB; avoid for small hero images and above-the-fold critical images.
  • Code splitting: use for large applications and features not needed immediately; avoid for small bundles (under 100KB total).
  • Infinite scroll: use for browsing-heavy content (social feeds, search results); avoid for finite data (checkout, settings pages).
  • Service Workers: use for Progressive Web Apps and offline capability; avoid for simple stateless pages.
  • Link prefetch: use for high-confidence navigation (e.g., product list → product detail); avoid for speculative navigation and mobile data users.
  • Predictive prefetch: use for high-engagement apps with clear patterns; avoid for cold starts and unpredictable users.
  • Streaming SSR: use for server-rendered apps with slow data sources; avoid for SPAs and APIs without server rendering.

Key Takeaways

  1. Lazy loading fundamentally changes how you think about page load: You don’t load everything upfront; you load what’s visible now and stream the rest. This is a paradigm shift from the 2010s “load everything once” approach.

  2. The viewport is sacred: Most optimization should focus on above-the-fold content. Resources below the fold can almost always be lazy loaded.

  3. Perceived performance is half the battle: A skeleton screen loading in 100ms feels faster than actual content loading in 200ms. Design both actual and perceived speed.

  4. Prefetching is prediction: You’re betting the user will navigate somewhere. If wrong, you waste bandwidth. Use high-confidence navigation patterns.

  5. Layout stability matters: Lazy-loaded images must reserve space (width/height or aspect-ratio) to prevent layout shift. Cumulative Layout Shift (CLS) is a key performance metric.

  6. Measure end-to-end: Core Web Vitals (Largest Contentful Paint, First Input Delay, Cumulative Layout Shift) capture both technical and perceived performance. Optimize toward these metrics.


Practice Scenarios

Scenario 1: Your e-commerce product listing page loads 200 products with images. Currently, all 200 images load on page load, taking 6 seconds. The fold shows 8 products. How would you optimize? Consider lazy loading, prefetching for pagination, and perceived performance.

Scenario 2: Your internal admin dashboard shows charts, tables, and user analytics. Not every admin visits every section. Which features would you code-split and why? How would you decide whether to eagerly load or lazy-load data?

Scenario 3: You’re building a checkout flow: product list → product detail → cart → checkout → confirmation. Users follow this path predictably. Design a prefetching strategy that loads resources ahead of navigation. What would you prefetch? Where would you draw the line to avoid wasting bandwidth?


The Optimization Hierarchy

Here’s how these three chapters fit together:

  1. Network Optimization (Ch. 108): Reduce bytes on the wire, round trips, and latency per trip. This is the foundation.
  2. Compression (Ch. 109): Compress what you send. Reduces bandwidth and latency.
  3. Lazy Loading & Prefetching (Ch. 110): Load things when you need them, not before. This reduces how much you send in the first place.

Together, these strategies create systems that are genuinely fast—not just fast on paper, but fast to real users with real networks.


Looking Ahead

Performance optimization directly feeds into operational cost optimization. Efficient systems use less bandwidth, less CPU, less storage. They scale better and cost less to operate. In Chapter 21, we’ll explore how to optimize for operational cost—the business side of performance.