System Design Fundamentals

Pagination, Filtering & Sorting

The 45-Second Mistake

Imagine you just deployed /api/products for your e-commerce platform. It returns all 2 million products in a single JSON response. Response time: 45 seconds. Payload size: 800MB. Your mobile app crashes trying to parse it. A user refreshes the page while scrolling, and all 2 million records download again, eating through their data plan.

Even with just 10,000 records, returning everything is wasteful. The user only sees 20 items at a time. They might search for “blue running shoes under $100.” They probably want results sorted by price or rating, not some random order from the database.

This is the reality of APIs handling real data. Pagination, filtering, and sorting aren’t optional luxuries—they’re essential. They directly impact three critical dimensions of API design: response time, data transfer, and user experience. Without them, your API doesn’t scale beyond toy datasets.

Why These Three Go Together

At first glance, pagination, filtering, and sorting seem like separate concerns:

  • Pagination: Breaking results into manageable chunks
  • Filtering: Letting clients request only records that match criteria
  • Sorting: Letting clients control the order of results

But they’re deeply intertwined. When a user asks for “page 3 of blue shoes sorted by price,” the database must filter, sort, then paginate. The order matters. The index strategy must account for all three. A cursor-based pagination implementation must know the sort key. Sorting without filtering might return millions of records before paging them.

We’ll treat them as a cohesive system because that’s how they function in production.

The Physical Bookstore Analogy

Imagine browsing a massive physical bookstore with 2 million books and no organization system.

You’d need to walk past every single shelf, examining every spine. Want the 5th shelf of science fiction books? You’d have to count past all the history, biography, and romance sections first (that’s the “skipping problem” of offset pagination—we’ll come back to this).

Now introduce pagination: the bookstore has multiple rooms, each with a manageable number of books. Room 1 has books 1–500. Room 2 has books 501–1,000. You can step into Room 3 and only see 500 books instead of 2 million.

Add filtering: instead of wandering through all genres, you go to the “Science Fiction” section. You’ve already reduced your search space from 2 million to maybe 100,000 books.

Add sorting: within Science Fiction, choose to browse by “Newest Arrivals,” “Bestsellers,” or “Author A–Z.” The order helps you find what you want.

Together, these three features transform a chaotic experience into a navigable one. Your API is the same: without them, clients drown in data.

How Offset Pagination Works (And Breaks)

The simplest approach is offset-based pagination:

GET /api/products?page=3&limit=20

This requests the 3rd page with 20 items per page. Under the hood:

SELECT * FROM products
LIMIT 20
OFFSET (3 - 1) * 20;  -- OFFSET 40

The database skips the first 40 rows and returns rows 41–60. Simple, intuitive, familiar to developers.

The problem emerges at scale. Request page 500,000 of a 10-million-row table and the database executes OFFSET 9,999,980: it must internally read and discard 9,999,980 rows before it can return the 20 you asked for. Every page deeper into the dataset gets slower.
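The page-to-offset arithmetic can be pinned down in one line; a quick sketch (the helper name is illustrative):

```javascript
// Translate a 1-indexed page number and page size into a SQL OFFSET.
function toOffset(page, limit) {
  return (page - 1) * limit;
}

toOffset(3, 20);      // 40: skip two full pages of 20
toOffset(500000, 20); // 9999980: the pathological deep-page case
```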

There’s a second, subtler problem: the skipping trap. Imagine you’re on page 2 of products sorted by price (positions 21–40), and the last product you see is priced at $49. You click “next” for page 3. But between your page 2 and page 3 requests, another user added a $48 product, which sorts before your current position. The product that was at position 40 shifts to position 41, the first slot of page 3, so you see it twice. A deletion has the opposite effect: everything shifts up, and you skip a product entirely.
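The trap is easy to reproduce; a minimal sketch simulating offset paging over an in-memory sorted list (the data and names are made up for illustration):

```javascript
// Simulate offset pagination over a price-sorted list that mutates between requests.
function fetchPage(rows, pageNum, limit) {
  const sorted = [...rows].sort((a, b) => a.price - b.price);
  return sorted.slice((pageNum - 1) * limit, pageNum * limit);
}

const products = Array.from({ length: 10 }, (_, i) => ({ id: i + 1, price: (i + 1) * 10 }));

const page1 = fetchPage(products, 1, 3); // ids 1, 2, 3 (prices 10, 20, 30)
products.push({ id: 11, price: 5 });     // another user inserts a cheaper product
const page2 = fetchPage(products, 2, 3); // ids 3, 4, 5: id 3 shows up again
```

The insert shifts every row down one position, so the last row of page 1 reappears as the first row of page 2.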

When to use offset pagination: Small datasets (under 100K records) where the data changes infrequently, or where you control all insertions and deletions. Simple client logic wins.

Cursor-Based Pagination: The Stable Alternative

Instead of asking for a page number, we ask for a position:

GET /api/products?limit=20&cursor=abc123def456

The cursor is an opaque pointer—the client doesn’t care what it is, only that it works. Typically, it’s an encoded ID or timestamp. The server decodes it and runs:

SELECT * FROM products
WHERE id > DECODE(cursor)
ORDER BY id
LIMIT 20;

This returns the first 20 products after the cursor position. The next cursor is the ID of the last returned product, encoded.

Why is this stable? If rows are inserted or deleted, we’re still asking for “products after ID 12345,” not “products in positions 41–60.” The query doesn’t care how many rows exist before the cursor—it just fetches the next batch. This solves the skipping trap.

Here’s a working example with Express.js and PostgreSQL:

app.get('/api/products', async (req, res) => {
  const limit = Math.min(parseInt(req.query.limit) || 20, 100);
  const cursor = req.query.cursor ? Buffer.from(req.query.cursor, 'base64').toString('utf-8') : null;

  let query = 'SELECT id, name, price FROM products';
  const params = [];

  if (cursor) {
    query += ' WHERE id > $1';
    params.push(cursor);
  }

  query += ' ORDER BY id LIMIT $' + (params.length + 1);
  params.push(limit + 1); // Fetch one extra to detect if there are more results

  const results = await db.query(query, params);

  const hasMore = results.rows.length > limit;
  const items = results.rows.slice(0, limit);

  const nextCursor = hasMore
    ? Buffer.from(String(items[items.length - 1].id)).toString('base64')
    : null;

  res.json({
    data: items,
    pagination: {
      nextCursor,
      hasMore
    }
  });
});

Notice we fetch limit + 1 rows. If we get more than limit results, we know there’s a next page. This avoids expensive COUNT(*) queries.
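The cursor itself is nothing magical; pulled out of the handler above, the encoding is just base64 over the last returned id:

```javascript
// Opaque-cursor helpers: base64-wrap the last id on the way out, unwrap on the way in.
const encodeCursor = (id) => Buffer.from(String(id)).toString('base64');
const decodeCursor = (cursor) => Buffer.from(cursor, 'base64').toString('utf-8');

encodeCursor(12345);      // "MTIzNDU="
decodeCursor('MTIzNDU='); // "12345"
```

Because the cursor is opaque, you can later change what it encodes (say, a (price, id) pair for keyset pagination) without breaking clients.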

Keyset Pagination for Complex Sorting

Cursor pagination works when you sort by a unique, immutable column (like ID). But what if you sort by price (not unique) or by price then ID?

Keyset pagination uses the entire sort key as the cursor:

GET /api/products?sort=price:asc,id:asc&keyset={"price": 49.99, "id": 12345}

The query becomes:

SELECT * FROM products
WHERE (price, id) > (49.99, 12345)
ORDER BY price ASC, id ASC
LIMIT 20;

This tells the database: “Give me rows where the (price, id) tuple is greater than (49.99, 12345).” For rows that tie on the first sort key, the second key breaks the tie and ensures we get the right “next” set.

Pro tip: Always include a unique column in multi-column keyset pagination. Otherwise, you’ll have gaps or duplicates.
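The row-value condition WHERE (price, id) > (49.99, 12345) is just lexicographic tuple ordering, and it expands to plain boolean logic; a sketch of the equivalent check (useful on databases without row-value comparison support):

```javascript
// (row.price, row.id) > (key.price, key.id) in lexicographic (tuple) order:
// strictly greater on the first column, or tied there and greater on the second.
function afterKeyset(row, key) {
  return row.price > key.price ||
         (row.price === key.price && row.id > key.id);
}

afterKeyset({ price: 49.99, id: 12346 }, { price: 49.99, id: 12345 }); // true: tie broken by id
afterKeyset({ price: 49.99, id: 12345 }, { price: 49.99, id: 12345 }); // false: the cursor row itself
afterKeyset({ price: 39.99, id: 99999 }, { price: 49.99, id: 12345 }); // false: cheaper, so before the cursor
```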

Filtering: Querying Without Scanning Everything

Filtering lets clients specify which records they want. Without it, they’d get all results and filter on the client—wasteful and slow.

Simple syntax uses query parameters:

GET /api/products?color=blue&category=shoes&price_lt=100

The API interprets these as filters. Standard operators include:

Operator              Meaning                   Example
field=value           Equals                    color=blue
field_ne=value        Not equals                color_ne=red
field_gt=value        Greater than              price_gt=50
field_gte=value       Greater than or equal     price_gte=50
field_lt=value        Less than                 price_lt=100
field_lte=value       Less than or equal        price_lte=100
field_in=v1,v2,v3     One of multiple values    size_in=S,M,L
field_like=%pattern%  Pattern matching          name_like=%running%
field_between=v1,v2   Range                     price_between=50,100

A more expressive standard is RSQL/FIQL, which reads like natural language:

GET /api/products?filter=color=='blue' and price<100 and (rating>4 or reviews>100)

This filters for blue products under $100 that either have a rating over 4 or more than 100 reviews. RSQL parser libraries exist for many languages and can convert these expressions to parameterized SQL safely.

Here’s an example filtering implementation:

function buildFilterSQL(filter) {
  // Build a WHERE clause with placeholders plus a matching params array
  // (simplified; use an RSQL library in production)
  const conditions = [];
  const params = [];

  if (filter.color) {
    params.push(filter.color);
    conditions.push(`color = $${params.length}`);
  }
  if (filter.price_lt) {
    params.push(filter.price_lt);
    conditions.push(`price < $${params.length}`);
  }
  if (filter.price_gte) {
    params.push(filter.price_gte);
    conditions.push(`price >= $${params.length}`);
  }
  if (filter.rating_gt) {
    params.push(filter.rating_gt);
    conditions.push(`rating > $${params.length}`);
  }

  const where = conditions.length > 0 ? 'WHERE ' + conditions.join(' AND ') : '';
  return { where, params };
}

const { where, params } = buildFilterSQL({
  color: 'blue',
  price_lt: 100,
  rating_gt: 4
});
// where:  "WHERE color = $1 AND price < $2 AND rating > $3"
// params: ['blue', 100, 4]

Critical safety note: Always parameterize queries. Never concatenate user input directly into SQL strings—that’s a SQL injection vulnerability. Use prepared statements:

const query = `
  SELECT * FROM products
  WHERE color = $1 AND price < $2 AND rating > $3
  LIMIT $4
`;
const results = await db.query(query, ['blue', 100, 4, 20]);

Sorting: Controlling Result Order

Sorting syntax typically looks like:

GET /api/products?sort=price:asc,rating:desc,id:asc

This sorts by price (ascending), then rating (descending), then ID (ascending). Always include a tie-breaker (like ID) to ensure consistent ordering across requests.
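Parsing that sort parameter server-side needs two safeguards: a whitelist of sortable fields (clients must not sort by arbitrary columns) and a forced tie-breaker. A sketch, where the field list is illustrative:

```javascript
// Parse "field:dir,field:dir" into an ORDER BY clause, with a whitelist
// and a guaranteed id tie-breaker for stable ordering.
const SORTABLE = new Set(['price', 'rating', 'created_at', 'id']);

function parseSort(sortParam) {
  const terms = (sortParam || 'id:asc').split(',').map((term) => {
    const [field, dir = 'asc'] = term.split(':');
    if (!SORTABLE.has(field)) throw new Error(`Cannot sort by "${field}"`);
    return `${field} ${dir.toLowerCase() === 'desc' ? 'DESC' : 'ASC'}`;
  });
  if (!terms.some((t) => t.startsWith('id '))) terms.push('id ASC');
  return 'ORDER BY ' + terms.join(', ');
}

parseSort('price:asc,rating:desc'); // "ORDER BY price ASC, rating DESC, id ASC"
```

Rejecting unknown fields also prevents clients from sorting by unindexed or sensitive columns.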

Sorting interacts critically with pagination. If you’re using cursor-based pagination, the cursor must include all sort fields. If you sort by (price, rating, id), your cursor must encode all three values.

Database performance depends heavily on indexes. To efficiently sort by price and filter by color:

CREATE INDEX idx_products_color_price ON products(color, price);

This is a covering index if it also includes the columns you SELECT:

CREATE INDEX idx_products_color_price_id_name
  ON products(color, price)
  INCLUDE (id, name);

The database uses the index for filtering and sorting without touching the main table—a huge win for large datasets.

The Interplay: Pagination, Filtering, and Sorting Together

In real systems, these three always happen together. Consider a request:

GET /api/products?color=blue&price_gte=50&price_lt=100&sort=price:asc&limit=20&cursor=abc123

The SQL query must:

  1. Filter: WHERE color = 'blue' AND price >= 50 AND price < 100
  2. Sort: ORDER BY price ASC, id ASC
  3. Paginate: WHERE (price, id) > (decoded cursor values) … LIMIT 20
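Put together, the three steps become one parameterized query. A sketch, assuming the cursor has already been decoded into a { price, id } pair (the function and parameter names are illustrative):

```javascript
// Combine filter + keyset cursor + sort + limit into one parameterized query.
function buildProductQuery({ color, priceGte, priceLt, after, limit }) {
  const params = [color, priceGte, priceLt];
  let where = 'color = $1 AND price >= $2 AND price < $3';
  if (after) {
    params.push(after.price, after.id);
    where += ' AND (price, id) > ($4, $5)';
  }
  params.push(limit);
  return {
    text: `SELECT id, name, price FROM products WHERE ${where}` +
          ` ORDER BY price ASC, id ASC LIMIT $${params.length}`,
    params,
  };
}

const q = buildProductQuery({
  color: 'blue', priceGte: 50, priceLt: 100,
  after: { price: 59.99, id: 1042 }, limit: 20,
});
// q.params: ['blue', 50, 100, 59.99, 1042, 20]
```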

The optimized index must support all three:

CREATE INDEX idx_products_filtering_sorting
  ON products(color, price)
  INCLUDE (id, name, rating);

This index is ordered by (color, price), which supports both the filter and the sort. The query engine can seek directly to the blue entries and walk through them in ascending price order, then apply the keyset condition to fetch the next 20 items.

Without proper index design, you’re doing table scans on millions of rows for every request.

Total Count: The Hidden Expense

A common API response includes a total count:

{
  "data": [...],
  "pagination": {
    "total": 45230,
    "page": 1,
    "limit": 20
  }
}

On a 10-million-row table, COUNT(*) forces the database to scan every row (or, at best, an entire index). With filtering, COUNT(*) WHERE color = 'blue' still scans every matching row, which may be millions. This is expensive.

Alternatives:

  1. hasMore flag instead of total count:

    {
      "data": [...],
      "pagination": {
        "hasMore": true,
        "nextCursor": "xyz789"
      }
    }

    You already know if there’s a next page (you fetched limit + 1 rows). No COUNT needed.

  2. Estimated counts (from database statistics):

    SELECT reltuples FROM pg_class WHERE relname = 'products';

    PostgreSQL’s reltuples gives a rough count without scanning. Fast but approximate.

  3. Cached counts updated asynchronously—good for dashboards where exactness isn’t critical.

For most modern UIs, users don’t need the total count. They navigate via “Next” and “Previous,” not by knowing there are 45,230 results. Skip the expensive COUNT.

Trade-Offs and Comparisons

Client simplicity
  • Offset: very simple; page numbers feel natural
  • Cursor: opaque cursor; less intuitive
  • Keyset: requires more complex cursor encoding

Server simplicity
  • Offset: very simple; standard LIMIT/OFFSET
  • Cursor: straightforward; a standard comparison
  • Keyset: more complex multi-column comparisons

Performance at scale
  • Offset: degrades rapidly (OFFSET scans all skipped rows)
  • Cursor: stays fast at any depth (direct index seek)
  • Keyset: stays fast at any depth (direct index seek)

Stability under mutations
  • Offset: fails; gaps and duplicates with inserts and deletes
  • Cursor: stable; value-based, so concurrent changes don’t shift results
  • Keyset: stable; value-based, so concurrent changes don’t shift results

Cheap total count
  • None of the three; COUNT is expensive regardless of pagination style

When to use
  • Offset: small datasets with infrequent changes
  • Cursor: real-time data and large datasets
  • Keyset: multi-field sorting on large datasets

API Rate Limits and Payload Protection

Even with pagination, protect your API:

// Enforce maximum limit
const limit = Math.min(
  parseInt(req.query.limit) || 20,
  100  // Never allow more than 100 per request
);

// Require at least one filter to prevent accidental full-table scans
if (!req.query.filter && !req.query.color) {
  return res.status(400).json({
    error: 'filter parameter required'
  });
}

A user requesting /api/products?limit=10000 shouldn’t crash your server. Always set a maximum. And consider requiring at least one filter on large tables—it prevents accidental table scans.
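One subtlety in the clamp above: parseInt('-5') is -5, which is truthy, so Math.min alone would happily pass LIMIT -5 through to the database. A slightly more defensive sketch:

```javascript
// Clamp a raw limit parameter to [1, max], falling back to a default for
// missing or non-numeric input.
function clampLimit(raw, def = 20, max = 100) {
  const n = parseInt(raw, 10);
  return Math.min(Math.max(Number.isNaN(n) ? def : n, 1), max);
}

clampLimit('10000'); // 100: capped at the maximum
clampLimit('abc');   // 20:  NaN falls back to the default
clampLimit('-5');    // 1:   negative values clamped up
```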

Key Takeaways

  • Pagination is essential for scale: Offset works for small datasets; cursor or keyset for large ones. Cursor is stable under mutations; offset can produce gaps.
  • Filtering reduces data transfer and processing: Use RSQL or standard operators. Always parameterize queries to prevent injection.
  • Sorting requires index-aware design: Include sort keys in indexes. Multi-field sorting needs tie-breakers for consistency.
  • Combine all three with proper indexes: A (filter_field, sort_field) index with covering columns enables efficient filtering, sorting, and pagination.
  • Avoid expensive counts: Use hasMore flags or estimated counts instead of COUNT(*). Most users don’t need the total.
  • Set maximum limits and enforce defaults: Protect your API from accidental table scans and payload explosions.

Practice Scenarios

Scenario 1: E-Commerce Catalog

You’re building an API for an e-commerce site with 5 million products. Users filter by category, price range, brand, and ratings. They sort by relevance, price, or date added. Design the pagination, filtering, and sorting strategy. What indexes do you create? How do you handle the total product count?

Scenario 2: Real-Time Analytics Dashboard

Your analytics API tracks events in real time. New events are inserted constantly. Clients request events sorted by timestamp and filtered by event type. How do you handle pagination when the dataset is constantly growing? Would you use offset or cursor? Why?

Scenario 3: Search Query Complexity

A client needs to search for products matching complex criteria: (color = 'blue' OR color = 'navy') AND price between 50 and 200 AND (rating > 4 OR featured = true). Design a filtering syntax. How do you prevent malicious or expensive queries?

Connection to the Next Layer

Pagination, filtering, and sorting are critical for reducing load on your API servers. But even with efficient queries, a single user might hammer your endpoint with hundreds of requests per second, monopolizing resources. This is where rate limiting enters the picture—the next chapter will show how to protect your API from both accidental and malicious overconsumption, ensuring fair resource allocation across all users.