System Design Fundamentals

GraphQL: When & Why

The Problem That Started It All

Picture this: It’s 2012, and Facebook’s mobile engineering team faces a mounting frustration. Their native iOS and Android apps connect to REST APIs designed for the web. But mobile clients need different data shapes and volumes than browsers. Requesting a user’s feed using the existing REST endpoint? You get back massive JSON payloads containing every field of every post, every user, every comment—even fields the mobile UI never displays. The bandwidth toll is brutal. And if you need data from three different resources? Three separate HTTP requests, each with network latency. The team called this “over-fetching” (getting too much data) and “under-fetching” (needing multiple requests).

Instead of building yet another REST endpoint variation, Facebook’s teams (Lee Byron, Dan Schafer, and others) designed a radical alternative: a query language that lets clients request exactly the data they need, in exactly the shape they need, in a single round trip. They open-sourced it in 2015 as GraphQL.

This chapter explores when GraphQL shines and when it creates problems. We’ll examine the technology deeply, compare it to REST (Chapter 71) and gRPC (Chapter 72), and build a decision framework for your systems.

What Is GraphQL, Really?

GraphQL is two things simultaneously:

  1. A query language — a syntax that clients use to ask for data
  2. A runtime — server-side code that executes those queries and returns results

Unlike REST, which defines a set of endpoints that return predefined data structures, GraphQL exposes a schema—a formal specification of all the data and operations available. The schema is the contract. It’s simultaneously API documentation, validation rules, and executable specification.

# This is a GraphQL schema
scalar DateTime  # custom scalar; only String, Int, Float, Boolean, and ID are built in
type User {
  id: ID!
  name: String!
  email: String!
  posts(limit: Int = 10): [Post!]!
  friends: [User!]!
}

type Post {
  id: ID!
  title: String!
  content: String!
  author: User!
  comments(limit: Int = 5): [Comment!]!
  createdAt: DateTime!
}

type Comment {
  id: ID!
  text: String!
  author: User!
  post: Post!
}

type Query {
  user(id: ID!): User
  posts(limit: Int = 10): [Post!]!
}

type Mutation {
  createPost(title: String!, content: String!): Post!
  deletePost(id: ID!): Boolean!
}

type Subscription {
  postCreated: Post!
  userOnline(userId: ID!): User!
}

Notice the structure:

  • Types define shapes (User, Post, Comment)
  • Scalar types are leaf values: String, Int, Float, Boolean, and ID are built in, and custom scalars such as DateTime must be declared with the scalar keyword
  • Fields are named properties with types
  • Exclamation marks mean “non-null” (required)
  • Brackets denote lists
  • Query, Mutation, and Subscription are special types defining what operations clients can invoke

A resolver is a function that fetches data for a field. When a client queries for user.posts, the server runs the resolver for the posts field on the User type, which typically queries the database and returns the results.

The Buffet Analogy

REST is like a fixed-menu restaurant. You order “Dish #5” (GET /users/123), and the kitchen serves you exactly what’s on that plate. If it includes vegetables and you only wanted the protein, too bad. If you need information from two different dishes, you place two separate orders and wait for both. Efficient for standard meals, wasteful for customization.

GraphQL is like a buffet. You walk around and pick exactly what you want—some protein here, certain vegetables, skip the starch. You put together one plate (one request) with precisely what you need in the exact proportions you want. You never waste food (no over-fetching), and you don’t make multiple trips (no under-fetching).

Core Concepts in Depth

The Type System as Documentation

GraphQL’s type system serves triple duty: it’s the API definition, the validation rules, and the documentation all at once. Tools such as GraphiQL, GraphQL Playground, and Apollo Studio generate interactive documentation directly from the schema, and field descriptions live next to the types they describe. There’s no separate OpenAPI file to maintain, so the documented structure can’t drift from the running API.

Queries, Mutations, and Subscriptions

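  • Queries are reads. By convention they’re side-effect free, which is why the server may resolve a query’s top-level fields in parallel.
  • Mutations are writes. They’re how clients request state changes, and their top-level fields run serially, in the order written.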
  • Subscriptions push real-time data. Clients open a persistent connection (usually WebSocket) and receive updates when data changes.
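To make the read/write split concrete, here is a sketch of how an executor might run the top-level fields of an operation. The GraphQL specification allows a query’s root fields to be resolved in parallel but requires a mutation’s root fields to run serially, in document order. The `executeRootFields` helper and the simulated resolvers are hypothetical, for illustration only.

```javascript
// Hypothetical sketch: run root fields the way the spec allows for each operation type
async function executeRootFields(operationType, fields, log) {
  if (operationType === 'mutation') {
    // Mutations: one field at a time, in order, so later writes see earlier ones
    const results = [];
    for (const field of fields) results.push(await field.resolve(log));
    return results;
  }
  // Queries: side-effect free, so the fields can resolve concurrently
  return Promise.all(fields.map((field) => field.resolve(log)));
}

// A fake resolver that records when it starts and finishes
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const makeField = (name, ms) => ({
  resolve: async (log) => {
    log.push(`start ${name}`);
    await delay(ms);
    log.push(`end ${name}`);
    return name;
  },
});

async function demo() {
  const queryLog = [];
  await executeRootFields('query', [makeField('a', 20), makeField('b', 5)], queryLog);
  const mutationLog = [];
  await executeRootFields('mutation', [makeField('a', 20), makeField('b', 5)], mutationLog);
  return { queryLog, mutationLog };
}

demo().then(({ queryLog, mutationLog }) => {
  console.log(queryLog);    // [ 'start a', 'start b', 'end b', 'end a' ]
  console.log(mutationLog); // [ 'start a', 'end a', 'start b', 'end b' ]
});
```

The query’s fields overlap, so the faster one finishes first; the mutation’s second field doesn’t start until the first has completed.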

Here’s what a client query looks like:

query GetUserWithPosts($userId: ID!) {
  user(id: $userId) {
    id
    name
    email
    posts(limit: 5) {
      id
      title
      createdAt
      comments(limit: 3) {
        id
        text
        author {
          name
        }
      }
    }
  }
}

Notice:

  • Nesting — follow relationships across the graph
  • Variables (prefixed with $) — parameterize the query safely
  • Arguments — specify filters, limits, and options at each level
  • Single round trip — the server returns all nested data in one response
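Variables travel separately from the query text, so the client never splices user input into the query string. A common convention (described in the GraphQL over HTTP draft specification) is a JSON body with `query`, `operationName`, and `variables` keys. The `buildGraphQLRequest` helper below is a hypothetical sketch that only builds the request, and the `/graphql` path is an assumption.

```javascript
// Hypothetical helper: builds (but does not send) a GraphQL-over-HTTP POST request
function buildGraphQLRequest(query, operationName, variables) {
  return {
    url: '/graphql', // assumed endpoint path
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // The query text stays constant; only the variables change between calls
    body: JSON.stringify({ query, operationName, variables }),
  };
}

const GET_USER_WITH_POSTS = `
  query GetUserWithPosts($userId: ID!) {
    user(id: $userId) { id name posts(limit: 5) { id title } }
  }
`;

// Even hostile input is just a JSON string value, never part of the query syntax
const request = buildGraphQLRequest(GET_USER_WITH_POSTS, 'GetUserWithPosts', {
  userId: '123") { email } #',
});
```

In a browser or Node 18+, `fetch(request.url, request)` would send it. Because the query text never changes, it is also a natural candidate for the persisted-query techniques discussed under caching.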

Resolvers: The Machinery

Resolvers are functions that populate field values. Here’s a JavaScript example:

// `db` stands in for your data-access layer
const resolvers = {
  Query: {
    user: async (parent, args, context, info) => {
      // args.id is the ID passed by the client
      return await db.users.findById(args.id);
    }
  },
  User: {
    posts: async (parent, args, context, info) => {
      // parent is the User object; args.limit comes from posts(limit: ...)
      return await db.posts.findByAuthorId(parent.id, { limit: args.limit });
    },
    friends: async (parent, args, context, info) => {
      return await db.users.findFriends(parent.id);
    }
  },
  Post: {
    author: async (parent, args, context, info) => {
      // parent is the Post object
      return await db.users.findById(parent.authorId);
    },
    comments: async (parent, args, context, info) => {
      // Honor comments(limit: ...) from the query
      return await db.comments.findByPostId(parent.id, { limit: args.limit });
    }
  }
};

Each resolver receives four arguments:

  1. parent — the object containing this field
  2. args — arguments passed by the client
  3. context — shared data (database connection, auth user, etc.)
  4. info — metadata about the query execution

The N+1 Problem and DataLoader

Resolvers are elegant but dangerous. Consider the earlier query, which fetches a user, their posts, each post’s comments, and each comment’s author. With naive per-field resolvers, here’s what happens:

  1. Fetch the user (1 query)
  2. Fetch the user’s posts (1 query)
  3. For each post, fetch its comments (N queries, where N = number of posts)
  4. For each comment, fetch its author (M queries, where M = number of comments)

If a user has 5 posts with 3 comments each, you’ve made 1 + 1 + 5 + 15 = 22 database queries for what should be a single logical request. This is the N+1 problem: one query to fetch a list, then one more query for every item in it.
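The arithmetic can be checked with a small simulation. The in-memory `db` below is hypothetical: each method increments a counter to stand in for one database round trip, and `resolveNaively` mirrors what naive per-field resolvers do for that query (one user, five posts, three comments per post).

```javascript
// Hypothetical in-memory data: one user with 5 posts, 3 comments per post
const users = [{ id: 'u1', name: 'Ada' }, { id: 'u2', name: 'Grace' }];
const posts = Array.from({ length: 5 }, (_, i) => ({ id: `p${i}`, authorId: 'u1', title: `Post ${i}` }));
const comments = posts.flatMap((post) =>
  Array.from({ length: 3 }, (_, j) => ({ id: `${post.id}-c${j}`, postId: post.id, authorId: 'u2' }))
);

let queryCount = 0; // each db call counts as one round trip
const db = {
  findUser: async (id) => { queryCount++; return users.find((u) => u.id === id); },
  findPostsByAuthor: async (id) => { queryCount++; return posts.filter((p) => p.authorId === id); },
  findCommentsByPost: async (id) => { queryCount++; return comments.filter((c) => c.postId === id); },
};

// What naive resolvers end up doing for user -> posts -> comments -> author
async function resolveNaively(userId) {
  const user = await db.findUser(userId);                     // 1 query
  const userPosts = await db.findPostsByAuthor(user.id);      // 1 query
  for (const post of userPosts) {
    const postComments = await db.findCommentsByPost(post.id); // N = 5 queries
    for (const comment of postComments) {
      await db.findUser(comment.authorId);                     // M = 15 queries
    }
  }
}

const run = resolveNaively('u1').then(() => queryCount);
run.then((count) => console.log(count)); // 22
```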

Enter DataLoader, a batching and caching utility:

import DataLoader from 'dataloader';

// Create loaders per request so cached users never leak between requests
function createLoaders() {
  return {
    user: new DataLoader(async (userIds) => {
      // Fetch all requested users in one query
      const users = await db.users.findByIds(userIds);
      // DataLoader requires results in the same order as the keys
      const usersById = new Map(users.map(u => [u.id, u]));
      return userIds.map(id => usersById.get(id) ?? null);
    }),
  };
}

// e.g. in the server's context function: { loaders: createLoaders(), ... }

const resolvers = {
  Post: {
    author: (parent, args, context, info) => {
      // Instead of db.users.findById(parent.authorId),
      // queue the ID for the next batch
      return context.loaders.user.load(parent.authorId);
    }
  }
};

// All loads issued in the same tick of the event loop are batched
// into a single database query: SELECT * FROM users WHERE id IN (...)

DataLoader coalesces the individual load calls made by many field resolvers into one batch query. Whether the query requests 1 post or 100, fetching their authors takes a single database query, and repeated IDs are served from the loader’s cache. The N+1 pattern becomes roughly one query per level of nesting.
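DataLoader’s core idea fits in a few dozen lines. The `BatchLoader` class below is a simplified sketch written for this chapter, not DataLoader’s actual source: it collects keys requested during the same turn of the event loop, calls the batch function once, and memoizes one promise per key. Scheduling the flush with Node’s `setImmediate` is an assumption standing in for DataLoader’s own scheduling; the users table is made up.

```javascript
// Simplified sketch of the DataLoader idea (not the library's implementation)
class BatchLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise of values in the same order as keys
    this.cache = new Map(); // key -> promise, so a repeated key is fetched once
    this.queue = [];        // keys waiting for the next batch
  }

  load(key) {
    if (this.cache.has(key)) return this.cache.get(key);
    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key of a batch schedules the flush for after pending callbacks run
      if (this.queue.length === 1) setImmediate(() => this.flush());
    });
    this.cache.set(key, promise);
    return promise;
  }

  async flush() {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

// Hypothetical users table; `batches` records each simulated SELECT ... WHERE id IN (...)
const usersById = new Map([
  ['u1', { id: 'u1', name: 'Ada' }],
  ['u2', { id: 'u2', name: 'Grace' }],
  ['u3', { id: 'u3', name: 'Linus' }],
]);
const batches = [];
const userLoader = new BatchLoader(async (ids) => {
  batches.push(ids);
  return ids.map((id) => usersById.get(id) ?? null);
});

// 15 comment authors, but only 3 distinct people
const authorIds = Array.from({ length: 15 }, (_, i) => `u${(i % 3) + 1}`);
const authorsPromise = Promise.all(authorIds.map((id) => userLoader.load(id)));
```

Fifteen `load` calls produce one batch containing three distinct IDs. In a server you would typically build a fresh loader per request so cached results never leak between users.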

Pagination: Cursors and Offsets

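GraphQL doesn’t mandate a pagination approach. Offset pagination (skip N rows, return the next page) is simple, but cursor-based pagination, usually following the Relay Cursor Connections convention, holds up better for large or frequently changing datasets: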

type PostConnection {
  edges: [PostEdge!]!
  pageInfo: PageInfo!
}

type PostEdge {
  cursor: String!
  node: Post!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}

type Query {
  userPosts(userId: ID!, first: Int, after: String): PostConnection!
}

A client query looks like:

query {
  userPosts(userId: "123", first: 10, after: "cursor_xyz") {
    edges {
      cursor
      node {
        id
        title
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
  }
}

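Cursors are opaque tokens (often base64-encoded) that the server creates and interprets; clients just pass them back. An offset means “skip the first 20 rows,” so if posts are inserted or deleted between requests, rows shift and the client sees duplicates or misses items. A cursor means “continue after this item,” which stays correct as the data changes and doesn’t require knowing the total count.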
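The server side of a connection can be sketched in a few lines. Assumptions for this sketch: posts are sorted by a numeric `id` that serves as the stable sort key, the cursor is simply `post:<id>` base64-encoded with Node’s `Buffer`, and `paginatePosts` is a hypothetical helper rather than any library’s API.

```javascript
// Opaque cursors: base64 of a stable sort key (here a numeric post id, an assumption)
const encodeCursor = (id) => Buffer.from(`post:${id}`, 'utf8').toString('base64');
const decodeCursor = (cursor) =>
  Number(Buffer.from(cursor, 'base64').toString('utf8').slice('post:'.length));

// Keyset pagination over posts sorted by ascending id, returning the PostConnection shape
function paginatePosts(posts, { first = 10, after = null } = {}) {
  const afterId = after === null ? null : decodeCursor(after);
  // "Everything after this key", not "skip N rows": changes before the cursor don't shift pages
  const remaining = afterId === null ? posts : posts.filter((p) => p.id > afterId);
  const edges = remaining.slice(0, first).map((post) => ({ cursor: encodeCursor(post.id), node: post }));
  return {
    edges,
    pageInfo: {
      hasNextPage: remaining.length > first,
      hasPreviousPage: afterId !== null && posts.some((p) => p.id <= afterId),
      startCursor: edges.length ? edges[0].cursor : null,
      endCursor: edges.length ? edges[edges.length - 1].cursor : null,
    },
  };
}

const posts = Array.from({ length: 25 }, (_, i) => ({ id: i + 1, title: `Post ${i + 1}` }));
const page1 = paginatePosts(posts, { first: 10 });
posts.splice(2, 1); // post 3 is deleted after the client has seen page 1
const page2 = paginatePosts(posts, { first: 10, after: page1.pageInfo.endCursor });
// page2 still starts at post 11; an offset of 10 would now skip it and start at post 12
```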

Introspection and Tooling

GraphQL’s introspection system lets clients query the schema itself:

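query {
  __type(name: "User") {
    name
    fields {
      name
      type {
        name
        kind  # NON_NULL and LIST wrappers have no name; unwrap them via ofType
        ofType {
          name
          kind
          ofType {
            name
            kind
            ofType {
              name
              kind
            }
          }
        }
      }
    }
  }
}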

This enables powerful developer tools: IDE autocomplete, API explorers, mock servers, and automatic client code generation.
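Introspection results nest wrapper types: a field typed `[Post!]!` comes back as NON_NULL wrapping LIST wrapping NON_NULL wrapping the Post object type. Code generators and IDEs unwrap that chain through `ofType`. A small sketch: the `typeRefToSDL` helper is hypothetical and the response data is illustrative, though its shape follows the introspection schema’s `kind`, `name`, and `ofType` fields.

```javascript
// Turn an introspection type reference back into SDL notation, e.g. [Post!]!
function typeRefToSDL(ref) {
  if (ref.kind === 'NON_NULL') return `${typeRefToSDL(ref.ofType)}!`;
  if (ref.kind === 'LIST') return `[${typeRefToSDL(ref.ofType)}]`;
  return ref.name; // a named type: SCALAR, OBJECT, ENUM, ...
}

// Illustrative, trimmed response to the __type(name: "User") query
const named = (kind, name) => ({ kind, name, ofType: null });
const wrap = (kind, ofType) => ({ kind, name: null, ofType });
const response = {
  data: {
    __type: {
      name: 'User',
      fields: [
        { name: 'id', type: wrap('NON_NULL', named('SCALAR', 'ID')) },
        { name: 'posts', type: wrap('NON_NULL', wrap('LIST', wrap('NON_NULL', named('OBJECT', 'Post')))) },
      ],
    },
  },
};

for (const field of response.data.__type.fields) {
  console.log(`${field.name}: ${typeRefToSDL(field.type)}`);
}
// id: ID!
// posts: [Post!]!
```

A type nested more deeply than the query’s `ofType` chain would be cut off, which is why standard introspection queries request several levels of `ofType`.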

Real-World Example: A Social Media Schema

Let’s design a complete schema:

scalar DateTime

type User {
  id: ID!
  username: String!
  email: String!
  bio: String
  avatar: String
  createdAt: DateTime!
  posts(limit: Int = 10): [Post!]!
  followers: [User!]!
  following: [User!]!
}

type Post {
  id: ID!
  title: String!
  content: String!
  author: User!
  likes: Int!
  comments(limit: Int = 5): [Comment!]!
  createdAt: DateTime!
  updatedAt: DateTime!
}

type Comment {
  id: ID!
  text: String!
  author: User!
  post: Post!
  likes: Int!
  createdAt: DateTime!
}

type Query {
  user(id: ID!): User
  me: User
  posts(limit: Int = 20, offset: Int = 0): [Post!]!
  searchUsers(query: String!): [User!]!
}

type Mutation {
  createPost(title: String!, content: String!): Post!
  updatePost(id: ID!, title: String, content: String): Post
  deletePost(id: ID!): Boolean!
  createComment(postId: ID!, text: String!): Comment!
  likePost(postId: ID!): Post!
}

type Subscription {
  postCreated: Post!
  commentAdded(postId: ID!): Comment!
}

A complex nested query:

query FeedWithComments($limit: Int = 5) {
  me {
    id
    username
    following {
      id
      posts(limit: 3) {
        id
        title
        content
        author {
          username
          avatar
        }
        comments(limit: $limit) {
          text
          author {
            username
          }
        }
      }
    }
  }
}

The server executes this by:

  1. Resolving me (the authenticated user)
  2. Resolving their following relationship
  3. For each followed user, resolving their posts
  4. For each post, resolving the author and comments
  5. For each comment, resolving its author
  6. Returning the entire tree in a single JSON response
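The six steps above amount to a depth-first walk that calls one resolver per field and per list item. The toy executor below is a simplification written for illustration, not how graphql-js is implemented: it skips parsing and validation, takes a hand-built selection tree, and uses a hypothetical `returnTypes` map in place of a real schema. The data is made up.

```javascript
// Toy executor (not graphql-js): walks a pre-parsed selection tree depth-first.
// A selection maps field name -> true (scalar leaf) or { args, fields } (object field).
const returnTypes = {
  Query: { me: 'User' },
  User: { following: 'User', posts: 'Post' },
  Post: { author: 'User', comments: 'Comment' },
  Comment: { author: 'User' },
};

async function executeFields(typeName, parent, selection, resolvers, context) {
  const result = {};
  for (const [field, spec] of Object.entries(selection)) {
    const { args = {}, fields } = spec === true ? {} : spec;
    // Default resolver: read the property with the same name from the parent object
    const resolve = resolvers[typeName]?.[field] ?? ((obj) => obj[field]);
    const value = await resolve(parent, args, context);
    const childType = returnTypes[typeName]?.[field];
    if (!fields || value == null) {
      result[field] = value; // scalar or null: nothing left to resolve
    } else if (Array.isArray(value)) {
      // Lists: run the sub-selection once per item
      result[field] = await Promise.all(
        value.map((item) => executeFields(childType, item, fields, resolvers, context))
      );
    } else {
      result[field] = await executeFields(childType, value, fields, resolvers, context);
    }
  }
  return result;
}

// Hypothetical in-memory data and resolvers for the FeedWithComments query
const users = {
  u1: { id: 'u1', username: 'ada', avatar: 'ada.png', followingIds: ['u2'] },
  u2: { id: 'u2', username: 'grace', avatar: 'grace.png', followingIds: [] },
};
const posts = [{ id: 'p1', title: 'Compilers', content: '...', authorId: 'u2' }];
const comments = [{ text: 'Great post', postId: 'p1', authorId: 'u1' }];

const resolvers = {
  Query: { me: (parent, args, ctx) => users[ctx.userId] },                      // step 1
  User: {
    following: (user) => user.followingIds.map((id) => users[id]),              // step 2
    posts: (user, { limit }) => posts.filter((p) => p.authorId === user.id).slice(0, limit), // step 3
  },
  Post: {
    author: (post) => users[post.authorId],                                     // step 4
    comments: (post, { limit }) => comments.filter((c) => c.postId === post.id).slice(0, limit),
  },
  Comment: { author: (comment) => users[comment.authorId] },                    // step 5
};

const selection = {
  me: { fields: {
    id: true,
    username: true,
    following: { fields: {
      id: true,
      posts: { args: { limit: 3 }, fields: {
        id: true, title: true, content: true,
        author: { fields: { username: true, avatar: true } },
        comments: { args: { limit: 5 }, fields: { text: true, author: { fields: { username: true } } } },
      } },
    } },
  } },
};

// Step 6: the whole tree comes back as one JSON-ready object
const resultPromise = executeFields('Query', null, selection, resolvers, { userId: 'u1' })
  .then((data) => ({ data }));
```

Only selected fields appear in the result (`followingIds` never leaks out), and every list item triggers its own resolver calls, which is exactly where the N+1 problem comes from.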

Caching: GraphQL’s Achilles’ Heel

REST benefits from HTTP caching headers (Cache-Control, ETags). A GET request to /users/123 can be cached by CDNs, browsers, and proxies. Every subsequent request gets the cached response.

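GraphQL weakens this. By convention, clients POST every operation to a single /graphql endpoint, and CDNs and browsers don’t cache POST responses by default. Queries can be sent over GET instead, which puts the query text and variables in the URL and makes responses cacheable, but every distinct query shape and variable combination becomes its own cache entry, and long queries can exceed URL length limits.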

Solutions:

  1. Client-side caching — Libraries like Apollo Client and Relay normalize query responses into a local store keyed by type and ID, and update cached objects when mutation responses return them.

  2. Persisted queries — Pre-register queries with the server, then send only a query ID (often a hash of the query text) instead of the full query. Requests become small with stable URLs, so they can be sent as GET and cached by CDNs.

  3. HTTP caching for queries — Treat GraphQL endpoints as cacheable if you use GET and enforce strong constraints (idempotency, read-only queries).

  4. Custom caching layers — Implement Redis-based caching of resolver results, keyed by query hash and variables.

Pro tip: Most production GraphQL servers implement a combination of these—client-side caching for immediate responsiveness, persisted queries for bandwidth reduction, and resolver-level caching for database relief.
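Persisted-query IDs and resolver-level caches both hinge on a stable key. A minimal sketch, assuming: the key is a SHA-256 hash of the query text (Apollo’s automatic persisted queries also identify a query by its SHA-256 hash) plus the variables serialized with sorted keys; an in-memory `Map` stands in for Redis; and `execute` is whatever function actually runs the operation. `cacheKey` and `createResponseCache` are hypothetical names, not a library API.

```javascript
const { createHash } = require('node:crypto');

// Serialize with sorted keys so {a:1,b:2} and {b:2,a:1} map to the same cache entry
function stableStringify(value) {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value).sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Key = SHA-256 of the query text + normalized variables
function cacheKey(query, variables = {}) {
  const queryHash = createHash('sha256').update(query).digest('hex');
  return `${queryHash}:${stableStringify(variables)}`;
}

// In-memory Map standing in for a shared cache such as Redis (an assumption)
function createResponseCache({ ttlMs }) {
  const entries = new Map(); // key -> { value, expiresAt }
  return async function cachedExecute(query, variables, execute) {
    const key = cacheKey(query, variables);
    const hit = entries.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value; // fresh hit: skip resolvers
    const value = await execute(query, variables);
    entries.set(key, { value, expiresAt: Date.now() + ttlMs });
    return value;
  };
}
```

Personalized fields such as `me` must not share entries across users; one option is to fold the user ID into the key, another is to cache only public queries.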

Subscriptions and Real-Time Data

GraphQL subscriptions enable pushing data to clients via WebSocket:

subscription OnPostCreated {
  postCreated {
    id
    title
    author {
      username
    }
  }
}

When a mutation creates a post, the server pushes the new post to all subscribed clients. Implementation requires:

  • A WebSocket transport (e.g., the graphql-ws library; @apollo/server has no built-in subscription support, so it is typically paired with graphql-ws)
  • Pub/Sub system to broadcast events (Redis, Kafka, or in-memory)
  • Subscription resolvers that return async iterables
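These ingredients fit together roughly as follows. This is a sketch under stated assumptions: `InMemoryPubSub` is a hypothetical single-process pub/sub (a multi-server deployment would put Redis or Kafka behind the same interface), and the resolver map follows the graphql-js convention in which a subscription field supplies a `subscribe` function returning an async iterator.

```javascript
// Hypothetical single-process pub/sub; subscribe() returns an async iterator
class InMemoryPubSub {
  constructor() {
    this.subscribers = new Map(); // topic -> Set of push functions
  }

  publish(topic, payload) {
    for (const push of this.subscribers.get(topic) ?? []) push(payload);
  }

  subscribe(topic) {
    const queue = [];   // events published but not yet consumed (unbounded: no backpressure)
    const waiting = []; // consumers awaiting the next event
    const push = (payload) => {
      const resolve = waiting.shift();
      if (resolve) resolve({ value: payload, done: false });
      else queue.push(payload);
    };
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, new Set());
    const topicSubscribers = this.subscribers.get(topic);
    topicSubscribers.add(push);

    return {
      next() {
        if (queue.length > 0) return Promise.resolve({ value: queue.shift(), done: false });
        return new Promise((resolve) => waiting.push(resolve));
      },
      // Called when the client unsubscribes (or a for-await loop exits): stop delivering events
      return() {
        topicSubscribers.delete(push);
        for (const resolve of waiting.splice(0)) resolve({ value: undefined, done: true });
        return Promise.resolve({ value: undefined, done: true });
      },
      [Symbol.asyncIterator]() {
        return this;
      },
    };
  }
}

const pubsub = new InMemoryPubSub();
let nextId = 1;

const resolvers = {
  Mutation: {
    createPost: async (parent, { title, content }) => {
      const post = { id: String(nextId++), title, content };
      // Each event carries the field name, so the subscription field can read postCreated from it
      pubsub.publish('POST_CREATED', { postCreated: post });
      return post;
    },
  },
  Subscription: {
    // graphql-js convention: subscribe returns the async iterator of events
    postCreated: { subscribe: () => pubsub.subscribe('POST_CREATED') },
  },
};
```

The unbounded `queue` is the memory cost mentioned next: a slow client accumulates events, so a production server needs a cap or backpressure.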

Subscriptions are powerful but resource-intensive. Each subscription maintains an open connection and consumes server memory. At scale, you need careful connection pooling, backpressure handling, and rate limiting.

Federation: Composing Multiple GraphQL Services

Apollo Federation lets you build a single, cohesive GraphQL API from multiple independently deployed services:

# Users service
type User @key(fields: "id") {
  id: ID!
  name: String!
}

# Posts service
extend type User @key(fields: "id") {
  id: ID! @external
  posts: [Post!]!
}

type Post @key(fields: "id") {
  id: ID!
  title: String!
  author: User!
}

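A gateway (Apollo Gateway, or the newer Apollo Router) serves the supergraph schema composed from these subgraphs and resolves cross-service references. When a client queries a user’s posts, the gateway builds a query plan: it fetches the User from the users service, asks the posts service for that user’s posts by id, and merges the results into one response.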

This pattern scales to dozens of services while maintaining a unified GraphQL schema, enabling independent deployment and iteration.
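A toy simulation helps show what a query plan does. Everything below is hypothetical and in-process; real subgraphs are separate servers that the gateway calls over HTTP, using the federation `_entities` field (backed by `__resolveReference` resolvers in Apollo’s subgraph library) to turn references like `{ __typename: 'User', id }` into full objects. The subgraph objects and `runPlan` are invented for illustration.

```javascript
// Toy simulation of a federated query plan (not Apollo Gateway/Router internals)
const usersSubgraph = {
  users: new Map([['u1', { id: 'u1', name: 'Ada' }], ['u2', { id: 'u2', name: 'Grace' }]]),
  // Plays the role of an entity lookup: { __typename, id } references -> full User objects
  resolveEntities(representations) {
    return representations.map((ref) => this.users.get(ref.id) ?? null);
  },
};

const postsSubgraph = {
  posts: [
    { id: 'p1', title: 'Hello', authorId: 'u1' },
    { id: 'p2', title: 'Federation', authorId: 'u1' },
  ],
  // Owns User.posts; returns author as a reference because it doesn't store user names
  postsForUsers(representations) {
    return representations.map((ref) =>
      this.posts
        .filter((p) => p.authorId === ref.id)
        .map((p) => ({ id: p.id, title: p.title, author: { __typename: 'User', id: p.authorId } }))
    );
  },
};

// Plan for: { user(id: "u1") { name posts { title author { name } } } }
function runPlan(userId) {
  // Fetch 1 (users subgraph): the root field
  const [user] = usersSubgraph.resolveEntities([{ __typename: 'User', id: userId }]);
  // Fetch 2 (posts subgraph): extend the User entity with its posts, keyed by id
  const [posts] = postsSubgraph.postsForUsers([{ __typename: 'User', id: user.id }]);
  // Fetch 3 (users subgraph): hydrate every author reference in one batched call
  const authors = usersSubgraph.resolveEntities(posts.map((p) => p.author));
  return {
    name: user.name,
    posts: posts.map((p, i) => ({ title: p.title, author: { name: authors[i].name } })),
  };
}
```

Neither subgraph knows about the other; the `@key(fields: "id")` directive is what lets the plan pass `id` across the service boundary.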

Comparing GraphQL, REST, and gRPC

| Aspect | REST | GraphQL | gRPC |
| --- | --- | --- | --- |
| Request Shape | Fixed by server | Specified by client | Fixed by server (protobuf messages) |
| Caching | HTTP-level (excellent) | Complex (manual) | Complex (manual) |
| Over-fetching | Common | Prevented by design | Possible (fixed messages; field masks help) |
| Under-fetching | Common (multiple requests) | Single request | Depends on RPC design |
| Tooling | Good (Swagger/OpenAPI) | Excellent (introspection, playgrounds) | Good (protoc, IDE plugins) |
| Type Safety | Weak (optional JSON Schema/OpenAPI) | Strong (built-in types) | Very strong (protobuf) |
| Learning Curve | Shallow | Moderate | Moderate |
| Performance | Fast (simple payloads) | Depends on query (can be slow with bad queries) | Very fast (binary protocol) |
| Complexity | Simple servers | Complex resolvers (N+1 risk) | Moderate (clear contracts) |
| Browser Friendly | Excellent | Good (JSON) | Poor (needs a gRPC-Web proxy) |
| File Uploads | Simple (multipart/form-data) | Awkward (non-standard) | Awkward (streaming) |
| Rate Limiting | Easy (by endpoint) | Hard (all requests same endpoint) | Moderate (by service) |

When GraphQL Excels

  • Multiple clients with different needs — Web, mobile, IoT devices need different data shapes. GraphQL’s flexibility shines.
  • Complex, relational data — Social networks, e-commerce platforms, content management systems benefit from graph traversal.
  • Rapid iteration — Adding fields to the schema doesn’t break existing clients. You avoid version proliferation.
  • Developer experience — Introspection, playgrounds, and automatic documentation reduce friction.
  • Aggregating data from multiple sources — Federation composes microservices seamlessly.

When GraphQL Creates Problems

  • Simple CRUD operations — If your API is just “list users,” “get user,” “create user,” “update user,” REST is simpler to build, cache, and operate.
  • File-heavy operations — Uploading/downloading large files via GraphQL is awkward. Consider REST for media endpoints.
  • Real-time streaming at extreme scale — Subscriptions don’t scale as well as message queues (Kafka, RabbitMQ) for high-volume events.
  • Cache-dependent systems — If you rely heavily on HTTP caching, GraphQL’s POST-based model is problematic.
  • Public APIs with rate limiting — Enforcing fair rate limits is harder when all queries hit the same endpoint. You must analyze query complexity.
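One way to rate-limit a single endpoint fairly is to charge each request its query cost against a per-client budget that refills over time: a token bucket measured in cost points instead of requests. A sketch, assuming the cost has already been computed by a complexity analyzer; `CostBudget` and its parameters are hypothetical, and the injectable clock exists only to make it testable.

```javascript
// Hypothetical cost-based rate limiter: a token bucket of "cost points" per client
class CostBudget {
  constructor({ capacity, refillPerSecond, now = () => Date.now() }) {
    this.capacity = capacity;               // maximum points a client can bank
    this.refillPerSecond = refillPerSecond; // points restored per second
    this.now = now;
    this.buckets = new Map();               // clientId -> { points, updatedAt }
  }

  // Debits the budget and returns true if the client can afford this query's cost
  tryCharge(clientId, cost) {
    const t = this.now();
    const bucket = this.buckets.get(clientId) ?? { points: this.capacity, updatedAt: t };
    const elapsedSeconds = (t - bucket.updatedAt) / 1000;
    bucket.points = Math.min(this.capacity, bucket.points + elapsedSeconds * this.refillPerSecond);
    bucket.updatedAt = t;
    this.buckets.set(clientId, bucket);
    if (bucket.points < cost) return false; // reject without spending points
    bucket.points -= cost;
    return true;
  }
}
```

A request handler would call `tryCharge(apiKey, cost)` and answer with HTTP 429 when it returns false. Charging by cost means a cheap query and an expensive one no longer count the same.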

Security Considerations

GraphQL APIs require special attention:

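Query depth limiting — Prevent arbitrarily nested queries that cause server overload. Because the schema is cyclic (a comment’s author is a User, who has posts, which have comments), a client can nest selections as deep as it likes: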

# Dangerous query
query {
  user(id: "1") {
    posts {
      comments {
        author {
          posts {
            comments {
              author {
                name  # ...and the cycle can repeat indefinitely
              }
            }
          }
        }
      }
    }
  }
}
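A depth check can run before any resolver executes. In a real server it would walk the parsed query AST (libraries such as graphql-depth-limit plug in as validation rules); this sketch assumes a simplified selection tree of nested objects instead, and `queryDepth` and `assertDepthWithin` are hypothetical names.

```javascript
// Selection-tree stand-in for a parsed query: field -> true (leaf) or nested selection
function queryDepth(selection) {
  let deepest = 0;
  for (const child of Object.values(selection)) {
    const depth = child === true ? 1 : 1 + queryDepth(child);
    deepest = Math.max(deepest, depth);
  }
  return deepest;
}

// Reject the query before execution if it nests too deeply
function assertDepthWithin(selection, maxDepth) {
  const depth = queryDepth(selection);
  if (depth > maxDepth) {
    throw new Error(`Query depth ${depth} exceeds the limit of ${maxDepth}`);
  }
  return depth;
}

// The dangerous query above, cut off after one repetition of the cycle
const dangerous = {
  user: { posts: { comments: { author: { posts: { comments: { author: { name: true } } } } } } },
};
```

`queryDepth(dangerous)` is 8, so a limit of 5 rejects it while `{ user { name } }` (depth 2) passes. Depth alone misses wide queries, which is what cost analysis covers.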

Query complexity analysis — Assign a “cost” to each field based on the work required. Reject queries exceeding a threshold.

// Illustrative per-field costs; fields not listed cost 1 unit
const fieldCosts = {
  User: {
    posts: 5,      // Fetching posts costs 5 units
    friends: 10,   // Fetching friends costs 10 units
  },
  Post: {
    comments: 3,
  },
};
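A cost table alone isn’t enough, because a list field runs its sub-selection once per item: `posts(limit: 5)` multiplies the cost of everything inside it by five. The sketch below reuses the same illustrative per-field costs and adds its own assumptions: unlisted fields cost 1, list fields without a `limit` argument return 10 items, and the query arrives as a simplified selection tree rather than a parsed AST. `queryCost` and `assertCostWithin` are hypothetical names.

```javascript
// Illustrative numbers and assumptions, not taken from any library
const fieldCosts = { User: { posts: 5, friends: 10 }, Post: { comments: 3 } };
const listDefaults = { User: { posts: 10, friends: 10 }, Post: { comments: 10 } }; // assumed page sizes
const fieldTypes = {
  Query: { user: 'User' },
  User: { posts: 'Post', friends: 'User' },
  Post: { comments: 'Comment', author: 'User' },
  Comment: { author: 'User' },
};

// selection: field -> true (scalar) or { args, fields } (object or list field)
function queryCost(typeName, selection) {
  let total = 0;
  for (const [field, spec] of Object.entries(selection)) {
    const own = fieldCosts[typeName]?.[field] ?? 1;
    if (spec === true) {
      total += own;
      continue;
    }
    const defaultSize = listDefaults[typeName]?.[field];
    // A list field runs its sub-selection once per item, so its children's cost is multiplied
    const items = defaultSize === undefined ? 1 : (spec.args?.limit ?? defaultSize);
    total += own + items * queryCost(fieldTypes[typeName][field], spec.fields ?? {});
  }
  return total;
}

function assertCostWithin(selection, maxCost) {
  const cost = queryCost('Query', selection);
  if (cost > maxCost) throw new Error(`Query cost ${cost} exceeds the limit of ${maxCost}`);
  return cost;
}

// query { user(id: "1") { name posts(limit: 5) { title comments(limit: 3) { text } } } }
const feedQuery = {
  user: { args: { id: '1' }, fields: {
    name: true,
    posts: { args: { limit: 5 }, fields: {
      title: true,
      comments: { args: { limit: 3 }, fields: { text: true } },
    } },
  } },
};

console.log(assertCostWithin(feedQuery, 100)); // 42
```

Working it out by hand: each comment costs 1, so `comments(limit: 3)` costs 3 + 3 × 1 = 6; each post costs 1 + 6 = 7; `posts(limit: 5)` costs 5 + 5 × 7 = 40; adding `name` and `user` gives 42.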

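Persisted queries — Only allow pre-registered queries in production. This prevents attackers from crafting expensive queries, and it works when all clients are your own apps; a public API that accepts arbitrary queries needs depth and cost limits instead.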

Authentication and authorization — Implement field-level access control. Don’t rely on query structure; protect sensitive data at the resolver level.

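import { GraphQLError } from 'graphql';

const resolvers = {
  User: {
    email: (parent, args, context) => {
      // Only the account owner or an admin may read an email address
      if (context.userId !== parent.id && !context.isAdmin) {
        throw new GraphQLError('Not authorized to view this email', {
          extensions: { code: 'FORBIDDEN' },
        });
      }
      return parent.email;
    }
  }
};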

Key Takeaways

  • GraphQL solves over-fetching and under-fetching by letting clients request exactly the data they need in a single request.
  • The type system is both contract and documentation; the schema is the truth.
  • Resolvers are elegant but introduce the N+1 problem; use DataLoader for batching.
  • Caching is GraphQL’s greatest challenge; combine client-side caching, persisted queries, and resolver-level strategies.
  • Federation enables composing multiple services into a unified API.
  • Choose GraphQL for complex, relational data and multiple clients; choose REST for simple CRUD; choose gRPC for high-performance microservices.

Practice Scenarios

Scenario 1: E-Commerce Platform

You’re building an e-commerce system with a mobile app, web storefront, and admin dashboard. Each client needs different data (mobile needs lightweight product summaries; web needs full descriptions and recommendations; admin needs inventory and sales metrics). Design a GraphQL schema that serves all three without duplication. How do you prevent the N+1 problem when fetching products with their reviews and reviewer details?

Scenario 2: Real-Time Collaboration Tool

You’re building a document collaboration platform (think Google Docs). Clients need to receive real-time updates when other users edit, comment, or change permissions. Design the subscription model. How do you scale subscriptions to 10,000 concurrent editors without overwhelming the server?

Scenario 3: Microservices Federation

Your company has split into multiple teams: Users service (manages authentication and profiles), Posts service (manages content), and Recommendations service (calculates personalized recommendations). Design how you’d use Apollo Federation to present a unified GraphQL API while letting each service operate independently. How do you handle cross-service references and authorization?


Next Steps: API Versioning

GraphQL’s type system eliminates many versioning headaches—you rarely break clients by adding fields. But how do you deprecate old fields gracefully? How do you handle schema evolution as your system grows? In Chapter 74, we’ll explore API versioning strategies that work with GraphQL, REST, and gRPC, and how to plan for evolution without disruption.