GraphQL: When & Why
The Problem That Started It All
Picture this: It’s 2012, and Facebook’s mobile engineering team faces a mounting frustration. Their native iOS and Android apps connect to REST APIs designed for the web. But mobile clients need different data shapes and volumes than browsers. Requesting a user’s feed using the existing REST endpoint? You get back massive JSON payloads containing every field of every post, every user, every comment—even fields the mobile UI never displays. The bandwidth toll is brutal. And if you need data from three different resources? Three separate HTTP requests, each with network latency. The team called this “over-fetching” (getting too much data) and “under-fetching” (needing multiple requests).
Instead of building yet another REST endpoint variation, Facebook’s teams (Lee Byron, Dan Schafer, and others) designed a radical alternative: a query language that lets clients request exactly the data they need, in exactly the shape they need, in a single round trip. They open-sourced it in 2015 as GraphQL.
This chapter explores when GraphQL shines and when it creates problems. We’ll examine the technology deeply, compare it to REST (Chapter 71) and gRPC (Chapter 72), and build a decision framework for your systems.
What Is GraphQL, Really?
GraphQL is two things simultaneously:
- A query language — a syntax that clients use to ask for data
- A runtime — server-side code that executes those queries and returns results
Unlike REST, which defines a set of endpoints that return predefined data structures, GraphQL exposes a schema—a formal specification of all the data and operations available. The schema is the contract. It’s simultaneously API documentation, validation rules, and executable specification.
# This is a GraphQL schema
type User {
id: ID!
name: String!
email: String!
posts(limit: Int = 10): [Post!]!
friends: [User!]!
}
type Post {
id: ID!
title: String!
content: String!
author: User!
comments(limit: Int = 5): [Comment!]!
createdAt: DateTime!
}
type Comment {
id: ID!
text: String!
author: User!
post: Post!
}
type Query {
user(id: ID!): User
posts(limit: Int = 10): [Post!]!
}
type Mutation {
createPost(title: String!, content: String!): Post!
deletePost(id: ID!): Boolean!
}
type Subscription {
postCreated: Post!
userOnline(userId: ID!): User!
}
Notice the structure:
- Types define shapes (User, Post, Comment)
- Scalar types are primitives (String, Int, Float, Boolean, ID)
- Fields are named properties with types
- Exclamation marks mean “non-null” (required)
- Brackets denote lists
- Query, Mutation, and Subscription are special types defining what operations clients can invoke
A resolver is a function that fetches data for a field. When a client queries for user.posts, the server runs the resolver for the posts field on the User type, which typically queries the database and returns the results.
The Buffet Analogy
REST is like a fixed-menu restaurant. You order “Dish #5” (GET /users/123), and the kitchen serves you exactly what’s on that plate. If it includes vegetables and you only wanted the protein, too bad. If you need information from two different dishes, you place two separate orders and wait for both. Efficient for standard meals, wasteful for customization.
GraphQL is like a buffet. You walk around and pick exactly what you want—some protein here, certain vegetables, skip the starch. You put together one plate (one request) with precisely what you need in the exact proportions you want. You never waste food (no over-fetching), and you don’t make multiple trips (no under-fetching).
Core Concepts in Depth
The Type System as Documentation
GraphQL’s type system serves triple duty: it’s the API definition, the validation rules, and the documentation all at once. Tools automatically generate interactive documentation (like GraphQL Playground or Apollo Studio) directly from the schema. No separate OpenAPI files to maintain; no documentation drift.
Queries, Mutations, and Subscriptions
- Queries are reads. They’re idempotent and side-effect free.
- Mutations are writes. They’re how clients request state changes.
- Subscriptions push real-time data. Clients open a persistent connection (usually WebSocket) and receive updates when data changes.
Here’s what a client query looks like:
query GetUserWithPosts($userId: ID!) {
user(id: $userId) {
id
name
email
posts(limit: 5) {
id
title
createdAt
comments(limit: 3) {
id
text
author {
name
}
}
}
}
}
Notice:
- Nesting — follow relationships across the graph in one query
- Variables (prefixed with $) — parameterize the query safely
- Arguments — specify filters, limits, and options at each level
- Single round trip — the server returns all nested data in one response
Resolvers: The Machinery
Resolvers are functions that populate field values. Here’s a JavaScript example:
const resolvers = {
Query: {
user: async (parent, args, context, info) => {
// args.id is the ID passed by the client
return await db.users.findById(args.id);
}
},
User: {
posts: async (parent, args, context, info) => {
// parent is the User object
// Fetch posts for this user
return await db.posts.findByAuthorId(parent.id);
},
friends: async (parent, args, context, info) => {
return await db.users.findFriends(parent.id);
}
},
Post: {
author: async (parent, args, context, info) => {
// parent is the Post object
return await db.users.findById(parent.authorId);
},
comments: async (parent, args, context, info) => {
return await db.comments.findByPostId(parent.id);
}
}
};
Each resolver receives four arguments:
- parent — the object containing this field
- args — arguments passed by the client
- context — shared data (database connection, authenticated user, etc.; see the setup sketch below)
- info — metadata about the query execution
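The context is assembled once per request and handed to every resolver. Here's a minimal sketch of wiring it up, assuming Apollo Server 4 (@apollo/server) and hypothetical db and getUserFromToken helpers:

import { ApolloServer } from '@apollo/server';
import { startStandaloneServer } from '@apollo/server/standalone';
import { typeDefs, resolvers } from './schema.js'; // hypothetical module
import { db, getUserFromToken } from './db.js';    // hypothetical helpers

const server = new ApolloServer({ typeDefs, resolvers });

// The context function runs once per request; whatever it returns is passed
// as the third argument to every resolver.
const { url } = await startStandaloneServer(server, {
  context: async ({ req }) => ({
    db,
    user: await getUserFromToken(req.headers.authorization),
  }),
});

console.log(`GraphQL server ready at ${url}`);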
The N+1 Problem and DataLoader
Resolvers are elegant but dangerous. Consider the earlier query that fetches a user, their posts, each post’s comments, and each comment’s author. With naive resolvers, here’s what happens:
- Fetch the user (1 query)
- Fetch the user’s posts (1 query)
- For each post, fetch its comments (N queries, where N = number of posts)
- For each comment, fetch its author (M queries, where M = total number of comments)
If a user has 5 posts with 3 comments each, you’ve made 1 + 1 + 5 + 15 = 22 database queries for what should be a single logical request. This is the N+1 problem.
Enter DataLoader, a batching and caching utility:
import DataLoader from 'dataloader';
const userLoader = new DataLoader(async (userIds) => {
// Fetch all users in one query
const users = await db.users.findByIds(userIds);
// Return in the same order as requested
return userIds.map(id => users.find(u => u.id === id));
});
const resolvers = {
Post: {
author: async (parent, args, context, info) => {
// Instead of db.users.findById(parent.authorId),
// we batch the request
return userLoader.load(parent.authorId);
}
}
};
// For each tick of the event loop, all pending loads are batched
// into a single database query: SELECT * FROM users WHERE id IN (...)
DataLoader coalesces the individual load() calls made during one tick of the event loop into a single batched query. Whether the query touches 1 post or 100, fetching their authors becomes one database round trip. This turns the N+1 problem into a handful of batched queries—roughly one per field type per level of the query—plus per-request caching of repeated lookups.
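In practice, loaders are created fresh for every request (often inside the context function shown earlier) so the loader’s cache never leaks data between users. A minimal sketch, assuming the same hypothetical db module:

import DataLoader from 'dataloader';

// Fresh loaders per request: batching within a request, no cross-request leakage
const createLoaders = (db) => ({
  userById: new DataLoader(async (ids) => {
    const users = await db.users.findByIds(ids);
    const byId = new Map(users.map((u) => [u.id, u]));
    return ids.map((id) => byId.get(id) ?? null); // preserve the requested order
  }),
});

// In the server setup:
// context: async ({ req }) => ({ db, loaders: createLoaders(db) })
// Resolvers then call context.loaders.userById.load(parent.authorId).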
Pagination: Cursors and Offsets
GraphQL doesn’t mandate a pagination approach, but cursor-based pagination—standardized by the Relay Connection specification—is preferred for distributed systems and evolving datasets:
type PostConnection {
edges: [PostEdge!]!
pageInfo: PageInfo!
}
type PostEdge {
cursor: String!
node: Post!
}
type PageInfo {
hasNextPage: Boolean!
hasPreviousPage: Boolean!
startCursor: String
endCursor: String
}
type Query {
userPosts(userId: ID!, first: Int, after: String): PostConnection!
}
A client query looks like:
query {
userPosts(userId: "123", first: 10, after: "cursor_xyz") {
edges {
cursor
node {
id
title
}
}
pageInfo {
hasNextPage
endCursor
}
}
}
Cursors are opaque tokens (often base64-encoded) that the server creates and interprets. They’re resilient to data changes and don’t require knowing the total count—perfect for evolving datasets.
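Here is a sketch of how a server might implement this, assuming cursors encode a post’s createdAt timestamp and ID, and a hypothetical db.posts.findAfter helper:

const encodeCursor = (post) =>
  Buffer.from(`${post.createdAt.toISOString()}|${post.id}`).toString('base64');

const decodeCursor = (cursor) => {
  const [createdAt, id] = Buffer.from(cursor, 'base64').toString('utf8').split('|');
  return { createdAt: new Date(createdAt), id };
};

const resolvers = {
  Query: {
    userPosts: async (_parent, { userId, first = 10, after }, { db }) => {
      const start = after ? decodeCursor(after) : null;
      // Fetch one extra row to learn whether another page exists
      const rows = await db.posts.findAfter(userId, start, first + 1);
      const page = rows.slice(0, first);
      return {
        edges: page.map((post) => ({ cursor: encodeCursor(post), node: post })),
        pageInfo: {
          hasNextPage: rows.length > first,
          hasPreviousPage: Boolean(after),
          startCursor: page.length ? encodeCursor(page[0]) : null,
          endCursor: page.length ? encodeCursor(page[page.length - 1]) : null,
        },
      };
    },
  },
};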
Introspection and Tooling
GraphQL’s introspection system lets clients query the schema itself:
query {
__type(name: "User") {
name
fields {
name
type {
name
kind
}
}
}
}
This enables powerful developer tools: IDE autocomplete, API explorers, mock servers, and automatic client code generation.
Real-World Example: A Social Media Schema
Let’s design a complete schema:
scalar DateTime
type User {
id: ID!
username: String!
email: String!
bio: String
avatar: String
createdAt: DateTime!
posts(limit: Int = 10): [Post!]!
followers: [User!]!
following: [User!]!
}
type Post {
id: ID!
title: String!
content: String!
author: User!
likes: Int!
comments(limit: Int = 5): [Comment!]!
createdAt: DateTime!
updatedAt: DateTime!
}
type Comment {
id: ID!
text: String!
author: User!
post: Post!
likes: Int!
createdAt: DateTime!
}
type Query {
user(id: ID!): User
me: User
posts(limit: Int = 20, offset: Int = 0): [Post!]!
searchUsers(query: String!): [User!]!
}
type Mutation {
createPost(title: String!, content: String!): Post!
updatePost(id: ID!, title: String, content: String): Post
deletePost(id: ID!): Boolean!
createComment(postId: ID!, text: String!): Comment!
likePost(postId: ID!): Post!
}
type Subscription {
postCreated: Post!
commentAdded(postId: ID!): Comment!
}
A complex nested query:
query FeedWithComments($limit: Int = 5) {
me {
id
username
following {
id
posts(limit: 3) {
id
title
content
author {
username
avatar
}
comments(limit: $limit) {
text
author {
username
}
}
}
}
}
}
The server executes this by:
- Resolving me (the authenticated user)
- Resolving their following relationship
- For each followed user, resolving their posts
- For each post, resolving the author and comments
- For each comment, resolving its author
- Returning the entire tree in a single JSON response
Caching: GraphQL’s Achilles’ Heel
REST benefits from HTTP caching headers (Cache-Control, ETags). A GET request to /users/123 can be cached by CDNs, browsers, and proxies. Every subsequent request gets the cached response.
GraphQL breaks this. By convention, every request is a POST to a single /graphql endpoint, and CDNs and proxies don’t cache POST responses. Even sending queries over GET (less common) caches poorly, because each distinct query-and-variables combination produces a different cache key.
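Concretely, a GraphQL call over HTTP is just a POST whose body carries the query text and variables; a sketch using fetch against a hypothetical endpoint:

const response = await fetch('https://api.example.com/graphql', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'query GetUser($id: ID!) { user(id: $id) { id name } }',
    variables: { id: '123' },
  }),
});

// Most servers return HTTP 200 even for resolver errors; check the errors array
const { data, errors } = await response.json();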
Solutions:
- Client-side caching — Libraries like Apollo Client and Relay normalize query responses and cache them locally, invalidating when mutations occur.
- Persisted queries — Pre-register queries with the server, then send only a query ID instead of the full query text. This enables caching on the server side and reduces bandwidth.
- HTTP caching for queries — Treat GraphQL endpoints as cacheable if you use GET and enforce strong constraints (idempotency, read-only queries).
- Custom caching layers — Implement Redis-based caching of resolver results, keyed by query hash and variables.
Pro tip: Most production GraphQL servers implement a combination of these—client-side caching for immediate responsiveness, persisted queries for bandwidth reduction, and resolver-level caching for database relief.
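For the resolver-level piece, here’s a minimal sketch using Redis via the ioredis package, with a hypothetical db.posts.findAll call and an arbitrary 60-second TTL:

import Redis from 'ioredis';

const redis = new Redis(); // defaults to localhost:6379

const resolvers = {
  Query: {
    posts: async (_parent, args, { db }) => {
      // Key the cache entry by field name plus serialized arguments
      const key = `posts:${JSON.stringify(args)}`;
      const cached = await redis.get(key);
      if (cached) return JSON.parse(cached);

      const posts = await db.posts.findAll(args);
      await redis.set(key, JSON.stringify(posts), 'EX', 60); // expire after 60 seconds
      return posts;
    },
  },
};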
Subscriptions and Real-Time Data
GraphQL subscriptions enable pushing data to clients via WebSocket:
subscription OnPostCreated {
postCreated {
id
title
author {
username
}
}
}
When a mutation creates a post, the server pushes the new post to all subscribed clients. Implementation requires:
- A WebSocket transport (e.g., Apollo Server paired with a subscription library such as graphql-ws)
- A pub/sub mechanism to broadcast events (Redis, Kafka, or in-memory)
- Subscription resolvers that return async iterables (see the sketch below)
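A minimal sketch of the resolver side, using the in-memory PubSub from the graphql-subscriptions package (fine for development; swap in a Redis- or Kafka-backed implementation for production):

import { PubSub } from 'graphql-subscriptions';

const pubsub = new PubSub();

const resolvers = {
  Mutation: {
    createPost: async (_parent, args, { db }) => {
      const post = await db.posts.create(args); // hypothetical db call
      await pubsub.publish('POST_CREATED', { postCreated: post });
      return post;
    },
  },
  Subscription: {
    postCreated: {
      // Returns an async iterable; each published event is pushed to subscribed clients
      subscribe: () => pubsub.asyncIterator(['POST_CREATED']),
    },
  },
};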
Subscriptions are powerful but resource-intensive. Each subscription maintains an open connection and consumes server memory. At scale, you need careful connection pooling, backpressure handling, and rate limiting.
Federation: Composing Multiple GraphQL Services
Apollo Federation lets you build a single, cohesive GraphQL API from multiple independently deployed services:
# Users service
type User @key(fields: "id") {
id: ID!
name: String!
}
# Posts service
extend type User @key(fields: "id") {
id: ID! @external
posts: [Post!]!
}
type Post @key(fields: "id") {
id: ID!
title: String!
author: User!
}
The Apollo Gateway composes these schemas at runtime, transparently resolving cross-service references. When a client queries a user’s posts, the gateway intelligently routes parts of the query to the appropriate service.
This pattern scales to dozens of services while maintaining a unified GraphQL schema, enabling independent deployment and iteration.
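Wiring up the gateway takes only a few lines. Here’s a sketch using @apollo/gateway with IntrospectAndCompose for local composition (managed federation via a schema registry is the usual production setup) and hypothetical subgraph URLs:

import { ApolloServer } from '@apollo/server';
import { startStandaloneServer } from '@apollo/server/standalone';
import { ApolloGateway, IntrospectAndCompose } from '@apollo/gateway';

const gateway = new ApolloGateway({
  supergraphSdl: new IntrospectAndCompose({
    subgraphs: [
      { name: 'users', url: 'http://localhost:4001/graphql' },
      { name: 'posts', url: 'http://localhost:4002/graphql' },
    ],
  }),
});

// The gateway plans each incoming query and fans it out to the owning subgraphs
const server = new ApolloServer({ gateway });
const { url } = await startStandaloneServer(server, { listen: { port: 4000 } });
console.log(`Gateway ready at ${url}`);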
Comparing GraphQL, REST, and gRPC
| Aspect | REST | GraphQL | gRPC |
|---|---|---|---|
| Request Shape | Fixed by server | Specified by client | Fixed by server |
| Caching | HTTP-level (excellent) | Complex (manual) | Complex (manual) |
| Over-fetching | Common | Prevented by design | Prevented by design |
| Under-fetching | Common (multiple requests) | Single request | Single request |
| Tooling | Good (Swagger/OpenAPI) | Excellent (introspection, playgrounds) | Good (protoc, IDE plugins) |
| Type Safety | Weak (JSON schema) | Strong (built-in types) | Very strong (protobuf) |
| Learning Curve | Shallow | Moderate | Moderate |
| Performance | Fast (simple payloads) | Depends on query (can be slow with bad queries) | Very fast (binary protocol) |
| Complexity | Simple servers | Complex resolvers (N+1 risk) | Moderate (clear contracts) |
| Browser Friendly | Excellent | Good (JSON) | Poor (binary) |
| File Uploads | Simple (multipart/form-data) | Awkward (non-standard) | Awkward (streaming) |
| Rate Limiting | Easy (by endpoint) | Hard (all requests same endpoint) | Moderate (by service) |
When GraphQL Excels
- Multiple clients with different needs — Web, mobile, IoT devices need different data shapes. GraphQL’s flexibility shines.
- Complex, relational data — Social networks, e-commerce platforms, content management systems benefit from graph traversal.
- Rapid iteration — Adding fields to the schema doesn’t break existing clients. You avoid version proliferation.
- Developer experience — Introspection, playgrounds, and automatic documentation reduce friction.
- Aggregating data from multiple sources — Federation composes microservices seamlessly.
When GraphQL Creates Problems
- Simple CRUD operations — If your API is just “list users,” “get user,” “create user,” “update user,” REST is simpler and faster.
- File-heavy operations — Uploading/downloading large files via GraphQL is awkward. Consider REST for media endpoints.
- Real-time streaming at extreme scale — Subscriptions don’t scale as well as message queues (Kafka, RabbitMQ) for high-volume events.
- Cache-dependent systems — If you rely heavily on HTTP caching, GraphQL’s POST-based model is problematic.
- Public APIs with rate limiting — Enforcing fair rate limits is harder when all queries hit the same endpoint. You must analyze query complexity.
Security Considerations
GraphQL APIs require special attention:
Query depth limiting — Prevent arbitrarily nested queries that cause server overload.
# Dangerous query
query {
user(id: "1") {
posts {
comments {
author {
posts {
comments {
author {
# ... infinite nesting
}
}
}
}
}
}
}
}
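One way to enforce a cap is a validation rule such as the one provided by the graphql-depth-limit package; a sketch with an arbitrary maximum depth of 5:

import depthLimit from 'graphql-depth-limit';
import { ApolloServer } from '@apollo/server';
import { typeDefs, resolvers } from './schema.js'; // hypothetical module

const server = new ApolloServer({
  typeDefs,
  resolvers,
  // Queries nested more than 5 levels deep fail validation before any resolver runs
  validationRules: [depthLimit(5)],
});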
Query complexity analysis — Assign a “cost” to each field based on the work required. Reject queries exceeding a threshold.
// Illustrative per-field cost map; libraries such as graphql-query-complexity
// let you attach estimators like these and reject queries that exceed a budget.
const fieldCosts = {
  User: {
    posts: () => 5,    // fetching posts costs 5 units
    friends: () => 10, // fetching friends costs 10 units
  },
  Post: {
    comments: () => 3, // fetching comments costs 3 units
  },
};
Persisted queries — Only allow pre-registered queries in production. This prevents attackers from crafting expensive queries.
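A minimal allowlist sketch, assuming an Express server and a hypothetical map of registered query IDs built at deploy time:

import express from 'express';

// Built at deploy time from the client's .graphql files (hypothetical contents)
const persistedQueries = new Map([
  ['3f2a9c', 'query GetUser($id: ID!) { user(id: $id) { id name } }'],
]);

const app = express();
app.use(express.json());

// Swap the incoming query ID for the registered query text; reject anything else
app.use('/graphql', (req, res, next) => {
  const query = persistedQueries.get(req.body.queryId);
  if (!query) return res.status(400).json({ error: 'Unknown or unregistered query' });
  req.body.query = query;
  next();
});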
Authentication and authorization — Implement field-level access control. Don’t rely on query structure; protect sensitive data at the resolver level.
// AuthenticationError is exported by apollo-server v2/v3; with @apollo/server v4,
// throw a GraphQLError with an appropriate extensions.code instead.
import { AuthenticationError } from 'apollo-server';

const resolvers = {
User: {
email: (parent, args, context) => {
if (context.userId !== parent.id && !context.isAdmin) {
throw new AuthenticationError('Unauthorized');
}
return parent.email;
}
}
};
Key Takeaways
- GraphQL solves over-fetching and under-fetching by letting clients request exactly the data they need in a single request.
- The type system is both contract and documentation; the schema is the truth.
- Resolvers are elegant but introduce the N+1 problem; use DataLoader for batching.
- Caching is GraphQL’s greatest challenge; combine client-side caching, persisted queries, and resolver-level strategies.
- Federation enables composing multiple services into a unified API.
- Choose GraphQL for complex, relational data and multiple clients; choose REST for simple CRUD; choose gRPC for high-performance microservices.
Practice Scenarios
Scenario 1: E-Commerce Platform You’re building an e-commerce system with a mobile app, web storefront, and admin dashboard. Each client needs different data (mobile needs lightweight product summaries; web needs full descriptions and recommendations; admin needs inventory and sales metrics). Design a GraphQL schema that serves all three without duplication. How do you prevent the N+1 problem when fetching products with their reviews and reviewer details?
Scenario 2: Real-Time Collaboration Tool You’re building a document collaboration platform (think Google Docs). Clients need to receive real-time updates when other users edit, comment, or change permissions. Design the subscription model. How do you scale subscriptions to 10,000 concurrent editors without overwhelming the server?
Scenario 3: Microservices Federation Your company has split into multiple teams: Users service (manages authentication and profiles), Posts service (manages content), and Recommendations service (calculates personalized recommendations). Design how you’d use Apollo Federation to present a unified GraphQL API while letting each service operate independently. How do you handle cross-service references and authorization?
Next Steps: API Versioning
GraphQL’s type system eliminates many versioning headaches—you rarely break clients by adding fields. But how do you deprecate old fields gracefully? How do you handle schema evolution as your system grows? In Chapter 74, we’ll explore API versioning strategies that work with GraphQL, REST, and gRPC, and how to plan for evolution without disruption.