API Gateway Pattern

The Mobile Problem: Why One Gateway Beats Many Calls

Imagine you’re building a mobile app for an e-commerce platform. A user opens the home screen, and you need to display:

Their profile information (from user-service)
Recent orders (from order-service)
Product recommendations (from recommendation-service)
Their loyalty points (from rewards-service)
Current notifications (from notification-service)

Without an API gateway, your mobile client makes five separate HTTP requests over a cellular network. Each request carries overhead—TCP handshake, TLS negotiation, HTTP headers. Each service has a different network address your client must know. Each response comes in a different format requiring different parsing logic. Your app doesn’t know if the user is authenticated to each service, so it passes authentication credentials to every one. By the time all five responses arrive, you’ve drained battery, consumed bandwidth, and the user sees a loading spinner.

Now imagine a gateway between your mobile app and all these services. Your client makes one call to the gateway: “Get me home screen data.” The gateway knows where each service lives (we covered service discovery in the previous section). It calls all five services in parallel, combines their responses into a single optimized JSON structure, and returns it to your client in one response. The gateway handles authentication once—the mobile app proves its identity to the gateway, which then requests data from services as a trusted caller.

This is the API gateway pattern in action.
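The fan-out-and-combine step can be sketched with nothing but the JDK. This is a minimal illustration, not a production gateway: `HomeScreenAggregator`, its `backends` map, and the idea that each backend returns a ready-made JSON fragment are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Fans out one home-screen request to several backends and merges the results. */
final class HomeScreenAggregator {

    // Hypothetical backend clients: section name -> function from user ID to a JSON fragment.
    private final Map<String, Function<String, String>> backends;

    HomeScreenAggregator(Map<String, Function<String, String>> backends) {
        this.backends = new LinkedHashMap<>(backends);
    }

    /** Calls every backend concurrently and returns one combined JSON object. */
    String fetchHomeScreen(String userId) {
        // Start all calls before waiting on any of them.
        Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
        backends.forEach((section, client) ->
            pending.put(section, CompletableFuture.supplyAsync(() -> client.apply(userId))));

        // join() waits for each result; with enough worker threads the total
        // wait is close to the slowest single call, not the sum of all calls.
        return pending.entrySet().stream()
            .map(e -> "\"" + e.getKey() + "\":" + e.getValue().join())
            .collect(Collectors.joining(",", "{", "}"));
    }
}
```

A real gateway would use a non-blocking HTTP client and a JSON library, and would decide what to return when one backend fails; the sketch only shows why one client call can replace five.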

What Is an API Gateway?

An API gateway is a reverse proxy that sits between clients and your backend services, acting as the single entry point for all client requests. It’s not just a router—it’s a full-featured middleware that handles the concerns that span all your services.

Core Responsibilities:

Request routing: Direct incoming requests to the correct backend service based on URL path, HTTP method, headers, or other criteria
API composition and aggregation: Combine responses from multiple services into a single response for the client
Protocol translation: Convert between protocols and formats, for example REST to gRPC, SOAP/XML to REST/JSON, or HTTP/1.1 to HTTP/2
Authentication and authorization: Verify client identity once at the gateway, enforce permission policies
Rate limiting: Prevent abuse by limiting requests per client, per API key, per time window
Response caching: Store frequently accessed data to reduce downstream load
Request/response transformation: Modify headers, request bodies, or responses to match expected formats
Logging and monitoring: Track all API traffic for analytics, debugging, and compliance

Think of a traditional reverse proxy (like NGINX in front of web servers) as handling traffic routing. An API gateway does that plus API-level concerns such as authentication, rate limiting, and response composition.
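Request routing, the first responsibility in the list, reduces to a lookup from request path to upstream address. The sketch below assumes a hypothetical `RouteTable` class and made-up upstream URLs; real gateways also match on method, headers, and host.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Maps request path prefixes to upstream base URLs. */
final class RouteTable {

    private final Map<String, String> upstreamByPrefix = new LinkedHashMap<>();

    RouteTable add(String pathPrefix, String upstreamBaseUrl) {
        upstreamByPrefix.put(pathPrefix, upstreamBaseUrl);
        return this;
    }

    /** Returns the full upstream URL for a request path, or empty if no route matches. */
    Optional<String> resolve(String requestPath) {
        return upstreamByPrefix.entrySet().stream()
            // Simplification: plain string prefix match, no path-segment awareness.
            .filter(e -> requestPath.startsWith(e.getKey()))
            // Prefer the most specific (longest) matching prefix.
            .max(Comparator.comparingInt(e -> e.getKey().length()))
            .map(e -> e.getValue() + requestPath);
    }
}
```

Because clients only ever see the gateway's paths, changing an upstream address means editing one `add(...)` entry rather than shipping a new client.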

The Backend for Frontend Pattern

You’ll often hear about BFF (Backend for Frontend) in the context of gateways. Rather than one gateway serving all clients, you deploy separate gateways optimized for different client types:

Mobile gateway: Optimized for limited bandwidth and battery. Aggregates data heavily. Returns minimal response payloads.
Web gateway: Different data shapes optimized for browser clients. May enable more detailed filtering and sorting.
Third-party/partner gateway: Restricted access, strict rate limiting, different API versioning.

Each gateway is a specialized API facade for its client type, reducing the client’s work and tailoring the experience.
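To make the difference between two BFFs concrete, here is a small sketch of the same order data shaped two ways. The `Order` record and its fields are hypothetical; the point is only that each gateway returns what its client needs.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Shapes the same order data differently for two client-specific gateways. */
final class OrderViews {

    // Hypothetical order shape returned by the order service.
    record Order(long id, double total, String status, String shippingAddress) {}

    /** Mobile BFF: a short list with only the fields a small screen shows. */
    static List<Map<String, Object>> forMobile(List<Order> orders, int limit) {
        return orders.stream()
            .limit(limit)
            .map(o -> Map.<String, Object>of("id", o.id(), "total", o.total()))
            .collect(Collectors.toList());
    }

    /** Web BFF: every order with full detail, so the browser can filter and sort. */
    static List<Map<String, Object>> forWeb(List<Order> orders) {
        return orders.stream()
            .map(o -> Map.<String, Object>of(
                "id", o.id(), "total", o.total(),
                "status", o.status(), "shippingAddress", o.shippingAddress()))
            .collect(Collectors.toList());
    }
}
```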

Gateway, Load Balancer, Reverse Proxy: What’s the Difference?

These terms overlap and often confuse people:

Load balancer: Distributes incoming traffic across multiple instances of the same service. Operates at the transport layer (Layer 4) or application layer (Layer 7). Purpose: availability and performance.
Reverse proxy: Sits in front of backend services and forwards client requests to them. It may shield servers from direct internet exposure, cache responses, or route requests. NGINX and HAProxy are popular reverse proxies.
API gateway: A specialized reverse proxy for APIs. Understands HTTP semantics, REST conventions, and provides high-level features like rate limiting per API key, request transformation, and API composition. Purpose: providing a clean API facade.

In practice, you might use a load balancer to spread traffic across multiple API gateway instances, which then route to your services.
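As a rough illustration of the load balancer's narrower job, the sketch below rotates requests across gateway instances. `RoundRobinBalancer` and the instance names are assumptions for the example; real load balancers also track health and connection counts.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Picks gateway instances in rotation so each receives an equal share of requests. */
final class RoundRobinBalancer {

    private final List<String> instances;
    private final AtomicLong counter = new AtomicLong();

    RoundRobinBalancer(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        this.instances = List.copyOf(instances);
    }

    /** Returns the next instance; safe to call from many threads. */
    String next() {
        // floorMod keeps the index non-negative even if the counter overflows.
        int index = (int) Math.floorMod(counter.getAndIncrement(), (long) instances.size());
        return instances.get(index);
    }
}
```

Note what is absent: no authentication, no per-key limits, no response shaping. Those belong to the gateway instances the balancer points at.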

How the Gateway Controls the Flow

Let’s visualize the architecture:

graph TB
    Mobile["📱 Mobile Client"]
    Web["🌐 Web Client"]
    Partner["🔌 Partner API"]

    MobileGW["Mobile Gateway"]
    WebGW["Web Gateway"]
    PartnerGW["Partner Gateway"]

    User["User Service"]
    Order["Order Service"]
    Recommendation["Recommendation Service"]
    Inventory["Inventory Service"]

    Mobile --> MobileGW
    Web --> WebGW
    Partner --> PartnerGW

    MobileGW --> User
    MobileGW --> Order
    MobileGW --> Recommendation

    WebGW --> User
    WebGW --> Order
    WebGW --> Inventory

    PartnerGW --> Order
    PartnerGW --> Inventory

    style MobileGW fill:#4A90E2
    style WebGW fill:#4A90E2
    style PartnerGW fill:#4A90E2

Notice that each client type gets its own gateway. The mobile gateway doesn’t expose the inventory service (mobile users don’t need to browse inventory in the same way). The partner gateway only exposes orders and inventory. This is the BFF pattern at work.

Authentication at the Gateway Boundary

One of the most powerful aspects of a gateway: it becomes your authentication boundary. Instead of every service validating credentials, the gateway does it once and then passes authenticated context downstream.

Typical flow:

Client sends request with JWT token to gateway
Gateway validates JWT signature and expiration
If valid, gateway extracts user identity and scopes
Gateway forwards request to backend service, including user identity (typically via a custom header)
Backend service trusts the gateway and uses the identity without re-validating

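This pattern eliminates the need for every service to implement authentication logic, and it gives you a single place to change security policies. It is only safe when services cannot be reached except through the gateway (for example, enforced by network policy or mutual TLS); otherwise anyone who can call a service directly could forge the identity header.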
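Steps 2 and 3 of the flow can be sketched with the JDK's crypto classes. This is a simplified illustration under several assumptions: tokens are signed with HS256 and a shared secret, claims are pulled out with regular expressions to avoid a JSON dependency, and `GatewayJwtCheck` and its `issue` helper are hypothetical names. A real gateway would use a maintained JWT library.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Validates an HS256-signed JWT and extracts the caller's identity. */
final class GatewayJwtCheck {

    // Naive claim extraction to stay dependency-free (an assumption of this sketch).
    private static final Pattern SUB = Pattern.compile("\"sub\"\\s*:\\s*\"([^\"]+)\"");
    private static final Pattern EXP = Pattern.compile("\"exp\"\\s*:\\s*(\\d+)");

    /** Returns the subject if the signature is valid and the token has not expired. */
    static Optional<String> authenticate(String token, byte[] secret, long nowEpochSeconds) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) return Optional.empty();
        try {
            // Recompute the HMAC over "header.payload" and compare in constant time.
            byte[] expected = hmac(parts[0] + "." + parts[1], secret);
            byte[] actual = Base64.getUrlDecoder().decode(parts[2]);
            if (!MessageDigest.isEqual(expected, actual)) return Optional.empty();

            // Read identity and expiry from the payload.
            String payload = new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
            Matcher exp = EXP.matcher(payload);
            Matcher sub = SUB.matcher(payload);
            if (!exp.find() || !sub.find()) return Optional.empty();
            if (Long.parseLong(exp.group(1)) <= nowEpochSeconds) return Optional.empty();
            return Optional.of(sub.group(1));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // malformed Base64 or numeric claim: treat as unauthenticated
        }
    }

    /** Signs a token for local experiments; in practice an identity provider issues it. */
    static String issue(String subject, long expEpochSeconds, byte[] secret) {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String header = enc.encodeToString(
            "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = enc.encodeToString(
            ("{\"sub\":\"" + subject + "\",\"exp\":" + expEpochSeconds + "}").getBytes(StandardCharsets.UTF_8));
        return header + "." + payload + "." + enc.encodeToString(hmac(header + "." + payload, secret));
    }

    private static byte[] hmac(String data, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.US_ASCII));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is unavailable", e);
        }
    }
}
```

On success the gateway would copy the returned subject into a header on the forwarded request (for instance a hypothetical `X-User-Id`) after stripping any value the client supplied under that name.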

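# Kong declarative config (kong.yml): JWT authentication + rate limiting
_format_version: "3.0"              # Kong 3.x; earlier releases use a different version string
services:
  - name: user_service
    url: http://user-service        # assumed upstream address
    routes:
      - name: user-api
        paths:
          - /api/v1/users
    plugins:
      - name: jwt
        config:
          claims_to_verify:
            - exp
      - name: rate-limiting
        config:
          minute: 100
          hour: 10000
          policy: redis             # also needs the Redis connection settings for your Kong version
consumers:
  - username: mobile-app            # hypothetical consumer
    jwt_secrets:                    # Kong keeps JWT secrets per consumer, not in the plugin config
      - key: mobile-app-issuer      # must match the token's iss claim
        algorithm: HS256
        secret: your-secret-key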
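What a per-minute limit such as `minute: 100` means in practice can be sketched as a fixed-window counter. This is an in-memory illustration with a hypothetical `FixedWindowRateLimiter` class; it is not Kong's implementation, and a shared store such as Redis is what lets several gateway instances enforce one combined limit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Fixed-window rate limiter: at most `limit` requests per client per window. */
final class FixedWindowRateLimiter {

    private record Window(long index, int count) {}

    private final int limit;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    FixedWindowRateLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request is allowed; false means respond with HTTP 429. */
    boolean tryAcquire(String clientKey, long nowMillis) {
        long index = nowMillis / windowMillis;
        // compute() runs atomically per key, so concurrent requests cannot over-count.
        Window updated = windows.compute(clientKey, (key, current) ->
            current == null || current.index() != index
                ? new Window(index, 1)                       // first request in a new window
                : new Window(index, current.count() + 1));   // another request in the same window
        return updated.count() <= limit;
    }
}
```

The client key can be an API key, a consumer name, or the authenticated user ID, depending on whom the limit is meant to constrain.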

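Several products implement these gateway features. A rough comparison (the ease-of-use and scalability ratings are subjective):

Gateway	Type	Language	Ease of Use	Scalability	Cost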
Kong	Open-source	Lua/Nginx	Moderate	Excellent	Free (self-hosted)
AWS API Gateway	Managed	N/A	High	Excellent	Per request
NGINX Plus	Commercial	C	Low	Excellent	License fee
Envoy	Open-source	C++	Low	Excellent	Free (self-hosted)
Spring Cloud Gateway	Open-source	Java	Moderate	Good	Free (self-hosted)
Traefik	Open-source	Go	High	Good	Free (self-hosted)

API Composition: Combining Multiple Services

One of the most useful gateway features is request aggregation. Instead of the client making multiple round trips, the gateway combines them.

Scenario: A mobile app needs user profile, recent orders, and recommendations on one screen.

Without a gateway:

Client -> Service 1: GET /users/123
Client -> Service 2: GET /orders?user_id=123
Client -> Service 3: GET /recommendations?user_id=123

Three requests, three round trips, three latency penalties.

With a gateway and BFF pattern:

Client -> Mobile Gateway: GET /home/profile
Mobile Gateway -> User Service: GET /users/123
Mobile Gateway -> Order Service: GET /orders?user_id=123&limit=5
Mobile Gateway -> Recommendation Service: GET /recommendations?user_id=123&limit=10
Mobile Gateway combines responses and returns:
{
  "profile": { "id": 123, "name": "Alice", ... },
  "recentOrders": [{ "id": 456, "total": 89.99 }, ...],
  "recommendations": [{ "productId": 789, "title": "Laptop Stand" }, ...]
}

The gateway calls three services but the client makes one request. Here’s how you’d implement this in Spring Cloud Gateway:

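import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.MediaType;
import org.springframework.http.server.reactive.ServerHttpResponse;
import reactor.core.publisher.Mono;

// No @EnableWebFlux here: in a Spring Boot app it would switch off WebFlux auto-configuration.
@Configuration
public class HomeScreenGateway {

    private final ObjectMapper objectMapper = new ObjectMapper();

    @Bean
    public RouteLocator routes(RouteLocatorBuilder builder) {
        return builder.routes()
            .route("home-screen", r -> r
                .path("/api/home")
                .filters(f -> f
                    .filter((exchange, chain) -> {
                        // extractUserId and the three get*Service helpers are application
                        // code (for example WebClient calls) and are not shown here.
                        String userId = extractUserId(exchange);

                        // Nothing runs until subscription; Mono.zip subscribes to all
                        // three sources, so the calls are in flight concurrently.
                        Mono<UserResponse> user = getUserService(userId);
                        Mono<OrdersResponse> orders = getOrderService(userId);
                        Mono<RecommendationsResponse> recs = getRecService(userId);

                        // Combine responses
                        return Mono.zip(user, orders, recs)
                            .flatMap(tuple -> {
                                HomeScreenDTO body = new HomeScreenDTO(
                                    tuple.getT1(),
                                    tuple.getT2(),
                                    tuple.getT3()
                                );
                                // Write the combined payload and end the exchange here.
                                // Calling chain.filter(exchange) instead would proxy to the
                                // route URI and discard the aggregated DTO.
                                ServerHttpResponse response = exchange.getResponse();
                                response.getHeaders().setContentType(MediaType.APPLICATION_JSON);
                                return Mono.fromCallable(() -> objectMapper.writeValueAsBytes(body))
                                    .flatMap(json -> response.writeWith(
                                        Mono.just(response.bufferFactory().wrap(json))));
                            });
                    })
                )
                // A route must declare a URI, but this one is never called because
                // the filter above completes the response itself.
                .uri("http://user-service")
            )
            .build();
    }
}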

This pattern is powerful for mobile clients where bandwidth is limited. A web client might make the three calls in parallel from the browser and combine them client-side—different BFF, different optimization strategy.

Cross-Cutting Concerns at the Gateway

Circuit Breaking: If a downstream service becomes unhealthy, the gateway can fail fast rather than timing out:

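# Assuming Resilience4j with Spring Boot (application.yml)
resilience4j:
  circuitbreaker:
    configs:
      default:
        registerHealthIndicator: true
        slidingWindowSize: 10          # evaluate the last 10 calls
        failureRateThreshold: 50       # open when 50% or more of them fail
        waitDurationInOpenState: 30s   # stay open for 30 seconds before allowing trial calls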
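The three numbers in that configuration map onto a small state machine. The sketch below is a simplified, hypothetical `SimpleCircuitBreaker`, not the library's implementation: it uses a count-based window, and in the half-open state the first recorded outcome decides whether the circuit closes or reopens.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Count-based circuit breaker: opens when too many of the last N calls failed. */
final class SimpleCircuitBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;
    private final double failureRateThreshold; // percent, e.g. 50
    private final long openMillis;
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failure
    private State state = State.CLOSED;
    private long openedAt;

    SimpleCircuitBreaker(int windowSize, double failureRateThreshold, long openMillis) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openMillis = openMillis;
    }

    /** Returns true if a call may go downstream; false means fail fast. */
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAt >= openMillis) {
            state = State.HALF_OPEN; // wait time elapsed: allow trial calls
        }
        return state != State.OPEN;
    }

    /** Records the outcome of a downstream call. */
    synchronized void recordResult(boolean failed, long nowMillis) {
        if (state == State.OPEN) return; // late result from before the circuit opened
        if (state == State.HALF_OPEN) {
            // The first trial outcome decides: success closes, failure reopens.
            outcomes.clear();
            if (failed) open(nowMillis); else state = State.CLOSED;
            return;
        }
        outcomes.addLast(failed);
        if (outcomes.size() > windowSize) outcomes.removeFirst();
        long failures = outcomes.stream().filter(f -> f).count();
        // Only judge once the window is full, then compare the failure percentage.
        if (outcomes.size() == windowSize && failures * 100.0 / windowSize >= failureRateThreshold) {
            open(nowMillis);
        }
    }

    synchronized State state() { return state; }

    private void open(long nowMillis) {
        state = State.OPEN;
        openedAt = nowMillis;
        outcomes.clear();
    }
}
```

The gateway would call `allowRequest` before proxying and return an error or fallback immediately when it yields false, which is what spares clients from waiting on a service that is already known to be failing.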

CORS Handling: The gateway centralizes CORS configuration instead of every service implementing it independently.

Request Transformation: Convert XML requests to JSON before forwarding to services expecting JSON.

Caching: Store responses from frequently accessed endpoints to reduce downstream load. Build cache keys from the path, query parameters, and user context so that one user never receives another user’s cached data.
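A minimal sketch of such a cache, assuming a fixed time-to-live and a hypothetical `ResponseCache` class (real gateways usually also honor `Cache-Control` headers and bound the cache size):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Caches response bodies for a fixed time-to-live. */
final class ResponseCache {

    private record Entry(String body, long expiresAtMillis) {}

    private final long ttlMillis;
    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    ResponseCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** The key covers everything that changes the response: user, path, and query string. */
    static String key(String path, String query, String userId) {
        return userId + "|" + path + "?" + query;
    }

    /** Returns a fresh cached body, or loads it from the service and caches it. */
    String get(String key, long nowMillis, Supplier<String> loadFromService) {
        Entry cached = entries.get(key);
        if (cached != null && cached.expiresAtMillis() > nowMillis) {
            return cached.body(); // fresh hit: no downstream call
        }
        String body = loadFromService.get();
        entries.put(key, new Entry(body, nowMillis + ttlMillis));
        return body;
    }
}
```

Including the user ID in the key is what keeps per-user endpoints such as order history from leaking across users; for public data the user part can be dropped to raise the hit rate.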

Request/Response Logging: Log every API call for audit trails, debugging, and compliance. The gateway sees everything in one place.

Gateway vs Service Mesh: Complementary, Not Competing

A common confusion: “Should we use an API gateway or a service mesh?”

The answer: both. They handle different traffic directions:

API gateway: Handles north-south traffic (external clients to services). Provides public API interface, rate limiting, authentication.
Service mesh (like Istio): Handles east-west traffic (service-to-service). Provides circuit breaking, retries, observability, mTLS between services.

The gateway is your public facade. The service mesh is your internal plumbing.

Real-World Implementation: Mobile BFF with JWT

Here’s a concrete example. Your mobile app needs optimized data, so you create a dedicated mobile gateway:

# mobile-gateway-routes.yml
routes:
  - path: /api/v1/mobile/home
    method: GET
    service: aggregation
    middleware:
      - jwt-auth
      - rate-limit

  - path: /api/v1/mobile/orders
    method: GET
    service: order-service
    middleware:
      - jwt-auth
      - cache-control

jwt-auth:
  secret: ${JWT_SECRET}
  issuer: ${JWT_ISSUER}

rate-limit:
  per-user-per-minute: 100

When a mobile client calls /api/v1/mobile/home:

Gateway validates JWT from request headers
Gateway extracts user ID from JWT payload
Gateway concurrently calls: user-service, order-service (with limit=5), recommendation-service
Gateway combines responses into mobile-optimized payload
Gateway returns single response with all data the home screen needs

The gateway shields the client from knowing about individual services. If you later move the recommendation service from one server to another, only the gateway configuration changes—mobile clients don’t need updates.

Practical Trade-Offs

When to Use an API Gateway:

Multiple client types (mobile, web, partners) with different needs
Need centralized authentication and rate limiting
Services expose different APIs that clients would need to coordinate
Want to hide internal service topology from clients
Need request/response transformation or composition

When a Simple Reverse Proxy Might Be Enough:

Single client type with consistent needs
Services already expose clean, client-ready APIs
Gateway only needs basic routing and load balancing
Team lacks expertise to manage complex gateway configurations

Real Costs of an API Gateway:

Single point of failure: If the gateway goes down, all clients lose access. Mitigation: deploy multiple gateway instances behind a load balancer, ensure high availability and fast failover
Performance bottleneck: Every request passes through the gateway. High-traffic systems need careful capacity planning. Mitigation: horizontal scaling, caching, asynchronous processing where possible
Added latency: Even a fast gateway adds milliseconds to each request. Mitigation: co-locate gateway with services, use high-performance gateways like Envoy
Development bottleneck: If one team owns the gateway, they become a constraint. Mitigation: clear ownership model, standardized gateway configurations, self-service route deployment
Gateway bloat: Temptation to add more logic—validation, business rules, authorization. Keep the gateway focused on cross-cutting concerns; let services handle business logic

Managed vs Self-Hosted:

Managed gateways (AWS API Gateway, Azure API Management) handle availability and patching, but per-request pricing can become expensive at high volume and they tie you to a cloud provider. Self-hosted gateways (Kong, Envoy) give you control but require ops expertise and infrastructure investment.

Key Takeaways

An API gateway is a reverse proxy that provides a single entry point for all clients, handling routing, authentication, rate limiting, and request aggregation
The Backend for Frontend (BFF) pattern deploys separate gateways optimized for different client types—mobile, web, partners
Gateways become your authentication boundary, validating credentials once and passing identity downstream
Request aggregation at the gateway reduces client burden and optimizes for high-latency networks (like mobile)
An API gateway (north-south traffic) complements but doesn’t replace a service mesh (east-west traffic)
Gateway architecture must be highly available to avoid becoming a single point of failure
Choose a gateway solution based on your traffic scale, team expertise, and deployment model (managed vs self-hosted)

Practice Scenarios

Scenario 1: Mobile App Slow Startup

Your e-commerce mobile app’s home screen takes 8 seconds to load. You’ve traced the issue: the app makes 6 separate API calls sequentially. Your backend services are healthy and responsive individually. Design an API gateway solution using the BFF pattern that would reduce load time. What should the mobile gateway aggregate? How would you prioritize which services to call first?

Scenario 2: Third-Party Partner Integration

A B2B partner needs to integrate with your inventory and order systems, but you don’t want to expose your full internal APIs. Design a separate partner gateway. What endpoints would you expose? How would you rate-limit them differently from internal clients? What authentication approach would you use for programmatic access?

Scenario 3: Migrating from Monolith

You’re splitting a monolithic application into microservices. Currently, mobile clients and web clients use the same REST API. Design how you’d introduce separate gateways for mobile and web while maintaining backward compatibility. How would you phase this migration without breaking existing clients?

Next, we’ll explore one of the most challenging problems in distributed systems: managing distributed data. When your services own separate databases, transactions, consistency, and data synchronization become much harder. The patterns you’ll learn (the saga pattern, event sourcing, and distributed tracing) build on the API gateway foundation you’ve just built.
