API gateways aren't just reverse proxies that forward traffic. The real value of a gateway comes from the architectural patterns it enables — patterns that simplify your clients, protect your backends, and give you a single control plane for cross-cutting concerns. Whether you're building a microservices architecture from scratch or decomposing a monolith, understanding these patterns helps you choose the right gateway and use it effectively.
This guide covers seven API gateway patterns that solve real architectural problems, when to use each one, and how to implement them with a programmable gateway.
Why API Gateway Patterns Matter
Every microservices architecture eventually runs into the same set of problems: clients that need data from multiple services, cross-cutting concerns duplicated across teams, routing logic scattered throughout the stack, and backend services exposed to traffic they shouldn't have to handle.
API gateway patterns are the architectural building blocks that solve these problems. They're not theoretical — they're battle-tested approaches used by teams at every scale. The key is knowing which pattern fits your situation and how your gateway supports it.
The patterns in this guide aren't mutually exclusive. Most production architectures combine several of them. A single gateway might implement the BFF pattern for client-specific APIs, offload authentication and rate limiting from backend services, and apply request transformations on every route.
Backend for Frontend (BFF)
The Backend for Frontend pattern creates dedicated API layers tailored to specific client types — web, mobile, IoT, or third-party consumers. Instead of forcing every client to interact with the same generic API, each frontend gets a backend that speaks its language.
The Problem BFF Solves
A mobile app and a web dashboard consume the same underlying data, but they need it in fundamentally different shapes. The mobile app needs a compact payload optimized for bandwidth. The dashboard needs rich, nested data for complex UI components. A generic API either over-fetches for mobile or under-fetches for web, creating a lose-lose situation.
Without BFF, you end up with one of two bad outcomes:
- Fat API responses that include everything any client might need, wasting bandwidth and processing time for clients that only use a fraction of the data
- Chatty clients that make dozens of API calls to assemble the data they need, increasing latency and complexity on the client side
How BFF Works
Each client type gets its own gateway layer (or set of routes within a gateway) that handles:
- Data aggregation — combining responses from multiple backend services into a single, client-optimized payload
- Response shaping — transforming data into the exact structure the client expects
- Client-specific logic — handling concerns like pagination strategies, field selection, or media format negotiation that differ by client type
In a programmable gateway like Zuplo, you implement the BFF pattern using custom handlers. Each client type gets its own set of routes with handlers that aggregate and shape data from your backend services.
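As a minimal, framework-agnostic sketch of the response-shaping half of BFF, the functions below turn one backend record into a compact mobile payload and a richer web payload. The `Order` shape, its field names, and the `shapeForMobile` / `shapeForWeb` helpers are hypothetical and not part of any Zuplo API; in a real handler they would run after the backend data has been fetched.

```typescript
// Hypothetical backend record shared by both clients.
interface Order {
  id: string;
  status: string;
  totalCents: number;
  items: { sku: string; name: string; quantity: number }[];
  customer: { id: string; name: string; email: string };
}

// Compact payload for mobile: only what a list screen needs.
interface MobileOrder {
  id: string;
  status: string;
  total: string;
  itemCount: number;
}

// Rich payload for the web dashboard: nested data kept intact.
interface WebOrder extends Order {
  total: string;
}

function formatCents(cents: number): string {
  return (cents / 100).toFixed(2);
}

function shapeForMobile(order: Order): MobileOrder {
  return {
    id: order.id,
    status: order.status,
    total: formatCents(order.totalCents),
    // Sum quantities so the client does not need the item list at all.
    itemCount: order.items.reduce((sum, item) => sum + item.quantity, 0),
  };
}

function shapeForWeb(order: Order): WebOrder {
  return { ...order, total: formatCents(order.totalCents) };
}
```

A mobile route and a web route can then share the same backend calls and differ only in which shaping function they apply before responding.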
When to Use BFF
BFF makes sense when you have multiple client types with meaningfully different data requirements. If your web and mobile apps consume the same API shape with minimal differences, a shared API with response transformation is simpler.
BFF is also a natural fit when separate teams own each frontend — the team that builds the mobile app also owns the mobile BFF, keeping the API contract close to the team that consumes it.
For a deeper comparison of how API orchestration and aggregation play into the BFF pattern, see how API orchestration differs from API aggregation.
API Composition and Aggregation
API composition combines data from multiple backend services into a single response, reducing the number of round trips a client has to make. While BFF is about tailoring APIs per client, composition is about assembling data from distributed services regardless of who's consuming it.
The Problem Composition Solves
In a microservices architecture, the data a client needs for a single view often lives across several services. A product detail page might need data from a product catalog service, a pricing service, an inventory service, and a reviews service. Without composition at the gateway, the client has to make four separate API calls and stitch the results together.
This creates problems: higher latency from sequential calls, complex error-handling logic on the client, and tight coupling between the client and your internal service topology.
How Composition Works
The gateway acts as an orchestration layer. It receives a single request, fans out to multiple backend services (ideally in parallel), and merges the results into a unified response.
In Zuplo, you can use context.invokeRoute() to call other routes within the same gateway without an additional network hop, or use standard fetch() calls to reach external services.
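The fan-out-and-merge step can be sketched independently of any gateway runtime. In the sketch below, the `Fetcher` type, the `Composed` shape, and the `compose` function are hypothetical; each fetcher stands in for whatever call reaches one backend, such as a fetch() or context.invokeRoute() call that parses the JSON body. A failing service degrades the response rather than failing the whole request, which is one possible policy, not the only one.

```typescript
// A fetcher returns parsed JSON for one backend. It is injected so the
// sketch does not depend on a particular gateway runtime.
type Fetcher = () => Promise<unknown>;

interface Composed {
  data: Record<string, unknown>;
  // Names of the services that failed; their data entries are null.
  failed: string[];
}

// Fan out to every service in parallel and merge the results.
async function compose(sources: Record<string, Fetcher>): Promise<Composed> {
  const names = Object.keys(sources);
  const results = await Promise.all(
    names.map((name) =>
      Promise.resolve()
        .then(() => sources[name]())
        .then(
          (value) => ({ name, ok: true as const, value }),
          () => ({ name, ok: false as const, value: null }),
        ),
    ),
  );

  const composed: Composed = { data: {}, failed: [] };
  for (const result of results) {
    composed.data[result.name] = result.value;
    if (!result.ok) composed.failed.push(result.name);
  }
  return composed;
}
```

Because every call starts before any of them is awaited, total latency is roughly that of the slowest backend rather than the sum of all of them.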
When to Use Composition
Composition works best when the data a client needs for a single view is spread across two to five services. Beyond that, you're building an orchestration service, not a gateway pattern — and that logic is usually better placed in a dedicated service layer.
Composition at the gateway is particularly valuable when your clients are mobile apps or browser-based SPAs where network latency and the number of round trips directly impact user experience.
Gateway Offloading
Gateway offloading moves cross-cutting concerns — authentication, rate limiting, request validation, logging, CORS, and caching — out of your backend services and into the gateway. Instead of every service implementing its own auth middleware and rate limiter, the gateway handles it once, consistently, before traffic reaches your backends.
The Problem Offloading Solves
Without gateway offloading, every backend team implements their own version of the same cross-cutting concerns. One team uses JWT validation with a 5-minute clock skew tolerance. Another uses 30 seconds. A third team skips validation entirely because "it's an internal service." Rate limiting is inconsistent. Logging formats differ. Security gaps emerge at the seams.
What to Offload
The most common concerns to offload to the gateway:
Authentication and authorization — Verify API keys, validate JWTs, check scopes and roles. Zuplo provides built-in authentication policies covering API keys, JWT (with OpenID Connect), and integrations with providers like Auth0, Clerk, Cognito, and Supabase. Your backend services receive pre-authenticated requests with the user's identity already resolved.
Rate limiting — Protect backends from traffic spikes and abuse. Zuplo's rate limiting policy supports limiting by IP, user, API key, or custom functions. For advanced use cases like usage-based billing, complex rate limiting supports multiple named limits and dynamic increments based on request or response data.
Request validation: Reject malformed requests before they reach your services. Zuplo's request validation policy automatically validates request bodies, query parameters, path parameters, and headers against your OpenAPI schema definitions. Invalid requests get a 400 response at the gateway, so your services never see them.
Caching — Serve repeated requests from cache without hitting your backend. Zuplo's caching policy handles TTL configuration, cache key customization, and cache busting, reducing backend load for frequently accessed data.
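To make one of these offloaded concerns concrete, here is a minimal fixed-window rate limiter keyed by caller (IP, user, or API key). It illustrates the concept only and is not how Zuplo's rate limiting policy is implemented; the `FixedWindowLimiter` class is hypothetical, and it keeps counters in memory, whereas a distributed gateway would need shared state.

```typescript
// Fixed-window rate limiter: at most `limit` requests per key per window.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if it should get a 429.
  // `now` is injectable so the behavior can be tested without real time.
  allow(key: string, now: number = Date.now()): boolean {
    let current = this.windows.get(key);
    if (!current || now - current.start >= this.windowMs) {
      // First request for this key, or the previous window has expired.
      current = { start: now, count: 0 };
      this.windows.set(key, current);
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```

Running this check once at the gateway means every backend sees the same limit, instead of each service enforcing its own variant.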
Why Offloading Works at the Gateway
The gateway is the natural place for cross-cutting concerns because it sees every request. When you enforce authentication at the gateway, there's no path to your backend that bypasses it. When you rate-limit at the gateway, abusive traffic is blocked before it consumes backend resources.
This is especially powerful with an edge-native gateway. In Zuplo's architecture, authentication, rate limiting, and request validation all execute at the nearest edge location — within milliseconds of the user. Invalid or rate-limited requests are rejected at the edge and never reach your origin servers.
For a broader look at how gateways fit into your infrastructure alongside load balancers, see API gateways vs. load balancers.
Gateway Routing
Gateway routing directs incoming requests to the appropriate backend service based on the request path, headers, user identity, geographic location, or any other signal. It's the most fundamental gateway pattern, but modern implementations go far beyond simple path matching.
Path-Based Routing
The most common form: map URL paths to backend services.
In Zuplo, routes are defined in an OpenAPI-format configuration file. Each route specifies its path, HTTP method, handler (which backend to forward to), and any policies to apply. Zuplo supports both standard OpenAPI path parameters (/users/{userId}) and advanced URL pattern matching with regex and wildcards.
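To show what path-based routing with OpenAPI-style parameters involves, the sketch below compiles a template such as /users/{userId} into a matcher and looks up a backend in a small route table. This is an illustration of the idea, not Zuplo's router; `compilePath`, `route`, and the backend URLs are hypothetical.

```typescript
type Params = Record<string, string>;
type Matcher = (path: string) => Params | null;

// Compile an OpenAPI-style template such as "/users/{userId}" into a
// matcher that returns the named parameters, or null if the path differs.
function compilePath(template: string): Matcher {
  const names: string[] = [];
  const pattern = template
    .split("/")
    .map((segment) => {
      const param = /^\{(\w+)\}$/.exec(segment);
      if (param) {
        names.push(param[1]);
        return "([^/]+)"; // a parameter matches exactly one path segment
      }
      // Escape regex metacharacters in literal segments.
      return segment.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    })
    .join("/");
  const regex = new RegExp(`^${pattern}$`);

  return (path) => {
    const match = regex.exec(path);
    if (!match) return null;
    const params: Params = {};
    names.forEach((name, i) => {
      params[name] = match[i + 1];
    });
    return params;
  };
}

// Hypothetical route table mapping paths to backend base URLs.
const routes = [
  { match: compilePath("/users/{userId}"), backend: "https://users.internal.example" },
  { match: compilePath("/orders/{orderId}/items"), backend: "https://orders.internal.example" },
];

function route(path: string): { backend: string; params: Params } | null {
  for (const candidate of routes) {
    const params = candidate.match(path);
    if (params) return { backend: candidate.backend, params };
  }
  return null;
}
```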
Geolocation-Based Routing
Route requests to different backends based on where the user is. This is useful for data residency compliance, latency optimization, or serving region-specific content.
In Zuplo, every request includes geolocation data: country, city, continent, latitude, longitude, and even the IATA airport code of the data center handling the request. You can use this data in a custom policy to route traffic to the nearest backend.
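The routing decision itself can be a small pure function. In this sketch the `Geo` interface, the regional backend URLs, and the residency list are all hypothetical; the exact property names your gateway exposes for geolocation data may differ, so treat the field names as assumptions.

```typescript
// Hypothetical subset of the geolocation data attached to a request.
interface Geo {
  country?: string; // ISO 3166-1 alpha-2 code, e.g. "DE"
  continent?: string; // two-letter continent code, e.g. "EU"
}

// Hypothetical regional backends.
const BACKENDS = {
  eu: "https://eu.api.example.com",
  apac: "https://apac.api.example.com",
  us: "https://us.api.example.com",
} as const;

// Countries whose traffic must stay on the EU backend (illustrative list).
const EU_RESIDENCY = new Set(["DE", "FR", "NL"]);

function pickBackend(geo: Geo): string {
  // Data residency rules take priority over latency optimization.
  if (geo.country && EU_RESIDENCY.has(geo.country)) return BACKENDS.eu;
  switch (geo.continent) {
    case "EU":
      return BACKENDS.eu;
    case "AS":
    case "OC":
      return BACKENDS.apac;
    default:
      return BACKENDS.us; // also covers missing geolocation data
  }
}
```

Keeping the decision separate from the request plumbing makes the residency rules easy to unit test before they go live.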
For a step-by-step walkthrough, see the geolocation backend routing guide.
Canary and A/B Routing
Route a percentage of traffic — or specific users — to a new version of a service. This lets you test new backends in production without exposing all users to potential issues.
A common approach is to route employee or internal traffic to the canary backend while external users continue hitting the stable version. Zuplo supports this through custom policies that inspect API key metadata or user claims and route accordingly.
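A canary decision along these lines might look like the sketch below. The `Caller` shape and the `isEmployee` flag are hypothetical stand-ins for whatever your API key metadata or JWT claims contain. The percentage split uses an FNV-1a style hash of the caller ID so that a given caller consistently lands on the same side of the split.

```typescript
interface Caller {
  id: string;
  // Hypothetical flag taken from API key metadata or user claims.
  isEmployee?: boolean;
}

// Deterministic bucket in the range 0-99 derived from the caller ID
// (FNV-1a style hash over UTF-16 code units).
function bucket(id: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < id.length; i++) {
    hash ^= id.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % 100;
}

// Employees always get the canary; everyone else is split by percentage.
function useCanary(caller: Caller, canaryPercent: number): boolean {
  if (caller.isEmployee) return true;
  return bucket(caller.id) < canaryPercent;
}
```

Hashing the caller ID rather than picking randomly per request avoids a single user bouncing between the stable and canary backends.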
Header-Based Routing
Route requests based on header values, which is useful for API versioning, content negotiation, or tenant-specific routing. Because Zuplo policies have full access to request headers, you can implement any header-based routing logic.
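As one example, versioning by header could be sketched as follows. The `api-version` header name, the version map, and the backend URLs are assumptions made for illustration.

```typescript
// Hypothetical map from API version to backend base URL.
const VERSION_BACKENDS = new Map<string, string>([
  ["1", "https://v1.api.example.com"],
  ["2", "https://v2.api.example.com"],
]);
const DEFAULT_VERSION = "2";

// Returns the backend for the requested version, or null for an unknown
// version so the caller can respond with a 400.
function backendForHeaders(headers: Record<string, string>): string | null {
  let version = DEFAULT_VERSION;
  for (const name of Object.keys(headers)) {
    // Header names are case-insensitive, so normalize before comparing.
    if (name.toLowerCase() === "api-version") version = headers[name].trim();
  }
  return VERSION_BACKENDS.get(version) ?? null;
}
```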
For more details on routing capabilities, see the Zuplo routing documentation.
Request and Response Transformation
The transformation pattern modifies requests before they reach your backend or responses before they reach your client. This lets you adapt APIs without changing the services themselves — useful for API versioning, data masking, format conversion, and legacy integration.
Common Transformations
Header manipulation — Add authentication tokens, remove internal headers, or inject tracing headers. Zuplo provides built-in policies for adding headers and removing headers without writing code.
Body transformation — Reshape request or response payloads. Convert between formats, add computed fields, strip sensitive data, or adapt one API's output to match another's expected input.
Query parameter manipulation — Zuplo includes policies for adding and removing query parameters, as well as converting query parameters to headers for backends that expect header-based inputs.
Response masking — Strip sensitive fields from responses before they reach the client. Zuplo's secret masking policy detects and redacts known secret patterns — including API keys, GitHub tokens, and private key blocks — in outbound responses, with support for custom regex patterns.
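Two of these transformations, stripping fields from a body and redacting secret-like text, can be sketched as pure functions. Both helpers are hypothetical and are not Zuplo's built-in policies; the single private-key pattern is illustrative and far from a complete secret detector.

```typescript
// Recursively remove the named fields from a JSON-like value.
function stripFields(value: unknown, blocked: Set<string>): unknown {
  if (Array.isArray(value)) {
    return value.map((entry) => stripFields(entry, blocked));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      if (!blocked.has(key)) out[key] = stripFields(source[key], blocked);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

// Illustrative pattern for PEM private key blocks.
const PRIVATE_KEY_BLOCK =
  /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g;

// Replace any private key block in outbound text with a placeholder.
function redact(text: string): string {
  return text.replace(PRIVATE_KEY_BLOCK, "[REDACTED]");
}
```

Applying functions like these in an outbound policy keeps sensitive data out of responses without changing the backend that produced them.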
When to Use Transformation
Transformation is essential when you're integrating with APIs you don't control — third-party services, legacy systems, or partner APIs with different data formats. It's also valuable for maintaining backward compatibility when your backend API evolves: the gateway translates between old and new formats so existing clients keep working.
Circuit Breaker at the Gateway
The circuit breaker pattern prevents cascading failures by stopping requests to a backend service that's struggling or unresponsive. Instead of letting requests pile up and consume resources, the gateway detects failures and short-circuits with a fast error response, giving the backend time to recover.
How Circuit Breakers Work
A circuit breaker has three states:
- Closed (normal): requests pass through to the backend. The breaker tracks error rates.
- Open: too many failures detected. The breaker rejects requests immediately with a 503 Service Unavailable without contacting the backend.
- Half-Open: after a timeout, the breaker allows a few test requests through. If they succeed, the circuit closes. If they fail, it opens again.
In a programmable gateway, you can implement a circuit breaker as a custom policy. While not every gateway provides this as a built-in feature, Zuplo's custom code policies give you the flexibility to implement circuit breaker logic using standard patterns.
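The three states can be captured in a small state machine. The `CircuitBreaker` class below is a hypothetical sketch with two deliberate simplifications: it counts consecutive failures rather than tracking an error rate, and it does not cap the number of trial requests in the half-open state. State is held in memory, so on a distributed gateway each instance would keep its own breaker unless you back it with shared storage.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;

  constructor(failureThreshold: number, resetTimeoutMs: number) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
  }

  // Should the gateway forward this request? If false, respond with 503.
  canRequest(now: number = Date.now()): boolean {
    if (this.state === "open" && now - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open"; // timeout elapsed: let trial requests through
    }
    return this.state !== "open";
  }

  // Call after the backend responds successfully.
  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // Call after a backend error or timeout.
  recordFailure(now: number = Date.now()): void {
    this.failures += 1;
    // A failed trial request reopens immediately; otherwise wait for the threshold.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }

  current(): BreakerState {
    return this.state;
  }
}
```

A policy would call canRequest() before forwarding, then recordSuccess() or recordFailure() depending on the backend's response.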
Combining Circuit Breakers with Other Patterns
Circuit breakers work best alongside gateway offloading. If your gateway already handles rate limiting and caching, adding a circuit breaker creates a comprehensive resilience layer. Cached responses can be served even when the circuit is open, and rate limiting prevents backends from being overwhelmed in the first place.
Edge Gateway Pattern
The edge gateway pattern deploys your API gateway at globally distributed edge locations rather than in a single cloud region. Instead of all API traffic routing through one data center for processing, every request is handled at the closest point of presence to the user.
Why Edge Matters
A gateway running in us-east-1 adds hundreds of milliseconds of latency for users in Tokyo, Sydney, or São Paulo before it even starts processing the request. With an edge gateway, that same request is processed at a nearby edge location within a few milliseconds of network travel.
The performance impact compounds across multiple API calls. If a page load triggers ten sequential API requests and each one saves 200ms of round-trip latency, you cut two full seconds off the load time for users in distant regions. When requests run in parallel the total saving is smaller, but every dependent round trip still benefits.
Edge-Native vs. CDN Caching
There's an important distinction between putting a CDN in front of a traditional gateway (which only caches static responses) and running an edge-native gateway that executes all processing at the edge. An edge-native gateway handles authentication, rate limiting, request validation, and custom logic at each edge location — not just caching.
Zuplo is built on this model. Every project deploys to 300+ global edge locations, and the full processing pipeline — routing, authentication, rate limiting, transformation, and custom TypeScript handlers — runs at the nearest PoP. Deployments go live globally in under 20 seconds.
For a deep dive into edge-native architecture, see Edge-Native API Gateway Architecture.
Choosing the Right Pattern
These patterns aren't mutually exclusive — most production gateways combine several. Here's how to think about which ones your architecture needs:
Start with gateway offloading. Almost every API benefits from centralizing authentication, rate limiting, and request validation at the gateway. This is the highest-value pattern with the lowest complexity.
Add routing when you have multiple backends. Once you're running more than one backend service, gateway routing gives you a single entry point with intelligent traffic distribution.
Use composition when clients need data from multiple services. If your clients are making three or more API calls to assemble a single view, gateway composition reduces latency and simplifies client code.
Adopt BFF when client needs diverge significantly. When your mobile app, web dashboard, and third-party API consumers need fundamentally different data shapes from the same underlying services, BFF prevents the "one API to rule them all" problem.
Apply transformation when integrating with external or legacy APIs. If you need to adapt data formats, mask sensitive fields, or maintain backward compatibility during migrations, gateway transformation handles it without touching backend code.
Consider edge deployment for global audiences. If your users span multiple continents, an edge gateway eliminates the latency penalty that comes from routing all traffic through a single region.
Add circuit breakers for resilience-critical systems. If a backend failure would cascade through your system, circuit breakers at the gateway provide a safety valve.
Implementing Gateway Patterns with Zuplo
Traditional gateways implement these patterns through plugins and configuration files. You get what the vendor built, and if your use case doesn't fit their plugin model, you're stuck writing workarounds or custom extensions in unfamiliar frameworks.
Zuplo takes a different approach. As a programmable API gateway, every pattern in this guide is implemented using standard TypeScript. The same language your team already uses for application development works for gateway logic — no proprietary DSLs, no plugin SDKs, no YAML-driven template engines.
Here's what makes this practical:
- Custom handlers: Write TypeScript functions that implement BFF and composition patterns using standard fetch() and context.invokeRoute()
- 60+ built-in policies — Common offloading concerns like authentication, rate limiting, and request validation are handled by pre-built policies that require zero code
- Edge-native execution — Every custom handler and policy runs at 300+ global edge locations, so your gateway patterns execute close to your users
- OpenAPI-native routing — Routes are defined in OpenAPI format, giving you path-based routing with full parameter support and automatic API documentation
Whether you're offloading authentication from a single backend or building a full BFF layer for multiple client types, the implementation path is the same: write TypeScript, configure routes, and deploy.
Ready to implement these patterns? Sign up for Zuplo and start building your gateway in minutes. Or explore the policy catalog to see how offloading works out of the box, and the custom handler documentation for patterns that need full programmability.