Every API call that hits your backend costs time and money. For read-heavy APIs, the same data gets fetched over and over — the product catalog that hasn't changed in hours, the configuration that updates once a day, the exchange rate that refreshes every minute. Without caching, each of those requests takes the full round trip through your gateway, into your backend, across your database, and back again.
API gateway caching short-circuits this path. By storing responses at the gateway layer, you serve repeat requests in milliseconds instead of hundreds of milliseconds, reduce load on your origin servers, and give your APIs the headroom to handle traffic spikes without scaling your infrastructure.
This guide covers the strategies, patterns, and trade-offs you need to implement effective caching at the API gateway level.
Why Cache at the API Gateway?
You can add caching at many points in your stack — the client, a CDN, the application layer, or the database layer. Caching at the gateway is uniquely powerful because the gateway sees every request before it reaches your backend. That means you can:
- Eliminate redundant backend calls. If 1,000 users request the same endpoint within a minute, the backend handles it once. The gateway serves the other 999 from cache.
- Reduce latency globally. Edge-deployed gateways can cache responses close to your users, cutting response times from hundreds of milliseconds to single digits.
- Protect your backend during traffic spikes. A cached response requires zero compute from your origin. During a product launch or viral event, caching prevents your backend from buckling under load.
- Lower infrastructure costs. Fewer backend requests means fewer compute cycles, database queries, and third-party API calls — all of which have direct cost implications.
Gateway caching differs from application-level caching (like Redis or Memcached in your backend) in an important way: it operates on complete HTTP responses, not individual data objects. This means your backend code doesn't need to change at all. You configure caching behavior at the gateway, and your existing API continues working exactly as before — just faster.
Caching Strategies for API Gateways
Not every API endpoint should be cached the same way. The right strategy depends on how often your data changes, who's requesting it, and how sensitive it is.
Full Response Caching
The most straightforward approach: cache the entire HTTP response (status code, headers, and body) for a specified duration. When a matching request arrives, the gateway returns the stored response without forwarding the request to the backend.
Full response caching works best for:
- Public endpoints that return the same data to all users (product listings, public configurations, reference data)
- Endpoints with data that changes on a predictable schedule (hourly stats, daily reports)
- Third-party API responses you're proxying through your gateway
Cache-Aside (Lazy Loading)
With cache-aside, the gateway checks the cache first. On a miss, it forwards the request to the backend, then stores the response for future requests. This pattern ensures the cache only contains data that has actually been requested, which is more memory-efficient than pre-populating the cache.
Most gateway caching implementations, including Zuplo's Caching Policy, use this pattern by default: generate a cache key, check for a hit, serve from cache or forward to the backend and store the result.
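The cache-aside flow can be sketched in a few lines of TypeScript. This is an illustrative model, not gateway internals; an in-memory Map stands in for the gateway's cache store, and all names are hypothetical:

```typescript
type CachedResponse = { body: string; status: number; expiresAt: number };

const store = new Map<string, CachedResponse>();

// Cache-aside: check the cache first; on a miss, call the backend and
// store the result so future matching requests are served from cache.
async function handleRequest(
  cacheKey: string,
  ttlSeconds: number,
  fetchFromBackend: () => Promise<{ body: string; status: number }>,
): Promise<{ body: string; status: number; cacheHit: boolean }> {
  const cached = store.get(cacheKey);
  if (cached && cached.expiresAt > Date.now()) {
    return { body: cached.body, status: cached.status, cacheHit: true };
  }
  const fresh = await fetchFromBackend();
  store.set(cacheKey, { ...fresh, expiresAt: Date.now() + ttlSeconds * 1000 });
  return { ...fresh, cacheHit: false };
}
```

The key property: the cache only ever fills with entries that a real request asked for, so memory is spent on your actual working set.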
Stale-While-Revalidate
This strategy serves a stale cached response to the user immediately while fetching a fresh copy from the backend in the background. The user gets a fast response, and the cache gets updated for the next request. It's a good fit for data that can tolerate brief staleness — like social media feeds or search results.
You implement this using the Cache-Control directive stale-while-revalidate,
which tells the gateway how long a stale response is acceptable while it
refreshes the cache asynchronously.
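The logic behind stale-while-revalidate can be sketched as follows. This is a simplified in-memory model of what a gateway does internally, with illustrative names; real gateways apply this from the response's Cache-Control directives:

```typescript
type Entry = { body: string; freshUntil: number; staleUntil: number };
const cache = new Map<string, Entry>();

// Serve a fresh entry directly; serve a stale-but-acceptable entry
// immediately while refreshing in the background; otherwise fetch inline.
async function swrGet(
  key: string,
  maxAgeSec: number, // Cache-Control: max-age
  swrSec: number, // Cache-Control: stale-while-revalidate window
  fetchFresh: () => Promise<string>,
): Promise<string> {
  const now = Date.now();
  const entry = cache.get(key);
  const refresh = async () => {
    const body = await fetchFresh();
    const t = Date.now();
    cache.set(key, {
      body,
      freshUntil: t + maxAgeSec * 1000,
      staleUntil: t + (maxAgeSec + swrSec) * 1000,
    });
    return body;
  };
  if (!entry || now > entry.staleUntil) return refresh(); // miss or too stale
  if (now > entry.freshUntil) void refresh(); // stale: refresh in background
  return entry.body; // serve the cached copy immediately
}
```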
Selective Caching by Method and Status
Not every request should be cached. GET requests are natural candidates
because they're safe and idempotent — they don't modify server state, so the
same request returns the same result (assuming the data hasn't changed).
POST, PUT, and DELETE requests typically modify state and shouldn't be cached
by default.
Similarly, you'll want to cache only successful responses. There's little value
in caching 500 Internal Server Error responses, and caching 401 Unauthorized
responses could lead to confusing behavior. Most gateways let you specify which
HTTP methods and status codes are eligible for caching.
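A minimal eligibility check might look like the following sketch. The allowlists here are illustrative defaults, not a standard; tune them per API:

```typescript
// Only safe methods and a conservative set of successful statuses
// are eligible for caching. Errors and auth failures are never stored.
const CACHEABLE_METHODS = new Set(["GET", "HEAD"]);
const CACHEABLE_STATUSES = new Set([200, 203, 204, 301]);

function isCacheable(method: string, status: number): boolean {
  return (
    CACHEABLE_METHODS.has(method.toUpperCase()) &&
    CACHEABLE_STATUSES.has(status)
  );
}
```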
Cache Invalidation Patterns
The hardest part of caching isn't storing data — it's knowing when to throw it away. Stale data served from cache can cause anything from minor UX issues to serious business logic errors. Here are the primary invalidation strategies.
TTL-Based Expiration
Set a time-to-live (TTL) on cached entries so they expire automatically. This is the simplest and most common approach:
- Short TTLs (30–300 seconds) for data that changes frequently, like stock prices or live scores
- Medium TTLs (5–60 minutes) for semi-dynamic content, like product catalogs or user profiles
- Long TTLs (1–24 hours) for rarely changing data, like API documentation or static configuration
The trade-off is straightforward: longer TTLs mean better cache hit rates but a greater risk of serving stale data. Shorter TTLs keep data fresher but put more load on your backend.
Event-Driven Invalidation
Instead of waiting for a TTL to expire, you invalidate specific cache entries when the underlying data changes. For example, when a product price is updated in your database, you send a cache-purge event that removes the cached response for that product's endpoint.
Event-driven invalidation requires more infrastructure (typically a message queue or webhook system) but ensures your cache always reflects the current state of your data.
Cache-Busting with Versioning
Use a version identifier (a timestamp, build number, or environment variable) as part of the cache key. When you need to invalidate all cached responses, change the version identifier. Every request now generates a new cache key, effectively bypassing the old cache entries.
This is particularly useful during deployments. If your API response format changes in a new release, a cache-bust ensures users immediately get the new format instead of stale responses from the previous version.
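The mechanism is simple enough to show in a few lines (a hypothetical key builder; the version string would typically come from an environment variable or build number):

```typescript
// Embed a deployment-scoped version identifier in every cache key.
// Changing the version shifts every key, so old entries are never
// matched again and effectively expire without an explicit purge.
function buildCacheKey(version: string, method: string, url: string): string {
  return `${version}:${method}:${url}`;
}
```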
Cache-Control Headers and HTTP Caching
HTTP provides a rich set of headers for controlling cache behavior. Getting these right is essential for effective gateway caching.
Cache-Control
The Cache-Control header is the primary mechanism for controlling caching
behavior. Key directives include:
- max-age=N: The response is fresh for N seconds. Browsers and intermediary caches use this to determine how long to store the response.
- s-maxage=N: Like max-age, but applies only to shared caches (CDNs and gateways). Use this when you want the gateway to cache longer than the browser.
- public: The response can be cached by any cache, including shared ones.
- private: The response is intended for a single user and should not be cached by shared caches like a gateway.
- no-cache: The cache must revalidate with the origin before serving the response.
- no-store: The response must not be cached at all.
A common pattern for gateway caching is
Cache-Control: public, max-age=60, s-maxage=3600. This tells browsers to cache
for 1 minute but allows the gateway to cache for 1 hour — giving users fresh
data in their browser while reducing backend load through the shared cache.
ETag and Conditional Requests
An ETag is a unique identifier for a specific version of a resource. When a
client sends a request with an If-None-Match header containing the ETag, the
server can respond with 304 Not Modified if the data hasn't changed — saving
bandwidth by not re-sending the full response body.
At the gateway level, ETags enable efficient cache validation without forcing a full cache refresh on every TTL expiration.
Vary Header
The Vary header tells caches which request headers affect the response. For
example, Vary: Accept-Language means the gateway should cache separate
responses for different language preferences. Without Vary, a response cached
for an English-language request might be served to a user requesting French.
Common Vary headers for API caching include Accept, Accept-Encoding, and
Authorization (when caching per-user responses).
Edge Caching vs. Origin Caching
Where you cache matters as much as what you cache. The two primary locations are at the edge (close to users) and at the origin (close to your backend).
Edge Caching
Edge caching stores responses at globally distributed points of presence (PoPs). When a user in Tokyo requests your API, the response is served from the nearest edge location instead of traveling to your origin server in Virginia. This reduces latency from hundreds of milliseconds to single digits.
Edge caching is ideal for:
- APIs with a global user base
- Public data that doesn't vary per user
- Read-heavy workloads where freshness requirements can tolerate a TTL
The main limitation is storage capacity. Edge nodes can't store everything, so less frequently accessed responses may get evicted.
Origin Caching
Origin caching happens at or near your backend — either in the gateway itself (if it's deployed in the same region as your backend) or in a caching layer like Redis between the gateway and your application.
Origin caching is better for:
- User-specific responses where edge caching would require too many variations
- Large response payloads that are expensive to distribute globally
- Data that changes too frequently for edge TTLs to be practical
Combining Both Layers
The most effective caching architectures use both. Edge caching handles public, read-heavy traffic globally, while origin caching reduces backend load for dynamic or personalized requests. An edge-native gateway makes this particularly powerful because the gateway itself runs at the edge — meaning your caching logic, not just cached data, executes close to your users.
Caching for Different API Types
REST APIs
REST APIs are well-suited for caching because they follow HTTP conventions
that align with caching semantics. GET requests are safe and idempotent, URLs
are stable resource identifiers, and HTTP caching headers work as designed.
Cache keys typically consist of the request method, URL path, and query
parameters. For endpoints that return different data based on authentication,
the Authorization header should also be part of the cache key to prevent
serving one user's data to another.
GraphQL APIs
GraphQL is harder to cache because queries are typically sent as POST requests
to a single endpoint (/graphql), and the response depends entirely on the
query body. Traditional URL-based caching doesn't work.
To cache GraphQL at the gateway, you need to either:
- Parse the query and generate cache keys based on the operation name, fields, and variables
- Use persisted queries where each query has a unique identifier that can serve as a cache key
- Cache at the field level rather than the response level, which requires deeper integration with the GraphQL execution engine
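The first option can be sketched with a hash of the parts of the request that determine the response. This is an illustrative helper, not a library API; note that JSON.stringify is key-order-sensitive, so production code would also sort variable keys before hashing:

```typescript
import { createHash } from "node:crypto";

// Derive a cache key for a GraphQL POST from the query text and its
// variables. Whitespace is collapsed so formatting differences don't
// fragment the cache, and a hash keeps keys short and uniform.
function graphqlCacheKey(
  query: string,
  variables: Record<string, unknown>,
): string {
  const normalized = query.replace(/\s+/g, " ").trim();
  const payload = JSON.stringify({ q: normalized, v: variables });
  return "gql:" + createHash("sha256").update(payload).digest("hex");
}
```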
Read-Heavy vs. Write-Heavy APIs
Read-heavy APIs (data retrieval, search, configuration) benefit enormously from caching. If 90% of your traffic is reads and your data changes infrequently, a well-configured cache can absorb the majority of your traffic.
Write-heavy APIs (order processing, real-time updates, event ingestion) benefit less from response caching. Instead, focus on caching the data needed to process writes — authentication tokens, rate limit counters, and configuration data — rather than the write responses themselves.
Cache Key Design
A cache key determines which requests share a cached response. If the key is too broad, different users get the wrong data. If it's too narrow, the cache stores too many variations and hit rates drop.
The Basics
Most gateway caches build keys from:
- HTTP method — GET /products and POST /products should never share a cache entry
- URL path — /products/123 and /products/456 are different resources
- Query parameters — /products?page=1 and /products?page=2 return different data
Handling Authentication
If your API returns user-specific data, the Authorization header must be part
of the cache key. Otherwise, User A's data could be served to User B — a serious
security issue.
For endpoints that return the same public data regardless of who's
authenticated, you can safely exclude the authorization header from the cache
key to improve hit rates. But this should be an explicit, deliberate decision.
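A key builder that treats the Authorization header as part of the key by default might look like this sketch (names and the request shape are illustrative):

```typescript
type KeyableRequest = {
  method: string;
  path: string;
  query: string;
  authorization?: string;
};

// Compose a cache key from everything that affects the response.
// Authorization is included by default, so excluding it for a public
// endpoint has to be an explicit opt-out at the call site.
function cacheKey(req: KeyableRequest, includeAuth = true): string {
  const parts = [req.method, req.path, req.query];
  if (includeAuth) parts.push(req.authorization ?? "anonymous");
  return parts.join("|");
}
```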
In Zuplo's Caching Policy, for example, the Authorization header is included
in the cache key by default. You can override this with the
dangerouslyIgnoreAuthorizationHeader option — the name makes the risk clear.
Including Custom Headers
Some APIs vary responses based on custom headers: Accept-Language for
localization, Accept for content negotiation, or custom headers for A/B
testing. Include these in your cache key when they affect the response body.
A cache key that considers method, path, query parameters, authorization, and relevant custom headers gives you precise control over what's cached without storing unnecessary duplicates.
Monitoring and Debugging Cache Performance
A cache you can't observe is a cache you can't trust. Monitor these metrics:
- Cache hit ratio: The percentage of requests served from cache. A ratio below 50% suggests your TTLs are too short, your cache keys are too granular, or your traffic patterns don't favor caching.
- Cache eviction rate: How often entries are removed before their TTL expires, typically due to storage limits. High eviction rates mean your cache is too small for your working set.
- Stale response rate: How often users receive data that has since been updated at the origin. Track this to validate that your TTLs match your data freshness requirements.
- Latency distribution: Compare response times for cache hits vs. cache misses. The delta tells you exactly how much value your cache provides.
When debugging cache misses, check for:
- Query parameters that vary unnecessarily (timestamps, tracking IDs) inflating your key space
- Missing or misconfigured Cache-Control headers
- Authentication headers that create unique cache entries per user when the data is actually public
Security Considerations
Caching introduces security surface area. A misconfigured cache can leak private data, serve poisoned responses, or bypass authorization controls.
Preventing Cache Poisoning
Cache poisoning occurs when an attacker causes the gateway to store a malicious response that's then served to other users. This typically happens when unvalidated request headers influence the response but aren't included in the cache key.
Mitigate this by including all request components that affect the response in your cache key, and by validating and sanitizing inputs at the gateway layer before they reach your backend.
Caching Authenticated Responses Safely
The golden rule: never cache user-specific responses in a shared cache without
including authentication information in the cache key. Set
Cache-Control: private for user-specific data, or ensure the Authorization
header is always part of your cache key.
For public data that requires authentication to access (but returns the same response to all authenticated users), you can safely cache the response in a shared cache — but document this decision clearly and enforce it at the gateway configuration level.
Sensitive Data in Cached Responses
Responses containing personally identifiable information (PII), financial data,
or health records generally should not be cached in shared caches at all. Use
Cache-Control: no-store for these endpoints. If caching is absolutely
necessary for performance, use encryption at rest and restrict cache access to
authorized components only.
Implementing API Gateway Caching with Zuplo
Zuplo's architecture makes it particularly effective for API caching. Because Zuplo runs on a globally distributed edge network across 300+ data centers, caching isn't an add-on feature — it's a natural outcome of where your gateway already runs. Every cached response is stored close to your users, reducing latency without additional CDN configuration.
Using the Caching Policy
Zuplo provides a built-in Caching Inbound Policy that you can add to any route. It handles cache key generation, TTL management, and response storage with zero custom code:
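A policy configuration along these lines would do it. The option names below follow Zuplo's caching policy documentation at the time of writing; confirm against the current reference before using:

```json
{
  "name": "my-caching-policy",
  "policyType": "caching-inbound",
  "handler": {
    "export": "CachingInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "expirationSecondsTtl": 300,
      "headers": ["accept", "accept-language"],
      "statusCodes": [200]
    }
  }
}
```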
This configuration caches GET responses for 5 minutes, includes Accept and
Accept-Language in the cache key so different content types and languages get
separate cache entries, and only caches successful responses.
Cache Key Customization
The policy automatically builds cache keys from the HTTP method, URL, query
parameters, and the Authorization header. You can extend the key by adding
custom headers via the headers option. For public endpoints where
authentication doesn't affect the response, you can exclude the Authorization
header from the key — though the option is intentionally named
dangerouslyIgnoreAuthorizationHeader as a reminder to think carefully before
enabling it.
Cache-Busting on Demand
Zuplo supports cache-busting through the cacheId option. Set it to an
environment variable, and when you need to invalidate all cached responses,
update the variable and redeploy:
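A sketch of the options block, assuming Zuplo's $env() syntax for referencing an environment variable (here a hypothetical CACHE_ID variable):

```json
{
  "export": "CachingInboundPolicy",
  "module": "$import(@zuplo/runtime)",
  "options": {
    "expirationSecondsTtl": 300,
    "cacheId": "$env(CACHE_ID)"
  }
}
```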
This approach is clean and predictable — no need for complex purge APIs or cache-tag management.
Programmatic Caching
For more sophisticated caching logic, Zuplo exposes the Cache API and ZoneCache API for custom TypeScript code. You can implement conditional caching, custom key generation, or hybrid strategies that combine gateway caching with application logic:
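As one example, a conditional-caching handler using the ZoneCache API might look like the sketch below. The backend URL and cache name are placeholders, and the get/put signatures follow Zuplo's documentation, so verify against the current API reference:

```typescript
import { ZoneCache, ZuploContext, ZuploRequest } from "@zuplo/runtime";

// Conditional caching: serve from the zone cache when possible, and
// only store responses when the backend call actually succeeds.
export default async function (request: ZuploRequest, context: ZuploContext) {
  const cache = new ZoneCache("product-cache", context);
  const key = new URL(request.url).pathname;

  const cached = await cache.get(key);
  if (cached) {
    return new Response(JSON.stringify(cached), {
      headers: { "content-type": "application/json" },
    });
  }

  const response = await fetch("https://api.example.com" + key);
  if (response.ok) {
    const data = await response.json();
    await cache.put(key, data, 60); // cache successful responses for 60s
    return new Response(JSON.stringify(data), {
      headers: { "content-type": "application/json" },
    });
  }
  return response; // never cache error responses
}
```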
Setting Cache Headers with Policies
You can also control downstream caching behavior by setting Cache-Control
headers on responses using Zuplo's
Set Headers Outbound Policy:
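A configuration in the shape of Zuplo's set-headers policy might look like this (option names follow the policy documentation; verify against the current reference):

```json
{
  "name": "set-cache-headers",
  "policyType": "set-headers-outbound",
  "handler": {
    "export": "SetHeadersOutboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "headers": [
        {
          "name": "cache-control",
          "value": "public, max-age=60, s-maxage=3600",
          "overwrite": true
        }
      ]
    }
  }
}
```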
This tells CDNs and edge caches to store the response for an hour while browsers cache for just one minute — giving you a long-lived shared cache with short browser freshness for rapid updates when needed.
Putting It All Together
Effective API gateway caching isn't about caching everything — it's about caching the right things at the right layers with the right invalidation strategy. Start with these steps:
- Identify your cacheable endpoints. Look for read-heavy routes with stable responses — product catalogs, configuration endpoints, reference data.
- Set appropriate TTLs. Match TTL to how often your data changes, not how often it's requested.
- Design your cache keys carefully. Include everything that affects the response, but nothing extra.
- Monitor hit rates. If your cache isn't being hit, adjust your keys and TTLs.
- Layer your caching. Use edge caching for global, public traffic and origin caching for personalized or high-frequency data.
For a deeper look at general API caching techniques, see our guide on how developers can use caching to improve API performance. To understand how edge architecture amplifies caching benefits, read about edge-native API gateway architecture. And if you're working with AI APIs, explore semantic caching — a technique that caches responses based on meaning rather than exact request matching.
Caching at the gateway layer is one of the highest-leverage performance optimizations you can make. An edge-native gateway like Zuplo makes it even more effective by putting your cache — and your caching logic — at the point closest to your users. You can get started with Zuplo's caching policies in minutes on the free tier.