Every API call that hits your backend costs time and money. For read-heavy APIs, the same data gets fetched over and over — the product catalog that hasn't changed in hours, the configuration that updates once a day, the exchange rate that refreshes every minute. Without caching, each of those requests takes the full round trip through your gateway, into your backend, across your database, and back again.
API gateway caching short-circuits this path. By storing responses at the gateway layer, you serve repeat requests in milliseconds instead of hundreds of milliseconds, reduce load on your origin servers, and give your APIs the headroom to handle traffic spikes without scaling your infrastructure.
This guide covers the strategies, patterns, and trade-offs you need to implement effective caching at the API gateway level.
Why Cache at the API Gateway?
You can add caching at many points in your stack — the client, a CDN, the application layer, or the database layer. Caching at the gateway is uniquely powerful because the gateway sees every request before it reaches your backend. That means you can:
- Eliminate redundant backend calls. If 1,000 users request the same endpoint within a minute, the backend handles it once. The gateway serves the other 999 from cache.
- Reduce latency globally. Edge-deployed gateways can cache responses close to your users, cutting response times from hundreds of milliseconds to single digits.
- Protect your backend during traffic spikes. A cached response requires zero compute from your origin. During a product launch or viral event, caching prevents your backend from buckling under load.
- Lower infrastructure costs. Fewer backend requests means fewer compute cycles, database queries, and third-party API calls — all of which have direct cost implications.
Gateway caching differs from application-level caching (like Redis or Memcached in your backend) in an important way: it operates on complete HTTP responses, not individual data objects. This means your backend code doesn't need to change at all. You configure caching behavior at the gateway, and your existing API continues working exactly as before — just faster.
Caching Strategies for API Gateways
Not every API endpoint should be cached the same way. The right strategy depends on how often your data changes, who's requesting it, and how sensitive it is.
Full Response Caching
The most straightforward approach: cache the entire HTTP response (status code, headers, and body) for a specified duration. When a matching request arrives, the gateway returns the stored response without forwarding the request to the backend.
Full response caching works best for:
- Public endpoints that return the same data to all users (product listings, public configurations, reference data)
- Endpoints with data that changes on a predictable schedule (hourly stats, daily reports)
- Third-party API responses you're proxying through your gateway
Cache-Aside (Lazy Loading)
With cache-aside, the gateway checks the cache first. On a miss, it forwards the request to the backend, then stores the response for future requests. This pattern ensures the cache only contains data that has actually been requested, which is more memory-efficient than pre-populating the cache.
Most gateway caching implementations, including Zuplo's Caching Policy, use this pattern by default: generate a cache key, check for a hit, serve from cache or forward to the backend and store the result.
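The cache-aside flow can be sketched in a few lines of TypeScript. This is an illustrative model, not gateway internals; an in-memory Map stands in for the gateway's cache store, and all names are hypothetical:

```typescript
type CachedResponse = { body: string; status: number; expiresAt: number };

const store = new Map<string, CachedResponse>();

// Cache-aside: check the cache first; on a miss, call the backend and
// store the result so future matching requests are served from cache.
async function handleRequest(
  cacheKey: string,
  ttlSeconds: number,
  fetchFromBackend: () => Promise<{ body: string; status: number }>,
): Promise<{ body: string; status: number; cacheHit: boolean }> {
  const cached = store.get(cacheKey);
  if (cached && cached.expiresAt > Date.now()) {
    return { body: cached.body, status: cached.status, cacheHit: true };
  }
  const fresh = await fetchFromBackend();
  store.set(cacheKey, { ...fresh, expiresAt: Date.now() + ttlSeconds * 1000 });
  return { ...fresh, cacheHit: false };
}
```

The key property: the cache only ever fills with entries that a real request asked for, so memory is spent on your actual working set.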
Stale-While-Revalidate
This strategy serves a stale cached response to the user immediately while fetching a fresh copy from the backend in the background. The user gets a fast response, and the cache gets updated for the next request. It's a good fit for data that can tolerate brief staleness — like social media feeds or search results.
You implement this using the Cache-Control directive stale-while-revalidate,
which tells the gateway how long a stale response is acceptable while it
refreshes the cache asynchronously.
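The logic behind stale-while-revalidate can be sketched as follows. This is a simplified in-memory model of what a gateway does internally, with illustrative names; real gateways apply this from the response's Cache-Control directives:

```typescript
type Entry = { body: string; freshUntil: number; staleUntil: number };
const cache = new Map<string, Entry>();

// Serve a fresh entry directly; serve a stale-but-acceptable entry
// immediately while refreshing in the background; otherwise fetch inline.
async function swrGet(
  key: string,
  maxAgeSec: number, // Cache-Control: max-age
  swrSec: number, // Cache-Control: stale-while-revalidate window
  fetchFresh: () => Promise<string>,
): Promise<string> {
  const now = Date.now();
  const entry = cache.get(key);
  const refresh = async () => {
    const body = await fetchFresh();
    const t = Date.now();
    cache.set(key, {
      body,
      freshUntil: t + maxAgeSec * 1000,
      staleUntil: t + (maxAgeSec + swrSec) * 1000,
    });
    return body;
  };
  if (!entry || now > entry.staleUntil) return refresh(); // miss or too stale
  if (now > entry.freshUntil) void refresh(); // stale: refresh in background
  return entry.body; // serve the cached copy immediately
}
```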
Selective Caching by Method and Status
Not every request should be cached. GET requests are natural candidates
because they're safe and idempotent — they don't modify server state, so the
same request returns the same result (assuming the data hasn't changed).
POST, PUT, and DELETE requests typically modify state and shouldn't be cached
by default.
Similarly, you'll want to cache only successful responses. There's little value
in caching 500 Internal Server Error responses, and caching 401 Unauthorized
responses could lead to confusing behavior. Most gateways let you specify which
HTTP methods and status codes are eligible for caching.
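A minimal eligibility check might look like the following sketch. The allowlists here are illustrative defaults, not a standard; tune them per API:

```typescript
// Only safe methods and a conservative set of successful statuses
// are eligible for caching. Errors and auth failures are never stored.
const CACHEABLE_METHODS = new Set(["GET", "HEAD"]);
const CACHEABLE_STATUSES = new Set([200, 203, 204, 301]);

function isCacheable(method: string, status: number): boolean {
  return (
    CACHEABLE_METHODS.has(method.toUpperCase()) &&
    CACHEABLE_STATUSES.has(status)
  );
}
```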
Cache Invalidation Patterns
The hardest part of caching isn't storing data — it's knowing when to throw it away. Stale data served from cache can cause anything from minor UX issues to serious business logic errors. Here are the primary invalidation strategies.
TTL-Based Expiration
Set a time-to-live (TTL) on cached entries so they expire automatically. This is the simplest and most common approach:
- Short TTLs (30–300 seconds) for data that changes frequently, like stock prices or live scores
- Medium TTLs (5–60 minutes) for semi-dynamic content, like product catalogs or user profiles
- Long TTLs (1–24 hours) for rarely changing data, like API documentation or static configuration
The trade-off is straightforward: longer TTLs mean better cache hit rates but a greater risk of serving stale data. Shorter TTLs keep data fresher but put more load on your backend.
Event-Driven Invalidation
Instead of waiting for a TTL to expire, you invalidate specific cache entries when the underlying data changes. For example, when a product price is updated in your database, you send a cache-purge event that removes the cached response for that product's endpoint.
Event-driven invalidation requires more infrastructure (typically a message queue or webhook system) but ensures your cache always reflects the current state of your data.
Cache-Busting with Versioning
Use a version identifier (a timestamp, build number, or environment variable) as part of the cache key. When you need to invalidate all cached responses, change the version identifier. Every request now generates a new cache key, effectively bypassing the old cache entries.
This is particularly useful during deployments. If your API response format changes in a new release, a cache-bust ensures users immediately get the new format instead of stale responses from the previous version.
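The mechanism is simple enough to show in a few lines (a hypothetical key builder; the version string would typically come from an environment variable or build number):

```typescript
// Embed a deployment-scoped version identifier in every cache key.
// Changing the version shifts every key, so old entries are never
// matched again and effectively expire without an explicit purge.
function buildCacheKey(version: string, method: string, url: string): string {
  return `${version}:${method}:${url}`;
}
```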
Cache-Control Headers and HTTP Caching
HTTP provides a rich set of headers for controlling cache behavior. Getting these right is essential for effective gateway caching.
Cache-Control
The Cache-Control header is the primary mechanism for controlling caching
behavior. Key directives include:
- max-age=N: The response is fresh for N seconds. Browsers and intermediary caches use this to determine how long to store the response.
- s-maxage=N: Like max-age, but applies only to shared caches (CDNs and gateways). Use this when you want the gateway to cache longer than the browser.
- public: The response can be cached by any cache, including shared ones.
- private: The response is intended for a single user and should not be cached by shared caches like a gateway.
- no-cache: The cache must revalidate with the origin before serving the response.
- no-store: The response must not be cached at all.
A common pattern for gateway caching is
Cache-Control: public, max-age=60, s-maxage=3600. This tells browsers to cache
for 1 minute but allows the gateway to cache for 1 hour — giving users fresh
data in their browser while reducing backend load through the shared cache.
ETag and Conditional Requests
An ETag is a unique identifier for a specific version of a resource. When a
client sends a request with an If-None-Match header containing the ETag, the
server can respond with 304 Not Modified if the data hasn't changed — saving
bandwidth by not re-sending the full response body.
At the gateway level, ETags enable efficient cache validation without forcing a full cache refresh on every TTL expiration.
Vary Header
The Vary header tells caches which request headers affect the response. For
example, Vary: Accept-Language means the gateway should cache separate
responses for different language preferences. Without Vary, a response cached
for an English-language request might be served to a user requesting French.
Common Vary headers for API caching include Accept, Accept-Encoding, and
Authorization (when caching per-user responses).
Edge Caching vs. Origin Caching
Where you cache matters as much as what you cache. The two primary locations are at the edge (close to users) and at the origin (close to your backend).
Edge Caching
Edge caching stores responses at globally distributed points of presence (PoPs). When a user in Tokyo requests your API, the response is served from the nearest edge location instead of traveling to your origin server in Virginia. This reduces latency from hundreds of milliseconds to single digits.
Edge caching is ideal for:
- APIs with a global user base
- Public data that doesn't vary per user
- Read-heavy workloads where freshness requirements can tolerate a TTL
The main limitation is storage capacity. Edge nodes can't store everything, so less frequently accessed responses may get evicted.
Origin Caching
Origin caching happens at or near your backend — either in the gateway itself (if it's deployed in the same region as your backend) or in a caching layer like Redis between the gateway and your application.
Origin caching is better for:
- User-specific responses where edge caching would require too many variations
- Large response payloads that are expensive to distribute globally
- Data that changes too frequently for edge TTLs to be practical
Combining Both Layers
The most effective caching architectures use both. Edge caching handles public, read-heavy traffic globally, while origin caching reduces backend load for dynamic or personalized requests. An edge-native gateway makes this particularly powerful because the gateway itself runs at the edge — meaning your caching logic, not just cached data, executes close to your users.
Caching for Different API Types
REST APIs
REST APIs are well-suited for caching because they follow HTTP conventions
that align with caching semantics. GET requests are safe and idempotent, URLs
are stable resource identifiers, and HTTP caching headers work as designed.
Cache keys typically consist of the request method, URL path, and query
parameters. For endpoints that return different data based on authentication,
the Authorization header should also be part of the cache key to prevent
serving one user's data to another.
GraphQL APIs
GraphQL is harder to cache because queries are typically sent as POST requests
to a single endpoint (/graphql), and the response depends entirely on the
query body. Traditional URL-based caching doesn't work.
To cache GraphQL at the gateway, you need to either:
- Parse the query and generate cache keys based on the operation name, fields, and variables
- Use persisted queries where each query has a unique identifier that can serve as a cache key
- Cache at the field level rather than the response level, which requires deeper integration with the GraphQL execution engine
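The first option can be sketched with a hash of the parts of the request that determine the response. This is an illustrative helper, not a library API; note that JSON.stringify is key-order-sensitive, so production code would also sort variable keys before hashing:

```typescript
import { createHash } from "node:crypto";

// Derive a cache key for a GraphQL POST from the query text and its
// variables. Whitespace is collapsed so formatting differences don't
// fragment the cache, and a hash keeps keys short and uniform.
function graphqlCacheKey(
  query: string,
  variables: Record<string, unknown>,
): string {
  const normalized = query.replace(/\s+/g, " ").trim();
  const payload = JSON.stringify({ q: normalized, v: variables });
  return "gql:" + createHash("sha256").update(payload).digest("hex");
}
```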
Read-Heavy vs. Write-Heavy APIs
Read-heavy APIs (data retrieval, search, configuration) benefit enormously from caching. If 90% of your traffic is reads and your data changes infrequently, a well-configured cache can absorb the majority of your traffic.
Write-heavy APIs (order processing, real-time updates, event ingestion) benefit less from response caching. Instead, focus on caching the data needed to process writes — authentication tokens, rate limit counters, and configuration data — rather than the write responses themselves.
Cache Key Design
A cache key determines which requests share a cached response. If the key is too broad, different users get the wrong data. If it's too narrow, the cache stores too many variations and hit rates drop.
The Basics
Most gateway caches build keys from:
- HTTP method — GET /products and POST /products should never share a cache entry
- URL path — /products/123 and /products/456 are different resources
- Query parameters — /products?page=1 and /products?page=2 return different data
Handling Authentication
If your API returns user-specific data, the Authorization header must be part
of the cache key. Otherwise, User A's data could be served to User B — a serious
security issue.
For endpoints that return the same public data regardless of who's
authenticated, you can safely exclude the authorization header from the cache
key to improve hit rates. But this should be an explicit, deliberate decision.
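A key builder that treats the Authorization header as part of the key by default might look like this sketch (names and the request shape are illustrative):

```typescript
type KeyableRequest = {
  method: string;
  path: string;
  query: string;
  authorization?: string;
};

// Compose a cache key from everything that affects the response.
// Authorization is included by default, so excluding it for a public
// endpoint has to be an explicit opt-out at the call site.
function cacheKey(req: KeyableRequest, includeAuth = true): string {
  const parts = [req.method, req.path, req.query];
  if (includeAuth) parts.push(req.authorization ?? "anonymous");
  return parts.join("|");
}
```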
In Zuplo's Caching Policy, for example, the Authorization header is included
in the cache key by default. You can override this with the
dangerouslyIgnoreAuthorizationHeader option — the name makes the risk clear.
Including Custom Headers
Some APIs vary responses based on custom headers: Accept-Language for
localization, Accept for content negotiation, or custom headers for A/B
testing. Include these in your cache key when they affect the response body.
A cache key that considers method, path, query parameters, authorization, and relevant custom headers gives you precise control over what's cached without storing unnecessary duplicates.
Monitoring and Debugging Cache Performance
A cache you can't observe is a cache you can't trust. Monitor these metrics:
- Cache hit ratio: The percentage of requests served from cache. A ratio below 50% suggests your TTLs are too short, your cache keys are too granular, or your traffic patterns don't favor caching.
- Cache eviction rate: How often entries are removed before their TTL expires, typically due to storage limits. High eviction rates mean your cache is too small for your working set.
- Stale response rate: How often users receive data that has since been updated at the origin. Track this to validate that your TTLs match your data freshness requirements.
- Latency distribution: Compare response times for cache hits vs. cache misses. The delta tells you exactly how much value your cache provides.
When debugging cache misses, check for:
- Query parameters that vary unnecessarily (timestamps, tracking IDs) inflating your key space
- Missing or misconfigured Cache-Control headers
- Authentication headers that create unique cache entries per user when the data is actually public
Security Considerations
Caching introduces security surface area. A misconfigured cache can leak private data, serve poisoned responses, or bypass authorization controls.
Preventing Cache Poisoning
Cache poisoning occurs when an attacker causes the gateway to store a malicious response that's then served to other users. This typically happens when unvalidated request headers influence the response but aren't included in the cache key.
Mitigate this by including all request components that affect the response in your cache key, and by validating and sanitizing inputs at the gateway layer before they reach your backend.
Caching Authenticated Responses Safely
The golden rule: never cache user-specific responses in a shared cache without
including authentication information in the cache key. Set
Cache-Control: private for user-specific data, or ensure the Authorization
header is always part of your cache key.
For public data that requires authentication to access (but returns the same response to all authenticated users), you can safely cache the response in a shared cache — but document this decision clearly and enforce it at the gateway configuration level.
Sensitive Data in Cached Responses
Responses containing personally identifiable information (PII), financial data,
or health records generally should not be cached in shared caches at all. Use
Cache-Control: no-store for these endpoints. If caching is absolutely
necessary for performance, use encryption at rest and restrict cache access to
authorized components only.
Implementing API Gateway Caching with Zuplo
Zuplo's architecture makes it particularly effective for API caching. Because Zuplo runs on a globally distributed edge network across 300+ data centers, caching isn't an add-on feature — it's a natural outcome of where your gateway already runs. Every cached response is stored close to your users, reducing latency without additional CDN configuration.
Using the Caching Policy
Zuplo provides a built-in Caching Inbound Policy that you can add to any route. It handles cache key generation, TTL management, and response storage with zero custom code:
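A policy configuration along these lines would do it. The option names below follow Zuplo's caching policy documentation at the time of writing; confirm against the current reference before using:

```json
{
  "name": "my-caching-policy",
  "policyType": "caching-inbound",
  "handler": {
    "export": "CachingInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "expirationSecondsTtl": 300,
      "headers": ["accept", "accept-language"],
      "statusCodes": [200]
    }
  }
}
```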
This configuration caches GET responses for 5 minutes, includes Accept and
Accept-Language in the cache key so different content types and languages get
separate cache entries, and only caches successful responses.
Cache Key Customization
The policy automatically builds cache keys from the HTTP method, URL, query
parameters, and the Authorization header. You can extend the key by adding
custom headers via the headers option. For public endpoints where
authentication doesn't affect the response, you can exclude the Authorization
header from the key — though the option is intentionally named
dangerouslyIgnoreAuthorizationHeader as a reminder to think carefully before
enabling it.
Cache-Busting on Demand
Zuplo supports cache-busting through the cacheId option. Set it to an
environment variable, and when you need to invalidate all cached responses,
update the variable and redeploy:
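A sketch of the options block, assuming Zuplo's $env() syntax for referencing an environment variable (here a hypothetical CACHE_ID variable):

```json
{
  "export": "CachingInboundPolicy",
  "module": "$import(@zuplo/runtime)",
  "options": {
    "expirationSecondsTtl": 300,
    "cacheId": "$env(CACHE_ID)"
  }
}
```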
This approach is clean and predictable — no need for complex purge APIs or cache-tag management.
Programmatic Caching
For more sophisticated caching logic, Zuplo exposes the Cache API and ZoneCache API for custom TypeScript code. You can implement conditional caching, custom key generation, or hybrid strategies that combine gateway caching with application logic:
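As one example, a conditional-caching handler using the ZoneCache API might look like the sketch below. The backend URL and cache name are placeholders, and the get/put signatures follow Zuplo's documentation, so verify against the current API reference:

```typescript
import { ZoneCache, ZuploContext, ZuploRequest } from "@zuplo/runtime";

// Conditional caching: serve from the zone cache when possible, and
// only store responses when the backend call actually succeeds.
export default async function (request: ZuploRequest, context: ZuploContext) {
  const cache = new ZoneCache("product-cache", context);
  const key = new URL(request.url).pathname;

  const cached = await cache.get(key);
  if (cached) {
    return new Response(JSON.stringify(cached), {
      headers: { "content-type": "application/json" },
    });
  }

  const response = await fetch("https://api.example.com" + key);
  if (response.ok) {
    const data = await response.json();
    await cache.put(key, data, 60); // cache successful responses for 60s
    return new Response(JSON.stringify(data), {
      headers: { "content-type": "application/json" },
    });
  }
  return response; // never cache error responses
}
```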
Setting Cache Headers with Policies
You can also control downstream caching behavior by setting Cache-Control
headers on responses using Zuplo's
Set Headers Outbound Policy:
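A configuration in the shape of Zuplo's set-headers policy might look like this (option names follow the policy documentation; verify against the current reference):

```json
{
  "name": "set-cache-headers",
  "policyType": "set-headers-outbound",
  "handler": {
    "export": "SetHeadersOutboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "headers": [
        {
          "name": "cache-control",
          "value": "public, max-age=60, s-maxage=3600",
          "overwrite": true
        }
      ]
    }
  }
}
```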
This tells CDNs and edge caches to store the response for an hour while browsers cache for just one minute — giving you a long-lived shared cache with short browser freshness for rapid updates when needed.
Putting It All Together
Effective API gateway caching isn't about caching everything — it's about caching the right things at the right layers with the right invalidation strategy. Start with these steps:
- Identify your cacheable endpoints. Look for read-heavy routes with stable responses — product catalogs, configuration endpoints, reference data.
- Set appropriate TTLs. Match TTL to how often your data changes, not how often it's requested.
- Design your cache keys carefully. Include everything that affects the response, but nothing extra.
- Monitor hit rates. If your cache isn't being hit, adjust your keys and TTLs.
- Layer your caching. Use edge caching for global, public traffic and origin caching for personalized or high-frequency data.
For a deeper look at general API caching techniques, see our guide on how developers can use caching to improve API performance. To understand how edge architecture amplifies caching benefits, read about edge-native API gateway architecture. And if you're working with AI APIs, explore semantic caching — a technique that caches responses based on meaning rather than exact request matching.
Caching at the gateway layer is one of the highest-leverage performance optimizations you can make. An edge-native gateway like Zuplo makes it even more effective by putting your cache — and your caching logic — at the point closest to your users. You can get started with Zuplo's caching policies in minutes on the free tier.