---
title: "API Gateway Caching: Strategies, Patterns, and Best Practices"
description: "Learn how API gateway caching reduces latency, cuts backend load, and scales your APIs with strategies for cache invalidation, edge caching, and more."
canonicalUrl: "https://zuplo.com/learning-center/api-gateway-caching"
pageType: "learning-center"
authors: "nate"
tags: "API Gateway, API Performance"
image: "https://zuplo.com/og?text=API%20Gateway%20Caching"
---
Every API call that hits your backend costs time and money. For read-heavy APIs,
the same data gets fetched over and over — the product catalog that hasn't
changed in hours, the configuration that updates once a day, the exchange rate
that refreshes every minute. Without caching, each of those requests takes the
full round trip through your gateway, into your backend, across your database,
and back again.

API gateway caching short-circuits this path. By storing responses at the
gateway layer, you serve repeat requests in milliseconds instead of hundreds of
milliseconds, reduce load on your origin servers, and give your APIs the
headroom to handle traffic spikes without scaling your infrastructure.

This guide covers the strategies, patterns, and trade-offs you need to implement
effective caching at the API gateway level.

## Why Cache at the API Gateway?

You can add caching at many points in your stack — the client, a CDN, the
application layer, or the database layer. Caching at the gateway is uniquely
powerful because the gateway sees every request before it reaches your backend.
That means you can:

- **Eliminate redundant backend calls.** If 1,000 users request the same
  endpoint within a minute, the backend handles it once. The gateway serves the
  other 999 from cache.
- **Reduce latency globally.** Edge-deployed gateways can cache responses close
  to your users, cutting response times from hundreds of milliseconds to single
  digits.
- **Protect your backend during traffic spikes.** A cached response requires
  zero compute from your origin. During a product launch or viral event, caching
  prevents your backend from buckling under load.
- **Lower infrastructure costs.** Fewer backend requests means fewer compute
  cycles, database queries, and third-party API calls — all of which have direct
  cost implications.

Gateway caching differs from application-level caching (like Redis or Memcached
in your backend) in an important way: it operates on complete HTTP responses,
not individual data objects. This means your backend code doesn't need to change
at all. You configure caching behavior at the gateway, and your existing API
continues working exactly as before — just faster.

## Caching Strategies for API Gateways

Not every API endpoint should be cached the same way. The right strategy depends
on how often your data changes, who's requesting it, and how sensitive it is.

### Full Response Caching

The most straightforward approach: cache the entire HTTP response (status code,
headers, and body) for a specified duration. When a matching request arrives,
the gateway returns the stored response without forwarding the request to the
backend.

Full response caching works best for:

- Public endpoints that return the same data to all users (product listings,
  public configurations, reference data)
- Endpoints with data that changes on a predictable schedule (hourly stats,
  daily reports)
- Third-party API responses you're proxying through your gateway

### Cache-Aside (Lazy Loading)

With cache-aside, the gateway checks the cache first. On a miss, it forwards the
request to the backend, then stores the response for future requests. This
pattern ensures the cache only contains data that has actually been requested,
which is more memory-efficient than pre-populating the cache.

Most gateway caching implementations, including
[Zuplo's Caching Policy](https://zuplo.com/docs/policies/caching-inbound), use
this pattern by default: generate a cache key, check for a hit, serve from cache
or forward to the backend and store the result.
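
The flow can be sketched in a few lines of TypeScript. This is an illustrative in-memory version, not any gateway's actual implementation; the `Map`-based store and the `fetchFromBackend` callback are stand-ins for a shared cache and the origin call.

```typescript
// Illustrative cache-aside sketch. A real gateway uses a shared cache
// store rather than a per-process Map.
type Entry = { body: string; expiresAt: number };

const store = new Map<string, Entry>();

async function cacheAside(
  key: string,
  ttlSeconds: number,
  fetchFromBackend: () => Promise<string>, // stand-in for the origin call
): Promise<string> {
  const hit = store.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.body; // cache hit: the backend is never touched
  }
  // Cache miss: forward to the backend, then store for future requests
  const body = await fetchFromBackend();
  store.set(key, { body, expiresAt: Date.now() + ttlSeconds * 1000 });
  return body;
}
```

The check-miss-store shape is the same in every cache-aside implementation; only the store and the key generation change.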

### Stale-While-Revalidate

This strategy serves a stale cached response to the user immediately while
fetching a fresh copy from the backend in the background. The user gets a fast
response, and the cache gets updated for the next request. It's a good fit for
data that can tolerate brief staleness — like social media feeds or search
results.

You implement this using the `Cache-Control` directive `stale-while-revalidate`,
which tells the gateway how long a stale response is acceptable while it
refreshes the cache asynchronously.
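
A rough sketch of the serving logic, assuming an in-memory store and a `fetchFresh` callback standing in for the origin request:

```typescript
// Stale-while-revalidate sketch: serve a stale entry immediately and
// refresh it in the background. Illustrative only.
type SwrEntry = { body: string; freshUntil: number; staleUntil: number };

const swrStore = new Map<string, SwrEntry>();

async function swrGet(
  key: string,
  fetchFresh: () => Promise<string>, // stand-in for the origin call
  maxAgeSec: number,
  swrSec: number,
): Promise<string> {
  const now = Date.now();
  const entry = swrStore.get(key);

  if (entry && now < entry.freshUntil) return entry.body; // fresh hit

  if (entry && now < entry.staleUntil) {
    // Stale but inside the stale-while-revalidate window: answer now,
    // refresh asynchronously for the next caller.
    fetchFresh().then((body) =>
      swrStore.set(key, {
        body,
        freshUntil: Date.now() + maxAgeSec * 1000,
        staleUntil: Date.now() + (maxAgeSec + swrSec) * 1000,
      }),
    );
    return entry.body;
  }

  // No usable entry: a plain cache miss, fetched synchronously
  const body = await fetchFresh();
  swrStore.set(key, {
    body,
    freshUntil: now + maxAgeSec * 1000,
    staleUntil: now + (maxAgeSec + swrSec) * 1000,
  });
  return body;
}
```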

### Selective Caching by Method and Status

Not every request should be cached. `GET` requests are natural candidates
because they're safe and idempotent: they read data without modifying server
state, so repeated requests can be answered from a stored copy. `POST`, `PUT`,
and `DELETE` requests typically modify state and shouldn't be cached by
default.

Similarly, you'll want to cache only successful responses. There's little value
in caching `500 Internal Server Error` responses, and caching `401 Unauthorized`
responses could lead to confusing behavior. Most gateways let you specify which
HTTP methods and status codes are eligible for caching.
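
A minimal eligibility check might look like the following. The method and status allow-lists here are illustrative defaults, loosely following the heuristically cacheable status codes from RFC 9110; your gateway configuration would define its own.

```typescript
// Only safe methods and an allow-list of status codes are cacheable.
const CACHEABLE_METHODS = new Set(["GET", "HEAD"]);
const CACHEABLE_STATUSES = new Set([200, 203, 206, 300, 301, 410]);

function isCacheable(method: string, status: number): boolean {
  return (
    CACHEABLE_METHODS.has(method.toUpperCase()) &&
    CACHEABLE_STATUSES.has(status)
  );
}
```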

## Cache Invalidation Patterns

The hardest part of caching isn't storing data — it's knowing when to throw it
away. Stale data served from cache can cause anything from minor UX issues to
serious business logic errors. Here are the primary invalidation strategies.

### TTL-Based Expiration

Set a time-to-live (TTL) on cached entries so they expire automatically. This is
the simplest and most common approach:

- **Short TTLs (30–300 seconds)** for data that changes frequently, like stock
  prices or live scores
- **Medium TTLs (5–60 minutes)** for semi-dynamic content, like product catalogs
  or user profiles
- **Long TTLs (1–24 hours)** for rarely changing data, like API documentation or
  static configuration

The trade-off is straightforward: longer TTLs mean better cache hit rates but a
greater risk of serving stale data. Shorter TTLs keep data fresher but put more
load on your backend.

### Event-Driven Invalidation

Instead of waiting for a TTL to expire, you invalidate specific cache entries
when the underlying data changes. For example, when a product price is updated
in your database, you send a cache-purge event that removes the cached response
for that product's endpoint.

Event-driven invalidation requires more infrastructure (typically a message
queue or webhook system) but ensures your cache always reflects the current
state of your data.
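
As a sketch, the handler for an update event only needs to map the event to the affected cache keys and delete them. The store, key format, and event shape below are hypothetical:

```typescript
// Event-driven invalidation sketch: purge the cached response for a
// product when its data changes. Illustrative store and event shape.
const responseCache = new Map<string, string>();

function onProductUpdated(event: { productId: string }): void {
  // Remove the cached entry so the next request fetches fresh data
  responseCache.delete(`GET:/products/${event.productId}`);
}
```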

### Cache-Busting with Versioning

Use a version identifier (a timestamp, build number, or environment variable) as
part of the cache key. When you need to invalidate all cached responses, change
the version identifier. Every request now generates a new cache key, effectively
bypassing the old cache entries.

This is particularly useful during deployments. If your API response format
changes in a new release, a cache-bust ensures users immediately get the new
format instead of stale responses from the previous version.
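
The mechanism is just a version segment prefixed onto the key; the key format below is an illustrative choice:

```typescript
// Cache-busting via a version segment in the key. Changing the version
// (for example, from an environment variable at deploy time) makes every
// lookup miss, bypassing old entries without an explicit purge.
function versionedKey(version: string, method: string, url: string): string {
  return `${version}:${method}:${url}`;
}
```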

## Cache-Control Headers and HTTP Caching

HTTP provides a rich set of headers for controlling cache behavior. Getting
these right is essential for effective gateway caching.

### Cache-Control

The `Cache-Control` header is the primary mechanism for controlling caching
behavior. Key directives include:

- **`max-age=N`**: The response is fresh for N seconds. Browsers and
  intermediary caches use this to determine how long to store the response.
- **`s-maxage=N`**: Like `max-age`, but applies only to shared caches (CDNs and
  gateways). Use this when you want the gateway to cache longer than the
  browser.
- **`public`**: The response can be cached by any cache, including shared ones.
- **`private`**: The response is intended for a single user and should not be
  cached by shared caches like a gateway.
- **`no-cache`**: The cache must revalidate with the origin before serving the
  response.
- **`no-store`**: The response must not be cached at all.

A common pattern for gateway caching is
`Cache-Control: public, max-age=60, s-maxage=3600`. This tells browsers to cache
for 1 minute but allows the gateway to cache for 1 hour — giving users fresh
data in their browser while reducing backend load through the shared cache.

### ETag and Conditional Requests

An `ETag` is a unique identifier for a specific version of a resource. When a
client sends a request with an `If-None-Match` header containing the ETag, the
server can respond with `304 Not Modified` if the data hasn't changed — saving
bandwidth by not re-sending the full response body.

At the gateway level, ETags enable efficient cache validation without forcing a
full cache refresh on every TTL expiration.
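
A sketch of how a server or gateway can generate ETags and answer conditional requests. Hashing the response body is one common way to derive an ETag, not a standard requirement:

```typescript
import { createHash } from "node:crypto";

// Derive an ETag from the response body; a truncated content hash is one
// common (illustrative) choice.
function makeEtag(body: string): string {
  return `"${createHash("sha256").update(body).digest("hex").slice(0, 16)}"`;
}

// Answer 304 Not Modified when the client's If-None-Match matches,
// skipping the body; otherwise send the full response with its ETag.
function respond(
  body: string,
  ifNoneMatch?: string,
): { status: number; body?: string; etag: string } {
  const etag = makeEtag(body);
  if (ifNoneMatch === etag) {
    return { status: 304, etag }; // client's copy is still valid
  }
  return { status: 200, body, etag };
}
```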

### Vary Header

The `Vary` header tells caches which request headers affect the response. For
example, `Vary: Accept-Language` means the gateway should cache separate
responses for different language preferences. Without `Vary`, a response cached
for an English-language request might be served to a user requesting French.

Common `Vary` headers for API caching include `Accept`, `Accept-Encoding`, and
`Authorization` (when caching per-user responses).

## Edge Caching vs. Origin Caching

Where you cache matters as much as what you cache. The two primary locations are
at the edge (close to users) and at the origin (close to your backend).

### Edge Caching

Edge caching stores responses at globally distributed points of presence (PoPs).
When a user in Tokyo requests your API, the response is served from the nearest
edge location instead of traveling to your origin server in Virginia. This
reduces latency from hundreds of milliseconds to single digits.

Edge caching is ideal for:

- APIs with a global user base
- Public data that doesn't vary per user
- Read-heavy workloads where freshness requirements can tolerate a TTL

The main limitation is storage capacity. Edge nodes can't store everything, so
less frequently accessed responses may get evicted.

### Origin Caching

Origin caching happens at or near your backend — either in the gateway itself
(if it's deployed in the same region as your backend) or in a caching layer like
Redis between the gateway and your application.

Origin caching is better for:

- User-specific responses where edge caching would require too many variations
- Large response payloads that are expensive to distribute globally
- Data that changes too frequently for edge TTLs to be practical

### Combining Both Layers

The most effective caching architectures use both. Edge caching handles public,
read-heavy traffic globally, while origin caching reduces backend load for
dynamic or personalized requests. An
[edge-native gateway](/learning-center/edge-native-api-gateway-architecture)
makes this particularly powerful because the gateway itself runs at the edge —
meaning your caching logic, not just cached data, executes close to your users.

## Caching for Different API Types

### REST APIs

REST APIs are well-suited for caching because they follow HTTP conventions that
align with caching semantics. `GET` requests are safe reads that don't modify
state, URLs are stable resource identifiers, and HTTP caching headers work as
designed.

Cache keys typically consist of the request method, URL path, and query
parameters. For endpoints that return different data based on authentication,
the `Authorization` header should also be part of the cache key to prevent
serving one user's data to another.

### GraphQL APIs

GraphQL is harder to cache because queries are typically sent as `POST` requests
to a single endpoint (`/graphql`), and the response depends entirely on the
query body. Traditional URL-based caching doesn't work.

To cache GraphQL at the gateway, you need to either:

- **Parse the query** and generate cache keys based on the operation name,
  fields, and variables
- **Use persisted queries** where each query has a unique identifier that can
  serve as a cache key
- **Cache at the field level** rather than the response level, which requires
  deeper integration with the GraphQL execution engine
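
The first option can be sketched by hashing the query text together with stable-ordered variables, so `POST /graphql` requests get distinct, repeatable keys. This is an illustrative approach, not any particular gateway's implementation:

```typescript
import { createHash } from "node:crypto";

// Derive a cache key from the GraphQL request body rather than the URL.
// Sorting variable names first makes key generation order-independent.
function graphqlCacheKey(
  query: string,
  variables: Record<string, unknown> = {},
): string {
  const stableVars = JSON.stringify(
    Object.keys(variables)
      .sort()
      .map((k) => [k, variables[k]]),
  );
  return createHash("sha256")
    .update(query.trim())
    .update(stableVars)
    .digest("hex");
}
```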

### Read-Heavy vs. Write-Heavy APIs

Read-heavy APIs (data retrieval, search, configuration) benefit enormously from
caching. If 90% of your traffic is reads and your data changes infrequently, a
well-configured cache can absorb the majority of your traffic.

Write-heavy APIs (order processing, real-time updates, event ingestion) benefit
less from response caching. Instead, focus on caching the data needed to process
writes — authentication tokens, rate limit counters, and configuration data —
rather than the write responses themselves.

## Cache Key Design

A cache key determines which requests share a cached response. If the key is too
broad, different users get the wrong data. If it's too narrow, the cache stores
too many variations and hit rates drop.

### The Basics

Most gateway caches build keys from:

1. **HTTP method** — `GET /products` and `POST /products` should never share a
   cache entry
2. **URL path** — `/products/123` and `/products/456` are different resources
3. **Query parameters** — `/products?page=1` and `/products?page=2` return
   different data

### Handling Authentication

If your API returns user-specific data, the `Authorization` header must be part
of the cache key. Otherwise, User A's data could be served to User B — a serious
security issue.

For endpoints that return the same public data regardless of who's
authenticated, you can safely exclude the authorization header from the cache
key to improve hit rates. But this should be an explicit, deliberate decision.
In Zuplo's Caching Policy, for example, the `Authorization` header is included
in the cache key by default. You can override this with the
`dangerouslyIgnoreAuthorizationHeader` option — the name makes the risk clear.

### Including Custom Headers

Some APIs vary responses based on custom headers: `Accept-Language` for
localization, `Accept` for content negotiation, or custom headers for A/B
testing. Include these in your cache key when they affect the response body.

A cache key that considers method, path, query parameters, authorization, and
relevant custom headers gives you precise control over what's cached without
storing unnecessary duplicates.
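
Putting those components together, a key builder might look like the sketch below. The default header list and the query-parameter sorting are illustrative choices; sorting makes `?a=1&b=2` and `?b=2&a=1` share one entry:

```typescript
// Build a cache key from method, path, normalized query parameters, and
// the headers that affect the response (including Authorization).
function buildCacheKey(
  method: string,
  url: string,
  headers: Record<string, string>,
  varyHeaders: string[] = ["authorization", "accept", "accept-language"],
): string {
  const u = new URL(url);
  u.searchParams.sort(); // normalize parameter order
  const headerPart = varyHeaders
    .map((h) => `${h}=${headers[h] ?? ""}`)
    .join(";");
  return `${method.toUpperCase()}:${u.pathname}?${u.searchParams}:${headerPart}`;
}
```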

## Monitoring and Debugging Cache Performance

A cache you can't observe is a cache you can't trust. Monitor these metrics:

- **Cache hit ratio**: The percentage of requests served from cache. A ratio
  below 50% suggests your TTLs are too short, your cache keys are too granular,
  or your traffic patterns don't favor caching.
- **Cache eviction rate**: How often entries are removed before their TTL
  expires, typically due to storage limits. High eviction rates mean your cache
  is too small for your working set.
- **Stale response rate**: How often users receive data that has since been
  updated at the origin. Track this to validate that your TTLs match your data
  freshness requirements.
- **Latency distribution**: Compare response times for cache hits vs. cache
  misses. The delta tells you exactly how much value your cache provides.
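
The hit ratio is simple to derive from two counters; in practice you would export these to a metrics backend rather than compute them in-process:

```typescript
// Minimal hit/miss counters for computing a cache hit ratio.
let hits = 0;
let misses = 0;

function recordLookup(hit: boolean): void {
  if (hit) {
    hits++;
  } else {
    misses++;
  }
}

function hitRatio(): number {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}
```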

When debugging cache misses, check for:

- Query parameters that vary unnecessarily (timestamps, tracking IDs) inflating
  your key space
- Missing or misconfigured `Cache-Control` headers
- Authentication headers that create unique cache entries per user when the data
  is actually public
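
The first of those issues can be addressed by normalizing URLs before key generation. The ignore list below is illustrative; yours should contain whichever parameters your clients append that don't affect the response:

```typescript
// Strip query parameters that don't affect the response (tracking IDs,
// cache-busting timestamps) before building the cache key.
const IGNORED_PARAMS = new Set(["utm_source", "utm_medium", "utm_campaign", "_t"]);

function normalizeUrl(rawUrl: string): string {
  const u = new URL(rawUrl);
  for (const name of [...u.searchParams.keys()]) {
    if (IGNORED_PARAMS.has(name)) u.searchParams.delete(name);
  }
  u.searchParams.sort();
  return `${u.pathname}?${u.searchParams}`;
}
```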

## Security Considerations

Caching introduces security surface area. A misconfigured cache can leak private
data, serve poisoned responses, or bypass authorization controls.

### Preventing Cache Poisoning

Cache poisoning occurs when an attacker causes the gateway to store a malicious
response that's then served to other users. This typically happens when
unvalidated request headers influence the response but aren't included in the
cache key.

Mitigate this by including all request components that affect the response in
your cache key, and by validating and sanitizing inputs at the gateway layer
before they reach your backend.

### Caching Authenticated Responses Safely

The golden rule: never cache user-specific responses in a shared cache without
including authentication information in the cache key. Set
`Cache-Control: private` for user-specific data, or ensure the `Authorization`
header is always part of your cache key.

For public data that requires authentication to access (but returns the same
response to all authenticated users), you can safely cache the response in a
shared cache — but document this decision clearly and enforce it at the gateway
configuration level.

### Sensitive Data in Cached Responses

Responses containing personally identifiable information (PII), financial data,
or health records generally should not be cached in shared caches at all. Use
`Cache-Control: no-store` for these endpoints. If caching is absolutely
necessary for performance, use encryption at rest and restrict cache access to
authorized components only.

## Implementing API Gateway Caching with Zuplo

Zuplo's architecture makes it particularly effective for API caching. Because
Zuplo runs on a globally distributed edge network across
[300+ data centers](https://zuplo.com/docs/managed-edge/overview), caching isn't
an add-on feature — it's a natural outcome of where your gateway already runs.
Every cached response is stored close to your users, reducing latency without
additional CDN configuration.

### Using the Caching Policy

Zuplo provides a built-in
[Caching Inbound Policy](https://zuplo.com/docs/policies/caching-inbound) that
you can add to any route. It handles cache key generation, TTL management, and
response storage with zero custom code:

```json
{
  "name": "my-caching-policy",
  "policyType": "caching-inbound",
  "handler": {
    "export": "CachingInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "cacheHttpMethods": ["GET"],
      "expirationSecondsTtl": 300,
      "headers": ["Accept", "Accept-Language"],
      "statusCodes": [200, 206, 301]
    }
  }
}
```

This configuration caches `GET` responses for 5 minutes, includes `Accept` and
`Accept-Language` in the cache key so different content types and languages get
separate cache entries, and only caches the listed status codes (`200`, `206`,
and `301`).

### Cache Key Customization

The policy automatically builds cache keys from the HTTP method, URL, query
parameters, and the `Authorization` header. You can extend the key by adding
custom headers via the `headers` option. For public endpoints where
authentication doesn't affect the response, you can exclude the `Authorization`
header from the key — though the option is intentionally named
`dangerouslyIgnoreAuthorizationHeader` as a reminder to think carefully before
enabling it.

### Cache-Busting on Demand

Zuplo supports cache-busting through the `cacheId` option. Set it to an
environment variable, and when you need to invalidate all cached responses,
update the variable and redeploy:

```json
{
  "options": {
    "cacheId": "$env(CACHE_ID)",
    "expirationSecondsTtl": 3600
  }
}
```

This approach is clean and predictable — no need for complex purge APIs or
cache-tag management.

### Programmatic Caching

For more sophisticated caching logic, Zuplo exposes the
[Cache API](https://zuplo.com/docs/programmable-api/cache) and
[ZoneCache API](https://zuplo.com/docs/programmable-api/zone-cache) for custom
TypeScript code. You can implement conditional caching, custom key generation,
or hybrid strategies that combine gateway caching with application logic:

```typescript
import { ZuploContext, ZuploRequest, ZoneCache } from "@zuplo/runtime";

export default async function (request: ZuploRequest, context: ZuploContext) {
  const cache = new ZoneCache("product-cache", context);
  const cacheKey = `products:${request.params.id}`;

  // Check cache first
  const cached = await cache.get(cacheKey);
  if (cached) {
    return new Response(JSON.stringify(cached), {
      headers: { "Content-Type": "application/json" },
    });
  }

  // Fetch from backend
  const response = await fetch(
    `https://api.example.com/products/${request.params.id}`,
  );
  if (!response.ok) {
    // Don't cache error responses; pass them through unchanged
    return response;
  }
  const data = await response.json();

  // Store in cache (fire and forget for performance)
  cache.put(cacheKey, data, 300).catch((err) => context.log.error(err));

  return new Response(JSON.stringify(data), {
    headers: { "Content-Type": "application/json" },
  });
}
```

### Setting Cache Headers with Policies

You can also control downstream caching behavior by setting `Cache-Control`
headers on responses using Zuplo's
[Set Headers Outbound Policy](https://zuplo.com/docs/policies/set-headers-outbound):

```json
{
  "name": "cache-headers",
  "policyType": "set-headers-outbound",
  "handler": {
    "export": "SetHeadersOutboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "headers": [
        {
          "name": "Cache-Control",
          "value": "public, max-age=60, s-maxage=3600"
        }
      ]
    }
  }
}
```

This tells CDNs and edge caches to store the response for an hour while browsers
cache for just one minute — giving you a long-lived shared cache with short
browser freshness for rapid updates when needed.

## Putting It All Together

Effective API gateway caching isn't about caching everything — it's about
caching the right things at the right layers with the right invalidation
strategy. Start with these steps:

1. **Identify your cacheable endpoints.** Look for read-heavy routes with stable
   responses — product catalogs, configuration endpoints, reference data.
2. **Set appropriate TTLs.** Match TTL to how often your data changes, not how
   often it's requested.
3. **Design your cache keys carefully.** Include everything that affects the
   response, but nothing extra.
4. **Monitor hit rates.** If your cache isn't being hit, adjust your keys and
   TTLs.
5. **Layer your caching.** Use edge caching for global, public traffic and
   origin caching for personalized or high-frequency data.

For a deeper look at general API caching techniques, see our guide on
[how developers can use caching to improve API performance](/learning-center/how-developers-can-use-caching-to-improve-api-performance).
To understand how edge architecture amplifies caching benefits, read about
[edge-native API gateway architecture](/learning-center/edge-native-api-gateway-architecture).
And if you're working with AI APIs, explore
[semantic caching](/blog/what-is-semantic-caching) — a technique that caches
responses based on meaning rather than exact request matching.

Caching at the gateway layer is one of the highest-leverage
[performance optimizations](/learning-center/increase-api-performance) you can
make. An edge-native gateway like Zuplo makes it even more effective by putting
your cache — and your caching logic — at the point closest to your users. You
can get started with Zuplo's caching policies in minutes on the
[free tier](https://portal.zuplo.com).