Zuplo
API Caching

Cache responses where your users are

Edge response caching, semantic cache for AI workloads, and CDN-aware Cache-Control headers — all configured per route, all running across 300+ data centers without a Redis cluster to operate.

Why this matters

Caching is the cheapest performance work you'll ever do

Every team knows it. Most still don't ship it — because the path from "we should cache this" to "it's running in production with sane invalidation" is paved with Redis clusters, race conditions, and stale-data bug reports.


P95 latency you can't engineer down

The expensive query, the third-party lookup, the cold-start origin — they all add up to a tail latency you can't fix without architectural surgery. Customers feel every spike.


Origin cost scaling linearly with traffic

Every read hits your database. Every product page hits your catalog service. The bill grows with traffic and you're rate-limiting your own customers to stay in budget.


LLM bills doubling every quarter

Your AI feature is wonderful. It's also the same five questions over and over, regenerating identical answers at $0.03 each. There's no exact-match cache key for natural language.


Cache invalidation in production at 2am

Whoever named it the hardest problem in computer science wasn't wrong. Stale data complaints, surprise TTLs, and "who deployed the cache key change" post-mortems eat your week.

What you get

One JSON block, three layers of cache

Single-digit-ms responses

Cached responses serve from the same edge data center as your gateway — no central round-trip, no cold lookups. Typical hit latency is 3-10ms; uncached origin calls land in 60-300ms.

Origin protection on day one

Spikes, hot keys, and runaway clients hit the cache instead of your database. The caching-inbound policy is a single JSON block — no Redis to operate, no library to upgrade.

Built for AI workloads too

The semantic-cache-inbound policy matches on meaning, not exact strings. Combined with the AI Gateway, you stop paying for the same answer twice — and your TTFT improves dramatically.

Edge response cache

Cut origin load and tail latency in one policy

The caching-inbound policy is a distributed response cache that runs at the edge alongside your gateway. Configure TTL, status-code filtering, and cache-key composition (method + path + query + Authorization + any custom headers). Cache only what's safe; serve everything else from origin.
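As a sketch of what that looks like on a route (option names such as expirationSecondsTtl and statusCodes follow Zuplo's policy schema, but verify them against the current policy reference before shipping):

```json
{
  "name": "cache-products",
  "policyType": "caching-inbound",
  "handler": {
    "export": "CachingInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "expirationSecondsTtl": 300,
      "statusCodes": [200],
      "headers": ["accept-language"]
    }
  }
}
```

Attach it to the policies array of any route; responses outside the status allowlist pass through to origin uncached.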

Edge Cache · caching-inbound · TTL aware · LIVE
Hit rate: 82% (healthy · 5-min window)
Origin hits saved: 14,328 (last 24h)
Cached P95: 8ms (vs origin 200-400ms)
Cache key parts: 5 (method · path · query · auth · vary)
Three cache layers: caching-inbound · semantic-cache for AI · ZoneCache in custom code
Configurable TTL per route
Status-code allowlist
Method-aware (GET/POST/HEAD)
Authorization-aware keys
Cache-bust via cacheId env var
Vary on any header
Programmable response composition

Cache the shared bits. Fetch only the per-customer bits.

Most API responses are 90% shared data — catalogs, lookup tables, weather, reference data — wrapped around a small per-customer payload. With a custom code policy you can serve the shared parts from ZoneCache, fetch just the per-customer delta from origin, and assemble the response in the gateway. Real Zuplo customers have cut origin bandwidth by 70% with this pattern.

TypeScript · Compose 480KB shared + 20KB per-customer in the gateway
import type { ZuploRequest, ZuploContext } from "@zuplo/runtime";
import { ZoneCache } from "@zuplo/runtime";

export default async function handler(
  request: ZuploRequest,
  context: ZuploContext,
) {
  const { region, customerId } = request.params;

  // Shared cache — same for every customer in this region
  // (constructed inside the handler, where context is in scope)
  const sharedCache = new ZoneCache("weather-shared", context);

  // 480KB of shared regional data — served from the edge
  let shared = await sharedCache.get(region);
  if (!shared) {
    const res = await fetch("https://origin/weather/region/" + region);
    shared = await res.json();
    await sharedCache.put(region, shared, 600); // 10-min TTL
  }

  // Tiny per-customer slice — fresh every request, never cached
  const prefsRes = await fetch(
    "https://origin/customers/" + customerId + "/prefs",
  );
  const prefs = await prefsRes.json();

  // Stitch them together at the edge — 20KB leaves the origin instead of 500KB
  return Response.json({ ...shared, preferences: prefs });
}
Cache the shared payload
Fetch only the per-customer delta
Compose response at the edge
Cache safely behind auth
Custom invalidation strategies
70% origin-bandwidth cut, real customers
CDN cache headers

Drive your CDN from the gateway

Use the set-headers-outbound policy to attach Cache-Control, ETag, and Vary headers per route — telling browsers and downstream CDNs (Akamai, Cloudflare, Fastly) exactly how to cache. For dynamic logic — different headers for 2xx vs 5xx, vary on response payload — drop into a custom outbound policy in TypeScript.

Outbound headers · /products/:id
Cache-Control: public, max-age=60, s-maxage=3600, stale-while-revalidate=120
ETag: "v2-94f37a"
Vary: Accept-Language, Accept
JSON · set-headers-outbound policy
{
  "name": "cache-one-hour",
  "policyType": "set-headers-outbound",
  "handler": {
    "export": "SetHeadersOutboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "headers": [
        {
          "name": "Cache-Control",
          "value": "public, max-age=60, s-maxage=3600"
        }
      ]
    }
  }
}
Cache-Control + s-maxage
ETag + If-None-Match
Vary header support
Different headers per status
Akamai / Cloudflare ready
Custom TS outbound logic
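For the dynamic case, a minimal sketch of a custom outbound policy that varies Cache-Control by status class. The function name and the plain fetch-API parameter types are illustrative so the sketch stands alone; in a real Zuplo project the request and context parameters would be typed as ZuploRequest and ZuploContext from "@zuplo/runtime".

```typescript
// Sketch: choose Cache-Control per status class in an outbound policy.
export default async function cacheByStatus(
  response: Response,
  request: Request,
  context: unknown,
): Promise<Response> {
  const headers = new Headers(response.headers);
  if (response.status >= 200 && response.status < 300) {
    // Successes are safe to hold: browsers for 60s, shared CDNs for 1h
    headers.set("Cache-Control", "public, max-age=60, s-maxage=3600");
  } else {
    // Never let an error page get pinned in a CDN
    headers.set("Cache-Control", "no-store");
  }
  // Re-wrap the body with the adjusted headers
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```

The same shape works for varying on response payload or request attributes, since the policy sees both objects.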
AI Gateway semantic cache

Stop paying for the same answer twice

The semantic-cache-inbound policy embeds incoming requests and returns a cached response when an existing entry is similar enough, controllable via semanticTolerance (0–1). Same idea as exact-match caching, with a vastly different cost profile.

  • Configurable similarity tolerance — tune semanticTolerance to balance cache hit rate vs answer fidelity.
  • Namespace isolation — separate caches per tenant, model, or product surface. No cross-contamination.
  • Custom cache-key extraction — derive the match key from a JSON property path or a TS function.
  • zp-semantic-cache header — every response surfaces HIT / MISS so you can measure cost savings directly.
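A sketch of the policy block, following the same handler shape as the set-headers example above. semanticTolerance comes from this page; the export name and any further options (namespace, key extraction) are assumptions to confirm against the AI Gateway docs:

```json
{
  "name": "llm-semantic-cache",
  "policyType": "semantic-cache-inbound",
  "handler": {
    "export": "SemanticCacheInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "semanticTolerance": 0.1
    }
  }
}
```

Check the zp-semantic-cache response header while tuning the tolerance to see how your hit rate moves.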

Incoming prompt

"Tell me the capital city of France"

HIT similarity 0.91 · cached 3m ago

Cached response (served in 7ms)

"Paris is the capital of France."

cost saved: $0.024 · namespace: geo-facts
What makes Zuplo different

Caching that fits how modern APIs actually work

Semantic caching for LLMs

Configurable similarity tolerance (0–1), namespace isolation for multi-tenant cache keys, custom cache-key extraction via property path or function. "What's the capital of France?" and "Tell me the capital city of France" hit the same cached answer.

Compose responses, not just cache them

ZoneCache exposes the same low-latency edge cache to your TypeScript code. Cache the shared 90% of a response, fetch only the per-customer delta from origin, assemble in the gateway. Real customers cut origin bandwidth by 70%. Try doing that with a CDN.

No Redis to provision

The cache is part of the gateway runtime across 300+ data centers — not a separate service. Nothing to provision, no maxmemory eviction policy to tune, no version mismatch between cache server and client library.

Composes with your CDN

Set Cache-Control + s-maxage from the gateway and your downstream CDN (Akamai, Cloudflare, Fastly) takes over. The two layers cooperate: CDN absorbs the cold visitors, Zuplo absorbs the authenticated, per-tenant traffic.

Real questions, real answers

What teams use this for

“Our product catalog is 90% of our DB load.”

Cache GET /products with caching-inbound, TTL 300s, status [200]. The first request hydrates; the next thousand serve from the edge in under 10ms. Origin load drops by an order of magnitude.

“We can't cache responses with auth — they're per-user.”

You can. caching-inbound includes the Authorization header in the cache key by default. Each user gets their own scoped cache; nothing leaks between tenants. You only opt out (with a documented warning) if the response is genuinely identical.

“Our AI assistant is rate-limited by spend, not traffic.”

Wire up semantic-cache-inbound on the LLM route. Similar prompts return cached completions; tolerance is tunable; you keep your tokens for the queries that actually need them.

“We need to nuke the cache after a deploy.”

Set cacheId via an environment variable (e.g. CACHE_ID), bump the value, redeploy. Every cache key now has a fresh prefix and old entries are stranded — clean cutover, no scripting required.
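Concretely, the bust knob is just another option on the caching policy. A sketch, assuming Zuplo's $env() interpolation for policy options:

```json
{
  "options": {
    "expirationSecondsTtl": 300,
    "cacheId": "$env(CACHE_ID)"
  }
}
```

Bumping CACHE_ID from v1 to v2 and redeploying changes every key prefix at once; the v1 entries simply age out.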

Frequently Asked Questions

Common questions about caching with Zuplo.

Cut tail latency in an afternoon

Drop the caching-inbound policy on a route, set a TTL, and watch your origin bills fall.