Cache responses where your users are
Edge response caching, semantic cache for AI workloads, and CDN-aware Cache-Control headers — all configured per route, all running across 300+ data centers without a Redis cluster to operate.
Caching is the cheapest performance work you'll ever do
Every team knows it. Most still don't ship it — because the path from "we should cache this" to "it's running in production with sane invalidation" is paved with Redis clusters, race conditions, and stale-data bug reports.
P95 latency you can't engineer down
The expensive query, the third-party lookup, the cold-start origin — they all add up to a tail latency you can't fix without architectural surgery. Customers feel every spike.
Origin cost scaling linearly with traffic
Every read hits your database. Every product page hits your catalog service. The bill grows with traffic and you're rate-limiting your own customers to stay in budget.
LLM bills doubling every quarter
Your AI feature is wonderful. It's also the same five questions over and over, regenerating identical answers at $0.03 each. There's no exact-match cache key for natural language.
Cache invalidation in production at 2am
Whoever named it the hardest problem in computer science wasn't wrong. Stale data complaints, surprise TTLs, and "who deployed the cache key change" post-mortems eat your week.
One JSON block, three layers of cache
Single-digit-ms responses
Cached responses serve from the same edge data center as your gateway — no central round-trip, no cold lookups. Typical hit latency is 3-10ms; uncached origin calls land in 60-300ms.
Origin protection on day one
Spikes, hot keys, and runaway clients hit the cache instead of your database. The caching-inbound policy is a single JSON block — no Redis to operate, no library to upgrade.
Built for AI workloads too
The semantic-cache-inbound policy matches on meaning, not exact strings. Combined with the AI Gateway, you stop paying for the same answer twice, and your time to first token (TTFT) improves dramatically.
Cut origin load and tail latency in one policy
The caching-inbound policy is a distributed response cache that runs at the edge alongside your gateway. Configure TTL, status-code filtering, and cache-key composition (method + path + query + Authorization + any custom headers). Cache only what's safe; serve everything else from origin.
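As a sketch, the policy is one entry in your policies configuration (assuming the standard Zuplo policies.json shape; the option names here, expirationSecondsTtl, statusCodes, and headers, are illustrative, so confirm them against the caching-inbound reference):

```json
{
  "name": "cache-get-products",
  "policyType": "caching-inbound",
  "handler": {
    "export": "CachingInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "expirationSecondsTtl": 300,
      "statusCodes": [200],
      "headers": ["accept-language"]
    }
  }
}
```

Attach it to the routes you want cached; everything else keeps hitting origin untouched.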
Cache the shared bits. Fetch only the per-customer bits.
Most API responses are 90% shared data — catalogs, lookup tables, weather, reference data — wrapped around a small per-customer payload. With a custom code policy you can serve the shared parts from ZoneCache, fetch just the per-customer delta from origin, and assemble the response in the gateway. Real Zuplo customers have cut origin bandwidth by 70% with this pattern.
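A minimal sketch of the pattern, assuming the ZoneCache get/put API from the Zuplo runtime (the origin URLs and the "catalog" cache name are placeholders):

```typescript
import { ZoneCache, ZuploContext, ZuploRequest } from "@zuplo/runtime";

export default async function (request: ZuploRequest, context: ZuploContext) {
  const cache = new ZoneCache("catalog", context);

  // Shared 90%: one catalog for everyone, cached at the edge for 5 minutes.
  let catalog = await cache.get("products");
  if (!catalog) {
    const res = await fetch("https://origin.example.com/products");
    catalog = await res.json();
    await cache.put("products", catalog, 300);
  }

  // Per-customer 10%: fetched fresh from origin on every request.
  const pricing = await fetch(
    `https://origin.example.com/pricing/${request.user?.sub}`
  ).then((r) => r.json());

  // Assemble the full response in the gateway.
  return new Response(JSON.stringify({ catalog, pricing }), {
    headers: { "content-type": "application/json" },
  });
}
```

On a warm cache, only the small pricing call touches origin; the heavy catalog payload never leaves the edge.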
Drive your CDN from the gateway
Use the set-headers-outbound policy to attach Cache-Control, ETag, and Vary headers per route — telling browsers and downstream CDNs (Akamai, Cloudflare, Fastly) exactly how to cache. For dynamic logic — different headers for 2xx vs 5xx, vary on response payload — drop into a custom outbound policy in TypeScript.
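As an illustration of the dynamic case, a custom outbound policy is a function that receives the response on its way out (this follows the standard Zuplo outbound signature; the header values are illustrative):

```typescript
import { ZuploContext, ZuploRequest } from "@zuplo/runtime";

export default async function policy(
  response: Response,
  request: ZuploRequest,
  context: ZuploContext
) {
  const headers = new Headers(response.headers);
  if (response.ok) {
    // Successful responses are safe to cache downstream for 5 minutes.
    headers.set("Cache-Control", "public, max-age=300");
    headers.set("Vary", "Accept-Encoding, Accept-Language");
  } else {
    // Never let a browser or CDN pin an error response.
    headers.set("Cache-Control", "no-store");
  }
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```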
Stop paying for the same answer twice
The semantic-cache-inbound policy embeds incoming requests and returns a cached response when an existing entry is similar enough, controllable via semanticTolerance (0–1). The idea is the same as exact-match caching; the cost profile is vastly different. A configuration sketch follows the list below.
- Configurable similarity tolerance — tune semanticTolerance to balance cache hit rate vs answer fidelity.
- Namespace isolation — separate caches per tenant, model, or product surface. No cross-contamination.
- Custom cache-key extraction — derive the match key from a JSON property path or a TS function.
- zp-semantic-cache header — every response surfaces HIT / MISS so you can measure cost savings directly.
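A configuration sketch: semanticTolerance is the knob named above, while the namespace and cache-key option names here are placeholders, so check the semantic-cache-inbound reference for the exact schema:

```json
{
  "name": "semantic-cache-llm",
  "policyType": "semantic-cache-inbound",
  "handler": {
    "export": "SemanticCacheInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "semanticTolerance": 0.15,
      "namespace": "support-bot",
      "cacheKeyPath": "prompt"
    }
  }
}
```

Lower tolerance means stricter matching; watch the zp-semantic-cache header to see your hit rate as you tune it.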
Incoming prompt
"Tell me the capital city of France"
Cached response (served in 7ms)
"Paris is the capital of France."
Caching that fits how modern APIs actually work
Semantic caching for LLMs
Configurable similarity tolerance (0–1), namespace isolation for multi-tenant cache keys, custom cache-key extraction via property path or function. "What's the capital of France?" and "Tell me the capital city of France" hit the same cached answer.
Compose responses, not just cache them
ZoneCache exposes the same low-latency edge cache to your TypeScript code. Cache the shared 90% of a response, fetch only the per-customer delta from origin, assemble in the gateway. Real customers cut origin bandwidth by 70%. Try doing that with a CDN.
No Redis to provision
The cache is part of the gateway runtime across 300+ data centers — not a separate service. Nothing to provision, no maxmemory eviction policy to tune, no version mismatch between cache server and client library.
Composes with your CDN
Set Cache-Control + s-maxage from the gateway and your downstream CDN (Akamai, Cloudflare, Fastly) takes over. The two layers cooperate: CDN absorbs the cold visitors, Zuplo absorbs the authenticated, per-tenant traffic.
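A sketch of the split, assuming set-headers-outbound takes a list of name/value pairs with an overwrite flag:

```json
"options": {
  "headers": [
    {
      "name": "Cache-Control",
      "value": "public, max-age=0, s-maxage=86400",
      "overwrite": true
    }
  ]
}
```

Browsers revalidate on every request (max-age=0) while the CDN keeps serving the cached copy for a day (s-maxage=86400); Zuplo handles whatever gets through.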
What teams use this for
“Our product catalog is 90% of our DB load.”
Cache GET /products with caching-inbound, TTL 300s, status [200]. The first request hydrates; the next thousand serve from the edge in under 10ms. Origin load drops by an order of magnitude.
“We can't cache responses with auth — they're per-user.”
You can. caching-inbound includes the Authorization header in the cache key by default. Each user gets their own scoped cache; nothing leaks between tenants. You only opt out (with a documented warning) if the response is genuinely identical.
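And the opt-out is deliberately loud. The option below follows the naming in the policy docs, but treat it as a sketch:

```json
"options": {
  "expirationSecondsTtl": 300,
  "dangerouslyIgnoreAuthorizationHeader": true
}
```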
“Our AI assistant is rate-limited by spend, not traffic.”
Wire up semantic-cache-inbound on the LLM route. Similar prompts return cached completions; tolerance is tunable; you keep your tokens for the queries that actually need them.
“We need to nuke the cache after a deploy.”
Set cacheId via an environment variable (e.g. CACHE_ID), bump the value, redeploy. Every cache key now has a fresh prefix and old entries are stranded — clean cutover, no scripting required.
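A sketch, assuming the cacheId option and Zuplo's $env() substitution in policy options:

```json
"options": {
  "expirationSecondsTtl": 300,
  "cacheId": "$env(CACHE_ID)"
}
```

Bump CACHE_ID from v1 to v2, redeploy, and every key gets a new prefix.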
Cut tail latency in an afternoon
Drop the caching-inbound policy on a route, set a TTL, and watch your origin bills fall.