Zuplo
API Caching

Cache responses where your users are

Edge response caching, semantic cache for AI workloads, and CDN-aware Cache-Control headers — all configured per route, all running across 300+ data centers without a Redis cluster to operate.

Why this matters

Caching is the cheapest performance work you'll ever do

Every team knows it. Most still don't ship it — because the path from "we should cache this" to "it's running in production with sane invalidation" is paved with Redis clusters, race conditions, and stale-data bug reports.


P95 latency you can't engineer down

The expensive query, the third-party lookup, the cold-start origin — they all add up to a tail latency you can't fix without architectural surgery. Customers feel every spike.


Origin cost scaling linearly with traffic

Every read hits your database. Every product page hits your catalog service. The bill grows with traffic and you're rate-limiting your own customers to stay in budget.


LLM bills doubling every quarter

Your AI feature is wonderful. It's also the same five questions over and over, regenerating identical answers at $0.03 each. There's no exact-match cache key for natural language.


Cache invalidation in production at 2am

Whoever named it the hardest problem in computer science wasn't wrong. Stale data complaints, surprise TTLs, and "who deployed the cache key change" post-mortems eat your week.

What you get

One JSON block, three layers of cache

Single-digit-ms responses

Cached responses serve from the same edge data center as your gateway — no central round-trip, no cold lookups. Typical hit latency is 3-10ms; uncached origin calls land in 60-300ms.

Origin protection on day one

Spikes, hot keys, and runaway clients hit the cache instead of your database. The caching-inbound policy is a single JSON block — no Redis to operate, no library to upgrade.

Built for AI workloads too

The semantic-cache-inbound policy matches on meaning, not exact strings. Combined with the AI Gateway, you stop paying for the same answer twice — and your TTFT improves dramatically.

Edge response cache

Cut origin load and tail latency in one policy

The caching-inbound policy is a distributed response cache that runs at the edge alongside your gateway. Configure TTL, status-code filtering, and cache-key composition (method + path + query + Authorization + any custom headers). Cache only what's safe; serve everything else from origin.
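As a sketch of what that looks like on a route (option names such as expirationSecondsTtl and statusCodes follow Zuplo's policy schema, but verify them against the current policy reference before shipping):

```json
{
  "name": "cache-products",
  "policyType": "caching-inbound",
  "handler": {
    "export": "CachingInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "expirationSecondsTtl": 300,
      "statusCodes": [200],
      "headers": ["accept-language"]
    }
  }
}
```

Attach it to the policies array of any route; responses outside the status allowlist pass through to origin uncached.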

Edge Cache · caching-inbound · TTL aware · LIVE
Hit rate: 82% (healthy · 5-min window)
Origin hits saved: 14,328 (last 24h)
Cached P95: 8ms (vs origin 200-400ms)
Cache key parts: 5 (method · path · query · auth · vary)
Three cache layers: caching-inbound · semantic-cache for AI · ZoneCache in custom code
Configurable TTL per route
Status-code allowlist
Method-aware (GET/POST/HEAD)
Authorization-aware keys
Cache-bust via cacheId env var
Vary on any header
Programmable response composition

Cache the shared bits. Fetch only the per-customer bits.

Most API responses are 90% shared data — catalogs, lookup tables, weather, reference data — wrapped around a small per-customer payload. With a custom code policy you can serve the shared parts from ZoneCache, fetch just the per-customer delta from origin, and assemble the response in the gateway. Real Zuplo customers have cut origin bandwidth by 70% with this pattern.

TypeScript · Compose 480KB shared + 20KB per-customer in the gateway
import type { ZuploRequest, ZuploContext } from "@zuplo/runtime";
import { ZoneCache } from "@zuplo/runtime";

export default async function handler(
  request: ZuploRequest,
  context: ZuploContext,
) {
  const { region, customerId } = request.params;

  // Shared cache — same for every customer in this region
  // (constructed inside the handler, where context is in scope)
  const sharedCache = new ZoneCache("weather-shared", context);

  // 480KB of shared regional data — served from the edge
  let shared = await sharedCache.get(region);
  if (!shared) {
    const res = await fetch("https://origin/weather/region/" + region);
    shared = await res.json();
    await sharedCache.put(region, shared, 600); // 10-min TTL
  }

  // Tiny per-customer slice — fresh every request, never cached
  const prefsRes = await fetch(
    "https://origin/customers/" + customerId + "/prefs",
  );
  const prefs = await prefsRes.json();

  // Stitch them together at the edge — 20KB leaves the origin instead of 500KB
  return Response.json({ ...shared, preferences: prefs });
}
Cache the shared payload
Fetch only the per-customer delta
Compose response at the edge
Cache safely behind auth
Custom invalidation strategies
70% origin-bandwidth cut, real customers
CDN cache headers

Drive your CDN from the gateway

Use the set-headers-outbound policy to attach Cache-Control, ETag, and Vary headers per route — telling browsers and downstream CDNs (Akamai, Cloudflare, Fastly) exactly how to cache. For dynamic logic — different headers for 2xx vs 5xx, vary on response payload — drop into a custom outbound policy in TypeScript.

Outbound headers · /products/:id
Cache-Control: public, max-age=60, s-maxage=3600, stale-while-revalidate=120
ETag: "v2-94f37a"
Vary: Accept-Language, Accept
JSON · set-headers-outbound policy
{
  "name": "cache-one-hour",
  "policyType": "set-headers-outbound",
  "handler": {
    "export": "SetHeadersOutboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "headers": [
        {
          "name": "Cache-Control",
          "value": "public, max-age=60, s-maxage=3600"
        }
      ]
    }
  }
}
Cache-Control + s-maxage
ETag + If-None-Match
Vary header support
Different headers per status
Akamai / Cloudflare ready
Custom TS outbound logic
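For the dynamic case, a minimal sketch of a custom outbound policy that varies Cache-Control by status class. The function name and the plain fetch-API parameter types are illustrative so the sketch stands alone; in a real Zuplo project the request and context parameters would be typed as ZuploRequest and ZuploContext from "@zuplo/runtime".

```typescript
// Sketch: choose Cache-Control per status class in an outbound policy.
export default async function cacheByStatus(
  response: Response,
  request: Request,
  context: unknown,
): Promise<Response> {
  const headers = new Headers(response.headers);
  if (response.status >= 200 && response.status < 300) {
    // Successes are safe to hold: browsers for 60s, shared CDNs for 1h
    headers.set("Cache-Control", "public, max-age=60, s-maxage=3600");
  } else {
    // Never let an error page get pinned in a CDN
    headers.set("Cache-Control", "no-store");
  }
  // Re-wrap the body with the adjusted headers
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```

The same shape works for varying on response payload or request attributes, since the policy sees both objects.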
AI Gateway semantic cache

Stop paying for the same answer twice

The semantic-cache-inbound policy embeds incoming requests and returns a cached response when an existing entry is similar enough, controllable via semanticTolerance (0–1). Same idea as exact-match caching, with a vastly different cost profile.

  • Configurable similarity tolerance — tune semanticTolerance to balance cache hit rate vs answer fidelity.
  • Namespace isolation — separate caches per tenant, model, or product surface. No cross-contamination.
  • Custom cache-key extraction — derive the match key from a JSON property path or a TS function.
  • zp-semantic-cache header — every response surfaces HIT / MISS so you can measure cost savings directly.
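A sketch of the policy block, following the same handler shape as the set-headers example above. semanticTolerance comes from this page; the export name and any further options (namespace, key extraction) are assumptions to confirm against the AI Gateway docs:

```json
{
  "name": "llm-semantic-cache",
  "policyType": "semantic-cache-inbound",
  "handler": {
    "export": "SemanticCacheInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "semanticTolerance": 0.1
    }
  }
}
```

Check the zp-semantic-cache response header while tuning the tolerance to see how your hit rate moves.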

Incoming prompt

"Tell me the capital city of France"

HIT similarity 0.91 · cached 3m ago

Cached response (served in 7ms)

"Paris is the capital of France."

cost saved: $0.024 · namespace: geo-facts
What makes Zuplo different

Caching that fits how modern APIs actually work

Semantic caching for LLMs

Configurable similarity tolerance (0–1), namespace isolation for multi-tenant cache keys, custom cache-key extraction via property path or function. "What's the capital of France?" and "Tell me the capital city of France" hit the same cached answer.

Compose responses, not just cache them

ZoneCache exposes the same low-latency edge cache to your TypeScript code. Cache the shared 90% of a response, fetch only the per-customer delta from origin, assemble in the gateway. Real customers cut origin bandwidth by 70%. Try doing that with a CDN.

No Redis to provision

The cache is part of the gateway runtime across 300+ data centers — not a separate service. Nothing to provision, no maxmemory eviction policy to tune, no version mismatch between cache server and client library.

Composes with your CDN

Set Cache-Control + s-maxage from the gateway and your downstream CDN (Akamai, Cloudflare, Fastly) takes over. The two layers cooperate: CDN absorbs the cold visitors, Zuplo absorbs the authenticated, per-tenant traffic.

Real questions, real answers

What teams use this for

“Our product catalog is 90% of our DB load.”

Cache GET /products with caching-inbound, TTL 300s, status [200]. The first request hydrates; the next thousand serve from the edge in under 10ms. Origin load drops by an order of magnitude.

“We can't cache responses with auth — they're per-user.”

You can. caching-inbound includes the Authorization header in the cache key by default. Each user gets their own scoped cache; nothing leaks between tenants. You only opt out (with a documented warning) if the response is genuinely identical.

“Our AI assistant is rate-limited by spend, not traffic.”

Wire up semantic-cache-inbound on the LLM route. Similar prompts return cached completions; tolerance is tunable; you keep your tokens for the queries that actually need them.

“We need to nuke the cache after a deploy.”

Set cacheId via an environment variable (e.g. CACHE_ID), bump the value, redeploy. Every cache key now has a fresh prefix and old entries are stranded — clean cutover, no scripting required.
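Concretely, the bust knob is just another option on the caching policy. A sketch, assuming Zuplo's $env() interpolation for policy options:

```json
{
  "options": {
    "expirationSecondsTtl": 300,
    "cacheId": "$env(CACHE_ID)"
  }
}
```

Bumping CACHE_ID from v1 to v2 and redeploying changes every key prefix at once; the v1 entries simply age out.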

Frequently Asked Questions

Common questions about caching with Zuplo.

Cut tail latency in an afternoon

Drop the caching-inbound policy on a route, set a TTL, and watch your origin bills fall.