Semantic Caching

Reduce AI API costs by caching responses for semantically similar queries.

Prerequisite: You need a Zuplo account to run this example. You can sign up for free.

This example demonstrates how to use Zuplo's Semantic Cache Policy to cache responses based on semantic similarity rather than exact matches.

With semantic caching, requests with similar meaning return cached responses even when the wording differs. For example, "What is the capital of France?" and "Tell me the capital city of France" would return the same cached response.

Prerequisites

  • A Zuplo account. You can sign up for free.

Working with this example

Locally

Working locally is the best way to explore and understand the code for this example. You can get a local version by using the Zuplo CLI:

```bash
npx create-zuplo-api@latest --example semantic-caching
```

Deploy this example to Zuplo

It is also possible to deploy this example directly to your Zuplo account and work with it via the Zuplo Portal. You can do this by clicking the Deploy to Zuplo button anywhere on this page.

How It Works

The Semantic Cache Policy uses LLM embeddings to determine semantic similarity between cache keys. When a request comes in:

  1. The policy extracts a cache key from the request (in this example, the question field from the JSON body, but it could be any key you want)
  2. It checks for semantically similar cache keys based on the configured tolerance
  3. If a match is found, the cached response is returned
  4. If no match is found, the request proceeds to the handler and the response is cached
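The steps above can be sketched in a few lines of TypeScript. This is an illustrative toy under stated assumptions, not Zuplo's implementation: `toyEmbed` is a hypothetical stand-in for a real embedding model (it only counts non-stop-words), `ToySemanticCache` is a made-up class, and treating the tolerance as a maximum cosine distance (1 minus cosine similarity) is an assumption about how a tolerance on a 0-1 scale could be applied.

```typescript
// Illustrative sketch only: NOT Zuplo's implementation.
type Vector = Map<string, number>;

interface CacheEntry {
  embedding: Vector;
  response: string;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical stand-in for an embedding model: word counts minus stop words.
const STOP_WORDS = new Set(["what", "is", "the", "of", "tell", "me", "a", "an"]);

function toyEmbed(text: string): Vector {
  const vec: Vector = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    if (!STOP_WORDS.has(word)) vec.set(word, (vec.get(word) ?? 0) + 1);
  }
  return vec;
}

function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [word, count] of a) {
    dot += count * (b.get(word) ?? 0);
    normA += count * count;
  }
  for (const count of b.values()) normB += count * count;
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class ToySemanticCache {
  private entries: CacheEntry[] = [];
  private tolerance: number;
  private ttlSeconds: number;

  constructor(tolerance: number, ttlSeconds: number) {
    this.tolerance = tolerance;
    this.ttlSeconds = ttlSeconds;
  }

  // Steps 2 and 3: return the closest unexpired entry within the tolerance.
  get(key: string, now: number = Date.now()): string | undefined {
    const query = toyEmbed(key);
    this.entries = this.entries.filter((e) => e.expiresAt > now);
    let best: CacheEntry | undefined;
    let bestDistance = Infinity;
    for (const entry of this.entries) {
      const distance = 1 - cosineSimilarity(query, entry.embedding);
      if (distance <= this.tolerance && distance < bestDistance) {
        best = entry;
        bestDistance = distance;
      }
    }
    return best?.response;
  }

  // Step 4: store the handler's response under the key's embedding.
  set(key: string, response: string, now: number = Date.now()): void {
    this.entries.push({
      embedding: toyEmbed(key),
      response,
      expiresAt: now + this.ttlSeconds * 1000,
    });
  }
}

// Usage: tolerance 0.3 and a 300-second TTL, mirroring this example's settings.
const cache = new ToySemanticCache(0.3, 300);
cache.set("What is the capital of France?", "Paris");
console.log(cache.get("Tell me the capital city of France")); // "Paris" (hit)
console.log(cache.get("What is the population of Tokyo?")); // undefined (miss)
```

A real embedding model captures meaning far better than word counts, so the distances here are only meaningful for this toy.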

Project Structure

The most important files in this example are:

```text
├── config/
│   ├── routes.oas.json    # Route configuration with semantic cache policy
│   └── policies.json      # Policy configuration
└── modules/
    └── question-handler.ts # Simple handler for demo purposes
```

Configuration

Policy Configuration

The semantic cache policy is configured in config/policies.json:

```json
{
  "policies": [
    {
      "name": "semantic-cache",
      "policyType": "semantic-cache-inbound",
      "handler": {
        "export": "SemanticCacheInboundPolicy",
        "module": "$import(@zuplo/runtime)",
        "options": {
          // Use a property from the JSON body as the cache key
          "cacheBy": "propertyPath",
          // Extract the "question" field from the request body
          "cacheByPropertyPath": ".question",
          // Cache entries expire after 5 minutes
          "expirationSecondsTtl": 300,
          // How similar questions must be to match (0-1 scale)
          // Lower values = stricter matching, higher = more flexible
          "semanticTolerance": 0.3,
          // Add a response header showing HIT or MISS
          "returnCacheStatusHeader": true
        }
      }
    }
  ]
}
```

Key Options

| Option | Description |
| --- | --- |
| cacheBy | How to generate the cache key. Use propertyPath to extract from JSON body, or function for custom logic. |
| cacheByPropertyPath | The JSON path to use as the cache key (e.g., .question extracts the question field). |
| semanticTolerance | How similar requests must be to match (0-1 scale). Lower values require closer matches. |
| expirationSecondsTtl | How long cached responses remain valid (in seconds). |
| returnCacheStatusHeader | When true, adds a zp-semantic-cache header showing HIT or MISS. |

For all available options, see the Semantic Cache Policy documentation.
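To make the cacheByPropertyPath option more concrete, here is a minimal sketch of how a path such as .question could be resolved against a parsed JSON body. The helper name `extractByPropertyPath` is hypothetical, and the path syntax it accepts (a leading dot followed by dot-separated keys) is an assumption; the policy's real path handling may support more than this.

```typescript
// Hypothetical helper, not part of @zuplo/runtime.
// Assumed path syntax: a leading dot, then dot-separated object keys.
function extractByPropertyPath(body: unknown, path: string): unknown {
  const keys = path.split(".").filter((key) => key.length > 0);
  let current: unknown = body;
  for (const key of keys) {
    // Stop early if the path walks into something that is not an object.
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

const exampleBody: unknown = JSON.parse(
  '{"question": "What is the capital of France?"}'
);
console.log(extractByPropertyPath(exampleBody, ".question"));
// "What is the capital of France?"
```

The extracted string, not the whole request body, would then be the text compared for semantic similarity.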

Route Configuration

The policy is applied to the /ask route in config/routes.oas.json:

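```json
{
  "paths": {
    "/ask": {
      "post": {
        "x-zuplo-route": {
          "policies": {
            "inbound": ["semantic-cache"]
          },
          "handler": {
            "export": "default",
            "module": "$import(./modules/question-handler)"
          }
        }
      }
    }
  }
}
```

The entry in the inbound array refers to the policy by the name given in config/policies.json (semantic-cache).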

Running the Example

Start the API Gateway by running:

```bash
npm run dev
```

The server will start on http://localhost:9000 and the endpoint will be available at http://localhost:9000/ask.

Testing the Semantic Cache

Use the following curl commands to see the semantic cache in action:

```bash
# Request 1: Initial question (expect MISS)
curl -s -i http://localhost:9000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the capital of France?"}'

# Request 2: Exact same question (expect HIT)
curl -s -i http://localhost:9000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the capital of France?"}'

# Request 3: Semantically similar question (expect HIT)
curl -s -i http://localhost:9000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "Tell me the capital city of France"}'

# Request 4: Different question (expect MISS)
curl -s -i http://localhost:9000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the population of Tokyo?"}'
```

Expected Results

| Request | Question | Expected Header |
| --- | --- | --- |
| 1 | "What is the capital of France?" | zp-semantic-cache: MISS |
| 2 | "What is the capital of France?" | zp-semantic-cache: HIT |
| 3 | "Tell me the capital city of France" | zp-semantic-cache: HIT |
| 4 | "What is the population of Tokyo?" | zp-semantic-cache: MISS |

The generatedAt timestamp in the response body lets you verify cache behavior. Cached responses will have the same timestamp as the original request.
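If you prefer a script to running the curl commands one at a time, the same check can be sketched in TypeScript. The function name `probeCache` and the `FetchLike` type are hypothetical; the URL and the zp-semantic-cache header name come from this example. The fetch function is passed in as a parameter so the sketch can also be exercised without a running server.

```typescript
// Minimal shape of the fetch call this sketch needs; the global fetch
// in Node 18+ satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ headers: { get(name: string): string | null } }>;

// Sends each question in order and collects the cache status header.
// Requests run sequentially so earlier responses are cached before
// later questions are asked.
async function probeCache(
  questions: string[],
  fetchFn: FetchLike,
  url: string = "http://localhost:9000/ask"
): Promise<(string | null)[]> {
  const statuses: (string | null)[] = [];
  for (const question of questions) {
    const response = await fetchFn(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ question }),
    });
    statuses.push(response.headers.get("zp-semantic-cache"));
  }
  return statuses;
}

// Usage against the local dev server (requires `npm run dev` to be running):
// probeCache(
//   [
//     "What is the capital of France?",
//     "What is the capital of France?",
//     "Tell me the capital city of France",
//     "What is the population of Tokyo?",
//   ],
//   fetch
// ).then((statuses) => console.log(statuses));
```

Against a fresh cache, the statuses should line up with the Expected Header column in the table above.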

Using with Real LLM Responses

This example uses a simple custom request handler that returns a static response for demonstration purposes.
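A handler of this kind might look roughly like the sketch below. It is a hypothetical reconstruction, not the actual contents of modules/question-handler.ts: the function name, the answer text, and the response fields other than generatedAt are assumptions. It uses the web-standard Request and Response types to stay self-contained; in a Zuplo project the function would be the module's default export and its parameters would normally use the request and context types from @zuplo/runtime.

```typescript
// Hypothetical sketch of a demo handler; the real file may differ.
async function questionHandler(request: Request): Promise<Response> {
  let body: { question?: string };
  try {
    body = (await request.json()) as { question?: string };
  } catch {
    // The body was not valid JSON.
    return new Response(JSON.stringify({ error: "Invalid JSON body" }), {
      status: 400,
      headers: { "Content-Type": "application/json" },
    });
  }

  const payload = {
    question: body.question ?? null,
    // Placeholder text standing in for a real LLM answer.
    answer: "This is a static demo answer.",
    // A fresh timestamp per handler run; a cached response repeats the old one.
    generatedAt: new Date().toISOString(),
  };

  return new Response(JSON.stringify(payload), {
    status: 200,
    headers: { "Content-Type": "application/json" },
  });
}
```

Because the timestamp is generated only when the handler actually runs, a response served from the cache carries the timestamp of the request that populated it.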

For production use with real LLM responses, Zuplo's AI Gateway provides built-in semantic caching along with additional features like cost controls, team budgets, and provider abstraction.

Next Steps

  • Use Zuplo's AI Gateway for production LLM caching with built-in semantic caching, cost controls, and observability
  • Learn more about Zuplo policies
  • Explore exact-match caching for non-semantic use cases
