API gateways have always enforced policies at the boundary: rate limiting by IP, authenticating by API key, routing by path prefix. These rules are simple, deterministic, and fast. They match patterns.
That works well for conventional API traffic — structured JSON payloads, predictable paths, well-defined schemas. But AI-native traffic looks different. When users talk to an LLM through your API, the “request” is a natural language prompt that could say anything. There is no pattern to match. There is only meaning.
This gap is driving a new generation of gateway policy approaches, with vendors like Kong investing in semantic policy enforcement powered by vector embeddings. Understanding what this means — and how it compares to traditional programmatic policy engines — will help you make better decisions about how to govern AI traffic in your own stack.
Why Traditional Pattern-Based Policies Struggle with AI Traffic
Consider a standard rate-limiting policy. It counts requests per API key per minute and blocks when a threshold is crossed. That logic works the same whether the request body is `{"user_id": 123}` or a 2,000-word prompt. The policy does not care what the request says.
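The counting logic really is that mechanical. A minimal sketch (a fixed-window variant; production gateways typically use distributed or sliding-window counters, so this is illustrative only):

```typescript
// Fixed-window rate limiter: count requests per API key per window,
// block once the count exceeds the limit. No inspection of the body.
class FixedWindowRateLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private limit: number,
    private windowMs: number = 60_000 // one minute
  ) {}

  allow(apiKey: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(apiKey);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // New key or expired window: start a fresh window.
      this.counts.set(apiKey, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

Whether the body is a tiny JSON object or a long prompt, only the key and the clock matter.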
Now consider a more nuanced governance requirement: “If a user’s prompt involves financial data, route the response through our PII redaction service before returning it.”
With a regex-based approach, you might try to match keywords like “bank account,” “social security,” or “credit card.” But natural language is endlessly varied:
- “What’s my balance?”
- “Can you show me my recent transactions?”
- “Help me understand my quarterly statement”
- “Explain this wire transfer to me”
Each of these involves financial data, but none will reliably match a static keyword list. Users will rephrase, abbreviate, and ask obliquely. A pattern-based policy will always have gaps.
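To make the gap concrete, here is that keyword list applied to the four prompts above:

```typescript
// A keyword regex built from the obvious "financial" terms.
const sensitivePattern = /bank account|social security|credit card/i;

// The rephrased prompts from above, all of which involve financial data.
const prompts = [
  "What's my balance?",
  "Can you show me my recent transactions?",
  "Help me understand my quarterly statement",
  "Explain this wire transfer to me",
];

// None of the four contains any of the keywords.
const matches = prompts.filter((p) => sensitivePattern.test(p));
console.log(matches.length); // 0 — the policy misses all four
```

Expanding the keyword list helps at the margin, but every new phrasing a user invents reopens the gap.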
This is the fundamental problem: traditional gateway policies are built for structured data, but AI traffic is unstructured by design.
What Are Semantic Gateway Policies?
Semantic gateway policies use vector embeddings to classify requests by meaning rather than form. Instead of checking whether a request matches a pattern, the gateway converts the request into a high-dimensional vector that represents its semantic content, then compares that vector against policy-defined concepts.
The core idea comes from the same technology behind semantic search and retrieval-augmented generation (RAG): text with similar meaning produces similar vectors. “What is my account balance?” and “Show me how much money I have” will produce vectors close to each other in the embedding space, even though they share no keywords.
A semantic policy can then state: “If this request’s embedding is within N distance of the ‘financial data’ cluster, apply the PII-redaction plugin.” The gateway enforces the policy based on conceptual similarity rather than string matching.
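The distance check at the heart of this is straightforward to sketch. The embedding values below are made up for illustration (real models produce hundreds or thousands of dimensions):

```typescript
// Cosine similarity between two embedding vectors — the comparison a
// semantic policy runs against each policy-defined concept vector.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional "embeddings" (invented values, for illustration only).
const financialConcept = [0.9, 0.1, 0.0];
const balanceQuery = [0.85, 0.15, 0.05]; // "What is my account balance?"
const weatherQuery = [0.05, 0.2, 0.95];  // "Will it rain tomorrow?"

const THRESHOLD = 0.8;
console.log(cosineSimilarity(financialConcept, balanceQuery) > THRESHOLD); // true
console.log(cosineSimilarity(financialConcept, weatherQuery) > THRESHOLD); // false
```

The policy never sees the words; it sees only how close the vectors sit in the embedding space.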
How It Works Technically
The mechanics involve several components working in sequence:
- On-gateway embedding model — When a request arrives, a lightweight model (often WASM-compiled for performance) converts the prompt or request body into a vector embedding. This adds latency, so vendors optimize heavily for speed.
- Vector similarity matching — The embedding is compared against policy-defined concept vectors using cosine similarity or a similar distance metric. If the similarity score exceeds a threshold, the associated policy triggers.
- Semantic caching — A natural extension of embedding-based comparison: if the incoming request is semantically similar to a previously cached request, return the cached response. This reduces both latency and LLM token costs.
- Policy action — Once a policy triggers, it executes the same kinds of actions traditional policies do: route transformation, plugin invocation, request rejection, or header injection.
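Wired together, the matching and action steps amount to a small dispatcher. A sketch, where `similarity` stands in for whatever distance metric the gateway uses (the policy names and actions here are invented for illustration):

```typescript
type Action = "redact-pii" | "reject" | "pass";

interface ConceptPolicy {
  name: string;
  vector: number[];   // the policy-defined concept embedding
  threshold: number;  // similarity score required to trigger
  action: Action;
}

// Compare the request embedding against each concept; the first policy
// whose threshold is exceeded determines the action.
function enforce(
  requestEmbedding: number[],
  policies: ConceptPolicy[],
  similarity: (a: number[], b: number[]) => number
): Action {
  for (const p of policies) {
    if (similarity(requestEmbedding, p.vector) >= p.threshold) {
      return p.action;
    }
  }
  return "pass";
}

// Usage with a toy dot-product "similarity" and one concept policy:
const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);
const policies: ConceptPolicy[] = [
  { name: "financial", vector: [1, 0], threshold: 0.8, action: "redact-pii" },
];
```

Everything upstream of `enforce` (the embedding model, the concept vectors, the thresholds) is where the opacity lives; the dispatch itself is trivial.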
The interesting architectural claim is that semantic classification happens at the gateway layer, before the request ever reaches your backend. The gateway becomes a “Semantic Controller” that understands intent.
The Programmatic Alternative
Zuplo takes a different approach. Rather than making the gateway smart by giving it opaque AI classifiers, Zuplo makes the gateway programmable: you write TypeScript policies that execute with full access to the request and response.
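A sketch of such a policy, using local stand-ins for Zuplo's runtime types (in a real project these come from `@zuplo/runtime`) and a deliberately naive keyword heuristic:

```typescript
// Minimal stand-ins for Zuplo's runtime types, for illustration only.
interface ZuploContext {
  log: { info: (msg: string) => void };
}
type ZuploRequest = Request;

// Hypothetical heuristic: in production you might call a classification
// API or combine multiple signals instead of a keyword list.
const FINANCIAL_TERMS = ["balance", "transaction", "statement", "wire transfer"];

export function looksFinancial(prompt: string): boolean {
  const lower = prompt.toLowerCase();
  return FINANCIAL_TERMS.some((term) => lower.includes(term));
}

// Inbound policy: flag financially themed prompts so a downstream step
// can route the response through PII redaction.
export default async function financialRoutingPolicy(
  request: ZuploRequest,
  context: ZuploContext
): Promise<ZuploRequest> {
  const body = await request.clone().json();
  if (typeof body.prompt === "string" && looksFinancial(body.prompt)) {
    context.log.info("Financial prompt detected; flagging for PII redaction");
    const headers = new Headers(request.headers);
    headers.set("x-pii-redaction", "required");
    return new Request(request, { headers });
  }
  return request;
}
```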
This is a simplified example to illustrate the approach. In production, you might call a classification API, use a more sophisticated heuristic, or combine multiple signals. The point is that the logic is yours to write, read, test, and debug.
Programmatic Policies Are Explicit and Transparent
Every decision a programmatic policy makes is traceable. If a request is blocked, you can read the code and understand why. If a policy is behaving unexpectedly, you can add logging, write a unit test, or step through the logic in a debugger.
This is not possible with semantic policies. When a vector similarity check fires — or does not — the “reason” exists in the geometry of high-dimensional space. You can inspect the similarity score, but you cannot read an explanation. This creates real challenges for:
- Compliance and auditing — Regulators often require explainable decisions. “The embedding distance was 0.73” is rarely an acceptable audit trail.
- Debugging — When a policy misfires, you need to understand why. With opaque vector matching, diagnosing false positives and false negatives is difficult.
- Testing — You can write unit tests for TypeScript policies. Testing whether a semantic policy correctly handles every way a user might phrase a sensitive request is fundamentally harder.
Programmatic Policies Are Version-Controlled and Reviewable
Because Zuplo’s policies are TypeScript code, they live in Git. Every change is tracked, reviewed in a pull request, tested in CI, and deployed through branch previews. If a policy change breaks something in production, you can see exactly what changed, who changed it, and revert it with a standard Git workflow.
Semantic policies trained on embedding models do not have this property. Changing the training data, threshold values, or model version can silently alter policy behavior in ways that are hard to detect until something goes wrong.
Semantic Caching: Where Semantic Approaches Add Clear Value
Semantic policies for security and routing decisions introduce opacity where you often need clarity. But there is a context where semantic technology offers straightforward, measurable value with fewer downsides: caching.
Zuplo’s Semantic Cache Policy applies vector embedding comparison to determine cache hits, not security decisions. When a new request arrives, the gateway checks whether any stored request is semantically similar. If the similarity exceeds the configured tolerance, the cached response is returned.
This approach works well because:
- The failure mode is low-stakes — A false positive (returning a cached response that does not perfectly match the query) is far less serious than a security policy misfiring. Users get a slightly off answer; they do not get unauthorized data exposure.
- The benefit is concrete and measurable — Cache hit rates, latency improvements, and cost reduction from avoided LLM calls are all easy to quantify. You can evaluate whether the policy is working by looking at numbers.
- The semantics align with the goal — The whole point of semantic caching is to recognize that “What is the capital of France?” and “Tell me France’s capital” should return the same answer. Semantic similarity is exactly the right measure.
The tolerance setting (on a 0–1 scale) gives you explicit control over how aggressively the cache matches. Lower tolerance means only near-identical requests hit the cache. Higher tolerance allows more flexible matching at the cost of potentially returning responses for queries that differ more meaningfully.
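The cache-hit check itself can be sketched in a few lines. Here `distance` stands in for whatever metric the gateway uses (for example, one minus cosine similarity); the cache structure and helper names are invented for illustration:

```typescript
interface CacheEntry {
  embedding: number[]; // embedding of the original request
  response: string;    // the cached LLM response
}

// A request hits the cache if its embedding is within `tolerance`
// (a 0–1 distance) of the closest stored request's embedding.
function lookup(
  embedding: number[],
  cache: CacheEntry[],
  tolerance: number,
  distance: (a: number[], b: number[]) => number
): string | undefined {
  let best: CacheEntry | undefined;
  let bestDist = Infinity;
  for (const entry of cache) {
    const d = distance(embedding, entry.embedding);
    if (d < bestDist) {
      bestDist = d;
      best = entry;
    }
  }
  // Lower tolerance → only near-identical requests hit the cache.
  return best && bestDist <= tolerance ? best.response : undefined;
}

// Usage with a toy one-dimensional "embedding" and absolute distance:
const dist = (a: number[], b: number[]) => Math.abs(a[0] - b[0]);
const cache: CacheEntry[] = [{ embedding: [0.5], response: "Paris" }];
```

Tightening the tolerance shrinks the neighborhood around each cached request; loosening it trades answer precision for hit rate.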
For teams using Zuplo’s AI Gateway, semantic caching is available as a built-in toggle — no policy configuration needed. The gateway handles embedding generation, similarity matching, and cache management automatically, and you can see cache hit rates in the analytics dashboard.
This is semantic technology applied narrowly, where its tradeoffs are acceptable, rather than as a blanket replacement for explicit policy logic.
Tradeoffs: When Each Approach Makes Sense
Neither approach is universally better. The right choice depends on what you are trying to govern and how much opacity you can tolerate.
| Dimension | Semantic Policies | Programmatic Policies |
|---|---|---|
| Classification | Handles novel phrasing automatically | Requires anticipating patterns in advance |
| Transparency | Decisions are hard to explain or audit | Every decision is readable and traceable |
| Testability | Hard to test exhaustively across phrasing variations | Standard unit tests work |
| Version control | Model/threshold changes are subtle | All changes live in Git, reviewed in PRs |
| Latency | Embedding model adds overhead (typically 1–3ms) | Negligible overhead for simple logic |
| Compliance | Difficult for audit-sensitive environments | Explicit logic satisfies audit requirements |
| Caching use case | Works well — failure modes are acceptable | Exact-match caching only; misses similar queries |
| Security use case | Risk of opaque misfire | Explicit rules are predictable and reviewable |
| Developer experience | Configure once, infer automatically | Write code; full control and debuggability |
Use semantic classification when:
- You need to classify open-ended natural language requests that cannot be anticipated with keyword lists
- The classification failure mode is low-stakes (caching, soft routing hints)
- You have time to evaluate and tune the model’s behavior against real traffic
- You can accept some opacity in exchange for coverage
Use programmatic policies when:
- Decisions have compliance, audit, or security implications
- You need to be able to explain every policy decision
- You want policies that are unit-testable and code-reviewable
- Your team is more comfortable reasoning about code than probability thresholds
Use both: The most robust approach often combines semantic classification with programmatic enforcement. A semantic classifier might suggest that a request contains financial data, but a programmatic policy makes the final call: “If the classifier score is above 0.85 and the user’s account tier is not ‘enterprise’, reject the request.” The semantic layer provides signal; the programmatic layer makes the decision and remains the source of truth for auditors.
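That hybrid rule is short enough to write out as explicit, auditable code (a sketch, where `classifierScore` would come from the embedding-based classifier and the tier names are assumed for illustration):

```typescript
interface User {
  accountTier: "free" | "pro" | "enterprise";
}

// The semantic layer provides the score; this programmatic policy is
// the final, readable source of truth for the decision.
function shouldReject(classifierScore: number, user: User): boolean {
  return classifierScore > 0.85 && user.accountTier !== "enterprise";
}
```

An auditor can read this function and state exactly when a request is rejected, even though the score feeding it is probabilistic.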
The Convergence: Hybrid Approaches
The most thoughtful production systems are moving toward hybrid architectures that play to each approach’s strengths:
- Semantic classification as a signal — Use embeddings to generate a classification score or category label for each request.
- Programmatic policy as the decision layer — Pass that signal to a TypeScript policy that combines it with other data (user attributes, rate limits, downstream service availability) to make an explicit, auditable decision.
- Semantic caching as a performance layer — Cache LLM responses based on semantic similarity to reduce cost and latency without affecting security logic.
This architecture preserves the developer-friendly properties of programmatic policies — explainability, testability, version control — while adding semantic awareness where it actually helps.
Implementing Governance for AI Traffic with Zuplo
Zuplo’s approach to AI traffic governance is to give developers explicit, composable tools rather than opaque automatic classification:
TypeScript policy engine — Write any request inspection, transformation, or routing logic in TypeScript. Call external classification APIs, implement custom heuristics, or combine multiple signals. Your policies are code: readable, testable, and version- controlled.
Prompt Injection Detection policy — A built-in policy that uses AI-powered content analysis specifically for prompt injection security. This is semantic analysis applied narrowly to a specific security use case, with a well-defined failure mode and a clear audit trail (blocked requests are logged with the reason).
Semantic Cache Policy — Applies vector similarity matching for caching decisions, where the stakes of a mismatch are low and the performance benefits are measurable. You configure the tolerance and TTL; the policy handles the rest.
AI Gateway with built-in semantic caching — For teams using LLMs in production, the AI Gateway includes semantic caching as a one-toggle feature alongside cost controls, provider routing, and observability.
Edge-native deployment — All of this runs across 300+ global edge locations. Adding a classification layer (semantic or programmatic) is only practical if the gateway itself is fast. Processing policies close to your users keeps the overhead manageable.
GitOps-first workflow — Policies are code committed to your repository, reviewed in pull requests, and deployed through branch previews. When Kong says “train your gateway with embeddings,” Zuplo says “commit your policy to Git.”
Conclusion
Semantic gateway policies represent a genuine architectural innovation for handling unstructured AI traffic. Vector embeddings can recognize meaning in ways that keyword matching cannot, and that capability will find its place in the AI infrastructure stack.
But “semantic” is not a synonym for “better.” The right question is where semantic classification belongs in your governance architecture and where explicit, programmatic logic is the safer choice.
For caching, semantic similarity is a natural fit — the failure mode is acceptable and the performance gains are real. For security and compliance decisions, programmatic policies remain more appropriate because they are explainable, testable, and auditable in ways that vector similarity matching is not.
The most durable architectures will combine both: semantic signals informing programmatic decisions, with semantic caching handling the performance layer independently. Zuplo’s toolset is designed for exactly this — giving you the primitives to build the right combination for your traffic patterns and compliance requirements.
