---
title: "AI"
description:
  "Three dedicated gateway project types — API Gateway, AI Gateway, and MCP
  Gateway — each deployed independently and purpose-built for a different layer
  of the AI stack."
canonicalUrl: "https://zuplo.com/ai"
sourceUrl: "https://zuplo.com/ai"
pageType: "product"
generatedAt: "2026-04-22"
---

# Three Gateways. Built for Modern AI Systems.

> Zuplo offers three dedicated project types — API Gateway, AI Gateway, and MCP
> Gateway — each deployed independently and purpose-built for a different layer
> of the AI stack. Choose the one that matches your workload; there is no hidden
> coupling between them.

## Three gateway project types. Each purpose-built.

### API Gateway

**Secure and publish APIs.**

The foundational infrastructure project. Routing, authentication, rate limiting,
validation, and a developer portal — all OpenAPI-native with GitOps deployment.

- OpenAPI-native routing and validation
- Authentication and rate limiting
- Built-in developer portal
- GitOps deploys

### AI Gateway

**Control LLM traffic with guardrails.**

A separate project type designed specifically for managing model calls. Not
generic API management — built for LLM governance. Route between providers,
enforce budgets, cache semantically, and block injections.

- Model routing
- Prompt injection protection
- Semantic caching
- Budget & token controls
- Auto-failover

### MCP Gateway _(Private Beta)_

**Govern MCP servers across your organization.**

A separate project type focused entirely on MCP management at scale. This is not
about generating tools — it's about controlling them. Add internal and
third-party MCP servers, segment by team, enforce RBAC, and prevent sprawl.

- Add internal + external MCP servers
- Create virtual MCP servers
- RBAC per team
- Control access to sensitive tools
- Centralized audit logs
- Prevent MCP sprawl

> **Private Beta — MCP Gateway**: Be among the first to govern MCP servers at
> scale. Request early access — limited spots available.

## Architecture: How Zuplo Project Types Work

Three distinct project types, each deployed and versioned as its own project
with no hidden coupling. Pick the one that matches your workload:

| Project Type | Primary Role            | Key Capabilities                                                                         |
| ------------ | ----------------------- | ---------------------------------------------------------------------------------------- |
| API Gateway  | Secure and publish APIs | OpenAPI-native routing, auth, rate limiting, developer portal, GitOps                    |
| AI Gateway   | Control LLM traffic     | Model routing, semantic caching, budget enforcement, injection protection, auto-failover |
| MCP Gateway  | Govern MCP servers      | RBAC per team, virtual MCP servers, centralized audit logs, tool-access control          |

## Why Zuplo: Built for every layer of the AI stack

**Security First** Prompt protection. Auth policies. Audit logs. Rate limits.

**Cost Predictability** Budgets, token limits, semantic caching, usage tracking.

**Built for Developers** OpenAPI-native. GitOps deployment. No shadow
infrastructure.

**Enterprise Governance** RBAC. Virtual MCP servers. Segmented tool access.

## Use Cases: What teams are building with Zuplo AI

**AI-powered SaaS** _(AI Gateway)_ Secure multi-tenant LLM usage with team-level
budgets. Route between providers. Cache repeated prompts. Enforce usage limits
per customer without custom middleware.

**Enterprise Internal AI** _(MCP Gateway)_ Expose internal systems safely via
MCP with RBAC. Finance sees Stripe tools. Engineering sees GitHub tools. All
governed centrally — no shadow integrations.

**AI Agents + Commerce** _(API Gateway)_ Let AI agents create orders, manage
inventory, and check status — with policy enforcement. Your OpenAPI routes
become tools in seconds, ready for any AI client.

## Frequently Asked Questions

**How does Zuplo handle rate limiting for LLM API calls?**

Zuplo supports both request-based and token-based rate limiting for LLM APIs.
You can define limits per API key, per user, or per plan — measured in requests
per minute or tokens consumed. Limits are enforced at the edge before calls
reach your LLM provider, preventing cost overruns and ensuring fair usage across
consumers.
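The core mechanic of token-based limiting can be sketched as a per-key budget checked before a request is forwarded. This is an illustrative sketch only — `TokenBudget` and its methods are hypothetical names, not Zuplo's policy API:

```typescript
// Illustrative per-key token budget over a fixed window.
// Not Zuplo's implementation — a sketch of the enforcement idea.
type BudgetState = { used: number; windowStart: number };

class TokenBudget {
  private state = new Map<string, BudgetState>();

  constructor(
    private limit: number, // max tokens allowed per window
    private windowMs: number // window length in milliseconds
  ) {}

  // Returns true if the consumer may spend `tokens` more tokens,
  // false if the request should be rejected before reaching the LLM.
  tryConsume(apiKey: string, tokens: number, now = Date.now()): boolean {
    const s = this.state.get(apiKey);
    if (!s || now - s.windowStart >= this.windowMs) {
      if (tokens > this.limit) return false;
      this.state.set(apiKey, { used: tokens, windowStart: now });
      return true;
    }
    if (s.used + tokens > this.limit) return false;
    s.used += tokens;
    return true;
  }
}
```

In a real gateway the budget state would live in shared edge storage rather than in-process memory, so limits hold across gateway instances.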

**Can Zuplo act as an AI gateway in front of OpenAI, Anthropic, or other LLM
providers?**

Yes. Zuplo proxies requests to any LLM provider, adding authentication, rate
limiting, cost controls, audit logging, and semantic caching as a transparent
layer. Multiple providers can be configured with automatic fallback — if one
provider fails or rate-limits, traffic can be routed to a backup automatically.
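The failover pattern described above amounts to trying providers in priority order and falling through on failure. A minimal sketch, assuming a generic `Provider` interface (hypothetical — not Zuplo's configuration model):

```typescript
// Illustrative failover loop, not Zuplo's implementation: try each
// provider in priority order and fall through on error or rate limit.
type Provider = {
  name: string;
  call: (prompt: string) => Promise<string>;
};

async function callWithFailover(
  providers: Provider[],
  prompt: string
): Promise<{ provider: string; output: string }> {
  let lastError: unknown;
  for (const p of providers) {
    try {
      const output = await p.call(prompt);
      return { provider: p.name, output };
    } catch (err) {
      lastError = err; // e.g. a 429 or 5xx from this provider; try the next
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```

A production gateway would also distinguish retryable errors (429, timeouts) from non-retryable ones (400s caused by the request itself).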

**How does Zuplo help control AI API costs?**

Zuplo reduces AI API costs through several mechanisms: semantic caching returns
cached responses for semantically similar queries, token-based rate limits cap
consumption per consumer, spend limits halt requests when a threshold is
exceeded, and model routing directs requests to cheaper models when appropriate.
These controls operate at the gateway without changes to your application code.

**Does Zuplo support the Model Context Protocol (MCP)?**

Yes. Zuplo can secure and manage MCP servers, applying authentication,
authorization, and rate limiting to AI agent tool calls just like any other API.
This makes it straightforward to expose MCP servers to external agents while
maintaining control over access and usage.

**How does Zuplo support token-based billing for AI products?**

Zuplo can meter token consumption from LLM API responses and report usage to
billing providers like Stripe. Custom TypeScript policies extract token counts
from provider responses (e.g., OpenAI's usage object) and record them as
billable units. This enables precise usage-based billing that reflects actual
LLM cost drivers.
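The extraction step can be sketched against the shape of OpenAI's `usage` object (`prompt_tokens`, `completion_tokens`, `total_tokens`); the surrounding policy and the reporting call to a billing provider are omitted here, and `extractUsage` is an illustrative name, not a Zuplo API:

```typescript
// Illustrative extraction of billable token counts from an
// OpenAI-style response body. The billing-provider call is out of scope.
interface UsageReport {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

function extractUsage(responseBody: unknown): UsageReport | null {
  const usage = (responseBody as any)?.usage;
  if (
    !usage ||
    typeof usage.prompt_tokens !== "number" ||
    typeof usage.completion_tokens !== "number"
  ) {
    return null; // no usage object: nothing to meter
  }
  return {
    promptTokens: usage.prompt_tokens,
    completionTokens: usage.completion_tokens,
    totalTokens:
      usage.total_tokens ?? usage.prompt_tokens + usage.completion_tokens,
  };
}
```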

**How does semantic caching work in Zuplo for AI APIs?**

Zuplo's semantic cache uses vector embeddings to identify requests that are
semantically similar — not just identical — to previously cached queries. When a
match is found above a configurable similarity threshold, the cached response is
returned immediately without forwarding to the LLM. This reduces latency, cuts
API costs, and improves throughput for AI-powered applications.
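The lookup side of a semantic cache can be sketched as a cosine-similarity comparison against stored embeddings, with a configurable threshold deciding whether to serve the cached response or forward to the LLM. This is a simplified linear scan for illustration (a hypothetical shape, not Zuplo's cache; real systems use a vector index):

```typescript
// Illustrative semantic-cache lookup: return the cached response whose
// embedding is most similar to the query, if it clears the threshold.
type CacheEntry = { embedding: number[]; response: string };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function lookup(
  queryEmbedding: number[],
  cache: CacheEntry[],
  threshold = 0.9
): string | null {
  let best: CacheEntry | null = null;
  let bestScore = threshold;
  for (const entry of cache) {
    const score = cosineSimilarity(queryEmbedding, entry.embedding);
    if (score >= bestScore) {
      bestScore = score;
      best = entry;
    }
  }
  return best ? best.response : null; // null → cache miss, call the LLM
}
```

Tuning the threshold is the key trade-off: too low and dissimilar prompts get stale answers, too high and the cache rarely hits.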

## Next steps

- [Start a free project](/signup)
- [Book a demo](https://zuplo.com/meeting)
- Request early access to the MCP Gateway private beta at [zuplo.com/ai](/ai)
