---
title: "Best API Gateways for AI and LLM Workloads (2026): Evaluative Comparison for Teams Building on Top of LLMs"
description: "Compare 10 API gateways for AI and LLM workloads in 2026. Evaluation covers token rate limiting, MCP support, model routing, and multi-cloud deployment."
canonicalUrl: "https://zuplo.com/learning-center/best-api-gateways-ai-llm-workloads-2026"
pageType: "learning-center"
authors: "nate"
tags: "AI, API Gateway"
image: "https://zuplo.com/og?text=Best%20API%20Gateways%20for%20AI%20and%20LLM%20Workloads%20(2026)"
---
**Our pick: [Zuplo](https://zuplo.com) is the best API gateway for AI and LLM
workloads in 2026.** It provides a dedicated AI Gateway with multi-provider
model routing, token-based rate limiting, and semantic caching, plus an MCP
Gateway that federates and governs every MCP server your agents touch — all
deployable across AWS, Azure, GCP, or 300+ edge locations without
cloud-provider lock-in.
[Get started free](https://portal.zuplo.com/signup).

If you are building applications on top of large language models, your API
gateway choice shapes your cost controls, security posture, and ability to scale
across providers. Most teams start by routing LLM traffic through whatever
gateway they already have — often a cloud-provider default like AWS API Gateway
or Azure API Management. That works for a proof-of-concept, but production AI
workloads demand capabilities that traditional API gateways were never designed
to provide: token-based rate limiting, multi-provider model routing, semantic
caching, prompt injection defense, and MCP protocol support for AI agents.

This guide evaluates ten API gateways through the lens of AI and LLM workloads
specifically. We cover API management platforms (Zuplo, Kong, Tyk, Apigee,
Gravitee), cloud-provider gateways (AWS API Gateway and Azure API Management),
dedicated AI gateways (Cloudflare AI Gateway, Portkey, LiteLLM), and the
emerging capabilities each brings to AI-native architectures.

For a broader comparison not focused on AI workloads, see
[Best API Gateways in 2026](/learning-center/best-api-gateways-2026). For a
head-to-head AI gateway comparison, see
[AI Gateway Comparison 2026: Zuplo vs Kong vs Gravitee vs Tyk vs Apigee](/learning-center/ai-gateway-comparison-mcp-a2a-agent-governance).
For a buyer's guide framing, see
[How to Choose the Best AI Gateway](/learning-center/best-ai-gateway-buyers-guide).

## What AI and LLM Workloads Demand of an API Gateway

Traditional API gateways manage REST and GraphQL traffic. They count requests,
validate JWTs, and enforce per-second rate limits. AI workloads break that model
because the economics and traffic patterns are fundamentally different.

Here are the capabilities that separate an AI-workload-ready gateway from a
traditional one.

### Token-Based Rate Limiting

A single LLM request can consume anywhere from 50 to 50,000 tokens. Two requests
to the same endpoint can differ by orders of magnitude in cost.
Request-per-minute rate limiting cannot control AI spending — you need
[token-based rate limiting](/learning-center/token-based-rate-limiting-ai-agents)
that caps consumption by tokens processed, not requests counted.
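
To make the difference concrete, here is a minimal TypeScript sketch of a
per-consumer token budget over a fixed time window. It is not any particular
gateway's implementation: it assumes the token count is read from the
provider's usage data after each response, and it keeps state in memory where a
production gateway would use a shared store.

```typescript
interface WindowState {
  windowStart: number; // epoch milliseconds when the current window began
  tokensUsed: number;
}

class TokenRateLimiter {
  private readonly state = new Map<string, WindowState>();
  private readonly tokensPerWindow: number;
  private readonly windowMs: number;

  constructor(tokensPerWindow: number, windowMs: number) {
    this.tokensPerWindow = tokensPerWindow;
    this.windowMs = windowMs;
  }

  // Returns the consumer's window, starting a fresh one once the old one has elapsed.
  private current(consumer: string, now: number): WindowState {
    let s = this.state.get(consumer);
    if (!s || now - s.windowStart >= this.windowMs) {
      s = { windowStart: now, tokensUsed: 0 };
      this.state.set(consumer, s);
    }
    return s;
  }

  // Before forwarding: allow the request only while the window has budget left.
  allow(consumer: string, now: number = Date.now()): boolean {
    return this.current(consumer, now).tokensUsed < this.tokensPerWindow;
  }

  // After the response: charge the tokens the provider reports as used.
  record(consumer: string, tokens: number, now: number = Date.now()): void {
    this.current(consumer, now).tokensUsed += tokens;
  }
}
```

Because a request's cost is known only after the model responds, the request
that crosses the budget is still served and the next one is rejected.
Estimating prompt tokens up front narrows that gap.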

### Multi-Provider Model Routing

Production AI applications rarely rely on a single LLM provider. You need the
ability to route requests to OpenAI, Anthropic, Google Gemini, Mistral, or
self-hosted models from a single endpoint, with automatic failover when a
provider is unavailable or throttling.
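
A minimal sketch of the failover decision, assuming a fixed priority order and
a 30-second cooldown after a 429 or 5xx response (both illustrative choices,
not any vendor's defaults). The router only decides which provider to try next;
the HTTP call itself is left out.

```typescript
type ProviderName = string;

class FailoverRouter {
  private readonly cooldownUntil = new Map<ProviderName, number>();
  private readonly providers: ProviderName[];
  private readonly cooldownMs: number;

  constructor(providers: ProviderName[], cooldownMs = 30_000) {
    this.providers = providers; // ordered from most to least preferred
    this.cooldownMs = cooldownMs;
  }

  // Highest-priority provider that is not cooling down, or undefined if all are.
  pick(now: number = Date.now()): ProviderName | undefined {
    return this.providers.find((p) => (this.cooldownUntil.get(p) ?? 0) <= now);
  }

  // Throttling (429) and server errors (5xx) bench the provider for a while;
  // a success clears any cooldown; other 4xx errors leave the state alone.
  report(provider: ProviderName, status: number, now: number = Date.now()): void {
    if (status === 429 || status >= 500) {
      this.cooldownUntil.set(provider, now + this.cooldownMs);
    } else if (status < 400) {
      this.cooldownUntil.delete(provider);
    }
  }
}
```

Client errors other than 429 do not trigger failover, since a malformed request
would fail against any provider.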

### Semantic Caching

LLM calls are expensive and slow compared to traditional API calls. Semantic
caching recognizes semantically similar prompts and returns cached responses,
reducing both cost and latency without requiring exact input matches.
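
A minimal sketch of the lookup step, assuming each prompt has already been
turned into an embedding vector by some embedding model (not shown) and that a
cosine similarity of 0.95 or higher counts as "the same question." A linear
scan keeps the example short; production caches use a vector index.

```typescript
interface CacheEntry {
  embedding: number[];
  response: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  private readonly entries: CacheEntry[] = [];
  private readonly threshold: number;

  constructor(threshold = 0.95) {
    this.threshold = threshold;
  }

  // Returns the response of the most similar cached prompt at or above the threshold.
  get(embedding: number[]): string | undefined {
    let best: CacheEntry | undefined;
    let bestScore = this.threshold;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best?.response;
  }

  set(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }
}
```

The threshold is the main tuning knob: set it too low and users get answers to
questions they did not ask; set it too high and the cache degrades to exact
matching.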

### Prompt Injection Defense

LLM-powered applications face a new class of attack: prompt injection, where
malicious input attempts to override model instructions. Your gateway needs to
detect and block poisoned prompts before they reach the model or downstream
consumers.

### MCP and Agent Protocol Support

The [Model Context Protocol](https://modelcontextprotocol.io) is the emerging
standard for AI agents to discover and call tools. An API gateway that supports
MCP at the infrastructure level lets you federate many MCP servers behind a
single spec-compliant URL, compose virtual MCP servers from the tools you
trust, broker OAuth and upstream credentials, and maintain audit logs of every
tool call — whether you are productizing MCP for customer agents or governing
the third-party MCP servers your own team is using.
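
MCP tool calls are JSON-RPC 2.0 messages with the method `tools/call`. The
sketch below shows one way a federating gateway could route them. It assumes a
hypothetical `<server>__<tool>` naming convention for namespacing tools and a
per-upstream allow-list standing in for a virtual server's curated tool
catalog; neither is taken from a specific product.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface Upstream {
  url: string;
  allowedTools: Set<string>; // tools this gateway exposes from the upstream
}

type RouteResult =
  | { ok: true; upstreamUrl: string; forwarded: JsonRpcRequest }
  | { ok: false; error: string };

function routeToolCall(req: JsonRpcRequest, upstreams: Map<string, Upstream>): RouteResult {
  const name = req.params?.name;
  if (req.method !== "tools/call" || !name) {
    return { ok: false, error: "not a tools/call request" };
  }
  const sep = name.indexOf("__");
  if (sep <= 0) {
    return { ok: false, error: "tool name has no server prefix" };
  }
  const server = name.slice(0, sep);
  const tool = name.slice(sep + 2);
  const upstream = upstreams.get(server);
  if (!upstream || !upstream.allowedTools.has(tool)) {
    return { ok: false, error: `tool ${name} is not enabled on this gateway` };
  }
  // Strip the namespace prefix so the upstream sees its own tool name.
  return {
    ok: true,
    upstreamUrl: upstream.url,
    forwarded: { ...req, params: { ...req.params, name: tool } },
  };
}
```

Stripping the prefix lets each upstream keep its own tool names, while the
allow-list check rejects a disabled tool at the gateway before it reaches the
upstream.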

### Cost Attribution and Budget Controls

AI costs are unpredictable. A bug in a retry loop can burn through your monthly
budget in minutes. Your gateway needs hierarchical budget controls — per
organization, per team, per application — with hard enforcement that blocks
requests when thresholds are exceeded.
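
A minimal sketch of the enforcement check, assuming spend is tracked in USD per
scope and that the gateway can estimate a request's cost before forwarding it.
The scope names mirror the hierarchy above; the data model is hypothetical.

```typescript
interface Budget {
  limitUsd: number;
  spentUsd: number;
}

type Scope = "organization" | "team" | "application";

function checkBudgets(
  budgets: Record<Scope, Budget>,
  estimatedCostUsd: number,
): { allowed: boolean; blockedBy?: Scope } {
  // Check the broadest scope first so the result names the outermost limit hit.
  for (const scope of ["organization", "team", "application"] as const) {
    const b = budgets[scope];
    if (b.spentUsd + estimatedCostUsd > b.limitUsd) {
      return { allowed: false, blockedBy: scope };
    }
  }
  return { allowed: true };
}
```

A request must fit under every level at once, so a team that has exhausted its
budget is blocked even when the organization still has headroom.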

### Agentic Payment Support

AI agents are becoming autonomous API consumers. Protocols like x402 and Stripe
MPP (Machine Payments Protocol) enable agents to discover, subscribe to, and pay
for API access without human intervention. A gateway that supports
[agentic payments](/learning-center/api-gateway-agentic-payments) positions your
infrastructure for the next generation of machine-to-machine commerce.

## Evaluative Ranking: 10 API Gateways for AI and LLM Workloads

### 1. Zuplo — The Multi-Cloud AI-Native Gateway

[Zuplo](https://zuplo.com) is the only platform in this comparison that provides
three purpose-built project types — API Gateway, AI Gateway, and MCP Gateway —
each independently deployed and optimized for its specific workload. This is not
a traditional API gateway with AI features bolted on as plugins. Each project
type has purpose-built policies and handlers designed for its traffic pattern.

**AI Gateway capabilities:**

- **Multi-provider routing** — Route to OpenAI, Anthropic, Google Gemini,
  Mistral, and other OpenAI-compatible providers from a single endpoint with
  automatic failover.
- **Token-based rate limiting** — Cap tokens per user, per application, or per
  time window. Zuplo's rate limiting goes beyond request counting to give
  granular cost control over AI spending.
- **Semantic caching** — Detects semantically similar prompts and returns cached
  responses, reducing both cost and latency.
- **Hierarchical budget controls** — Set daily and monthly spend limits at the
  organization, team, and application level with automatic enforcement.
- **Prompt injection detection** — Uses a
  [tool-calling LLM workflow](https://zuplo.com/docs/policies/prompt-injection-outbound)
  to classify content as benign or poisoned before it reaches downstream
  consumers.
- **Secret masking** — Automatically
  [redacts API keys, tokens, and credentials](https://zuplo.com/docs/policies/secret-masking-outbound)
  in outbound responses before they reach AI agents.

**MCP Gateway capabilities:**

- **Federation of remote MCP servers** — Put one
  [Gateway URL](https://zuplo.com/mcp-gateway) in front of many upstream MCP
  servers (yours and third-party — GitHub, Slack, Stripe, Linear, Atlassian,
  internal ones) so agents and employees connect through a single
  spec-compliant endpoint.
- **Virtual MCP servers** — Compose curated tool catalogs from one or more
  upstream MCP servers and toggle individual tools on or off without forking
  the upstream. Hand the same Gateway URL to customer agents or internal teams
  behind SSO.
- **Bundled OAuth 2.0 authorization server** — Dynamic Client Registration
  (RFC 7591), PKCE S256, authorization-server metadata at
  `.well-known/oauth-authorization-server` (RFC 8414), protected-resource
  metadata (RFC 9728), and per-virtual-server token scoping via resource
  indicators (RFC 8707). First-class presets for Auth0 and any OIDC provider.
- **Four upstream credential models** — Per-user OAuth, shared OAuth grant
  (roadmap), per-user API key in the encrypted vault (roadmap), and a shared
  vault-stored API key. Pick per route, switch without a redeploy, keep
  per-user attribution in the audit log regardless of which model is chosen.
- **Production hardening by default** — Origin and host validation on every
  `/mcp/*` request, bearer token validation with spec-compliant
  `WWW-Authenticate` 401s, CSRF-safe single-use OAuth state, AES-GCM
  encryption for upstream tokens at rest, and sensible upstream limits
  (256 KB tool args, 500 capability cap, 30s timeout, 2 MB response ceiling).
- **Typed observability events** — Events fire on every MCP request,
  capability invocation, and step of the upstream OAuth flow. Structured logs
  carry trace-ready metadata (tenant, MCP session, capability, latency,
  failure origin) ready to drop into Datadog, Honeycomb, or BigQuery. Every
  failure mode returns a documented problem code so MCP clients recover
  cleanly.
- **Spec compliance** — Implements the 2025-06-18 MCP spec over streamable
  HTTP. Tools, prompts, and resources are first-class primitives; GraphQL
  operations can also be exposed as MCP tools alongside REST. Works with
  Claude Desktop, Claude Code, Cursor, ChatGPT (including the OpenAI Apps
  SDK), VS Code, and MCP Inspector out of the box.
- **MCP Server Handler** — The API Gateway project type also includes a
  built-in [MCP Server Handler](https://zuplo.com/docs/handlers/mcp-server)
  that auto-exposes your API endpoints as MCP tools from your OpenAPI
  specification, without building a separate MCP server.
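
For readers unfamiliar with PKCE S256 (RFC 7636), the sketch below shows what
it computes, using Node's built-in `crypto` module. It illustrates the protocol
step, not Zuplo's implementation: the client keeps a random `code_verifier`
secret and sends only its SHA-256 hash, base64url-encoded, as the
`code_challenge`; the authorization server recomputes the hash when the client
redeems the authorization code.

```typescript
import { createHash, randomBytes } from "node:crypto";

// 32 random bytes encode to 43 base64url characters, within RFC 7636's 43 to 128 range.
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256: code_challenge = BASE64URL(SHA-256(ASCII(code_verifier))), without padding.
function s256Challenge(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// Authorization-server side: recompute at the token endpoint and compare.
function verifyPkce(verifier: string, storedChallenge: string): boolean {
  return s256Challenge(verifier) === storedChallenge;
}
```

Because only the hash travels with the authorization request, an attacker who
intercepts the authorization code cannot redeem it without the original
verifier.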

**Platform strengths:**

- **TypeScript programmability** — Write custom policies, handlers, and
  middleware in TypeScript with full IDE support and access to the npm
  ecosystem. No Lua, no XML, no Java callouts.
- **GitOps-native deploys** — Sub-20-second global deployments across 300+ edge
  locations. Every pull request gets a live preview environment.
- **Multi-cloud managed dedicated** — Deploy on AWS, Azure, GCP, Akamai, or
  Equinix in the region of your choice. Your AI gateway is not pinned to a
  single cloud provider, so multi-model architectures do not pay cross-cloud
  egress penalties.
- **Free tier** — Get started with edge deployment, a developer portal, and API
  key management with no credit card required.
- **SOC 2 Type II** — Annual audits with GDPR-aligned data processing and
  configurable data residency.

**Tradeoffs:**

- No native A2A (Agent-to-Agent) protocol support yet — A2A traffic can be
  proxied as standard HTTP and JSON-RPC but without protocol-aware
  observability.
- TypeScript-only for custom policies (not a concern for most modern teams).

**Best for:** Teams building multi-model, multi-cloud AI architectures that need
a unified platform for API management, AI traffic governance, and MCP tool
management without cloud-provider lock-in. See
[the AI Gateway overview](https://zuplo.com/ai-gateway) and
[the MCP Gateway overview](https://zuplo.com/mcp-gateway).

### 2. AWS API Gateway + Amazon Bedrock — The AWS-Native AI Architecture

[AWS API Gateway](https://aws.amazon.com/api-gateway/) is the default API
gateway for teams running on Amazon Web Services. For AI workloads, AWS
publishes a reference architecture that positions API Gateway as the front door
to Amazon Bedrock, their managed LLM service.

The pattern works like this: API Gateway handles request authorization (JWT
validation, API keys, IAM), usage quotas, and throttling. A Lambda integration
function captures the original request, applies AWS Signature Version 4
authentication, and forwards it to the Bedrock service endpoint. This
architecture was originally developed by Dynatrace for their global user base.

**AI-relevant capabilities:**

- **Amazon Bedrock integration** — Access foundation models from Anthropic,
  Meta, Mistral, Cohere, and Amazon (plus OpenAI models, which became available
  on Bedrock in April 2026) through a managed service with IAM-based access
  control.
- **AgentCore Gateway** — AWS's managed service for connecting AI agents with
  tools. Supports MCP natively, converts APIs and Lambda functions into
  MCP-compatible tools, and provides both ingress and egress authentication.
  Includes one-click integrations with Salesforce, Slack, Jira, and Zendesk.
- **WAF integration** — AWS WAF provides request-level security, though it is
  not AI-aware (no prompt injection detection at the WAF layer).
- **Bedrock Access Gateway (open source)** — An AWS-published proxy that
  provides OpenAI-compatible API access to Bedrock models, supporting prompt
  caching for Claude and Nova models.
- **Intelligent Prompt Routing** — Bedrock can automatically route prompts to
  the cost-optimal model within a model family, reducing costs by up to 30%
  without sacrificing quality.

**Tradeoffs:**

- **Assembly required** — There is no single "AI gateway" product. You must
  assemble API Gateway, Lambda, Bedrock, CloudWatch, and WAF into a custom
  solution.
- **No built-in token-based rate limiting** — API Gateway's throttling is
  request-based. Token-level rate limiting requires custom Lambda logic.
- **No semantic caching** — You must build caching infrastructure separately.
- **AWS lock-in** — Bedrock models run only on AWS. Model selection is limited
  to what AWS hosts. Multi-cloud routing requires external tooling.
- **Aggressive throttling** — Some production teams report opaque latency
  spikes and aggressive throttling at high request volumes.

**Best for:** Teams running exclusively on AWS with Amazon Bedrock models that
want tight IAM integration and are willing to assemble a custom AI gateway from
multiple AWS services. For a head-to-head comparison, see
[Zuplo vs AWS API Gateway](/learning-center/zuplo-vs-aws-api-gateway).

### 3. Azure API Management — The GenAI Gateway for Azure-First Teams

[Azure API Management](https://azure.microsoft.com/en-us/products/api-management/)
is the most advanced cloud-provider gateway for AI workloads. Microsoft has
invested heavily in GenAI gateway capabilities, and Azure APIM is the only
cloud-provider gateway with native token-based rate limiting and LLM-specific
policies built in.

**AI-relevant capabilities:**

- **Token rate limiting** — The `llm-token-limit` policy provides token-based
  rate limiting with pre-calculation of prompt tokens on the APIM side,
  minimizing unnecessary requests to the backend if the prompt already exceeds
  the limit.
- **PTU/PAYG spillover routing** — Automatically routes to Provisioned
  Throughput Units when capacity is available and falls back to Pay-As-You-Go
  when PTU is saturated, optimizing cost without application changes.
- **GenAI policy family** — Built-in policies for token usage tracking
  (`llm-emit-token-metric`), content safety enforcement, response caching, and
  load balancing across Azure OpenAI deployments.
- **Anthropic Messages API support** — v2 tiers support the Anthropic Messages
  API alongside OpenAI-compatible endpoints.
- **Generic LLM backend support** — The `llm-*` policy family works with
  non-Azure models (Mistral, Cohere, LLaMA) through the same control plane.
- **Microsoft Foundry integration** — AI Gateway in APIM is available in
  Microsoft Foundry (preview), bringing model, agent, and tool governance into a
  single interface.
- **MCP tool registration** — Register MCP tools hosted anywhere into the
  Foundry control plane for centralized governance and discovery.
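
The idea behind pre-calculating prompt tokens can be sketched in a few lines.
This is an illustration, not APIM's algorithm: it assumes a rough
four-characters-per-token heuristic for English text, whereas real tokenizers
(and APIM's own estimate) count differently.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough estimate: about four characters per token (assumed heuristic).
function estimatePromptTokens(messages: ChatMessage[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

// Reject before forwarding if the prompt alone would exceed the remaining allowance,
// sparing the backend a request that would be over the limit anyway.
function precheck(
  messages: ChatMessage[],
  remainingTokens: number,
): { forward: boolean; estimated: number } {
  const estimated = estimatePromptTokens(messages);
  return { forward: estimated <= remainingTokens, estimated };
}
```

Only the prompt can be counted up front; completion tokens are known after the
response, so a pre-check complements rather than replaces post-response
accounting.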

**Tradeoffs:**

- **Azure-centric** — While generic LLM backends are supported, the feature set
  is optimized for Azure OpenAI and the Azure ecosystem.
- **Complex pricing** — APIM pricing tiers (Consumption, Developer, Standard v2,
  Premium) combined with Azure OpenAI consumption make cost modeling complex.
- **Operational complexity** — Policy authoring uses XML-based configuration
  with C# expressions, which is more verbose than TypeScript.
- **No edge-native deployment** — Regional deployment within Azure, not globally
  distributed edge locations.

**Best for:** Azure-first teams running Azure OpenAI deployments that want
native token rate limiting and PTU/PAYG spillover without building custom
infrastructure. For an Azure-focused gateway comparison, see
[Best API Gateways for Azure Workloads](/learning-center/best-api-gateways-for-azure-workloads-2026).

### 4. Kong AI Gateway — The Plugin-Extensible AI Gateway for Kubernetes Teams

[Kong](https://konghq.com/) is the most widely adopted open-source API gateway,
and its AI Gateway capabilities have expanded rapidly through 2025 and 2026.
Kong 3.14 introduced Agent Gateway with A2A protocol support, making it the most
mature platform for agent-to-agent communication governance.

**AI-relevant capabilities:**

- **AI proxy plugin** — Routes to OpenAI, Anthropic, Google, Mistral, DeepSeek,
  Databricks, vLLM, and other providers with dynamic model routing based on
  cost, latency, or capability.
- **Precision token rate limiting** — Token-level rate limits added in 3.14.
- **Agent Gateway** — Dedicated governance for A2A traffic with structured
  logging, centralized authentication, and tamper-evident audit trails for every
  A2A RPC call.
- **MCP proxy plugin** — Routes MCP traffic through Kong with authentication and
  rate limiting via the standard plugin chain. Enterprise MCP gateway available
  through Konnect.
- **Semantic caching** — Provided by the AI Semantic Cache plugin, which runs
  in the same plugin chain as the AI proxy.
- **Custom guardrails plugin** — Integration with third-party guardrail services
  for prompt validation and content filtering.

**Tradeoffs:**

- **Konnect licensing required** — Full AI Gateway and Agent Gateway features
  require Konnect enterprise licensing. Enterprise contracts typically start at
  \$30,000–\$50,000/year.
- **Lua-based plugins** — Kong's primary extension language is Lua, with Go,
  Python, and JavaScript plugin support. The Lua developer community is
  significantly smaller than TypeScript or Python.
- **Infrastructure overhead** — Self-hosted Kong means operating NGINX-based
  data plane nodes plus supporting services such as PostgreSQL (unless running
  DB-less) and Redis.
- **No edge-native deployment** — Global distribution requires multi-region
  Kubernetes cluster management.

**Best for:** Enterprise platform teams with Kubernetes expertise that need A2A
protocol governance and an extensive plugin ecosystem. See
[Kong vs Zuplo](/learning-center/kong-vs-zuplo) for a head-to-head comparison.

### 5. Google Apigee — Enterprise AI Gateway with MCP Auto-Generation

[Apigee](https://cloud.google.com/apigee) is Google Cloud's enterprise API
management platform. For AI workloads, Apigee's standout feature is zero-code
MCP server generation: point it at an API specification and it creates a
managed MCP server automatically, with no changes to the underlying API.

**AI-relevant capabilities:**

- **Zero-code MCP generation** — Auto-generates MCP servers from existing API
  specifications with OAuth 2.1 and OIDC authentication out of the box.
- **Model Armor** — Prompt injection and jailbreak detection using Google
  Cloud's security services.
- **Cloud DLP integration** — Classifies and protects sensitive data in AI
  traffic using Google Cloud Data Loss Prevention.
- **Vertex AI integration** — Native routing to Google's AI model platform.
- **API Hub discovery** — MCP tools are discoverable alongside traditional APIs
  in Apigee API Hub.

**Tradeoffs:**

- **Google Cloud lock-in** — Available only on Google Cloud infrastructure. All
  AI features are intertwined with GCP services.
- **Enterprise pricing** — Starts at approximately \$2,500/month for the
  enterprise tier; actual deployments typically run \$8,000–\$25,000/month.
- **XML-based policies** — Policy authoring uses XML with Java callouts for
  custom logic.
- **No multi-provider model routing** — Designed for Google's model ecosystem
  (Vertex AI), not cross-provider routing.

**Best for:** GCP-committed enterprises that want managed MCP servers with
minimal development effort and deep Google Cloud integration. See
[Apigee vs Zuplo](/learning-center/apigee-vs-zuplo) for a detailed comparison.
For a GCP-focused guide, see
[Best API Gateways for Google Cloud Workloads](/learning-center/best-api-gateways-google-cloud-workloads-2026).

### 6. Cloudflare AI Gateway — Edge-Fast AI Proxy

[Cloudflare AI Gateway](https://developers.cloudflare.com/ai-gateway/) is part
of Cloudflare's developer platform. It provides a lightweight proxy between your
application and AI providers with caching, rate limiting, and analytics running
on Cloudflare's global edge network.

**AI-relevant capabilities:**

- **Edge-native deployment** — Runs on Cloudflare's global network with 300+
  points of presence. Caching and rate limiting happen at the edge, minimizing
  latency.
- **Multi-provider support** — Supports OpenAI, Anthropic, Google, HuggingFace,
  and other providers.
- **Caching** — Serves cached responses to repeated identical requests, which
  reduces costs for applications with repetitive queries.
- **Workers integration** — Custom logic via Cloudflare Workers (JavaScript) for
  teams already on the Cloudflare platform.
- **Free core features** — Base gateway features are free; you pay once your
  usage scales into paid Workers compute.
- **MCP Server Portals (Open Beta)** — Centralized MCP server management with
  Zero Trust access controls and DLP scanning for MCP traffic.

**Tradeoffs:**

- **Narrow scope** — AI proxy only, not a full API management platform. No
  developer portal, no API key management, no monetization.
- **Basic rate limiting** — Request-based, not token-based.
- **Limited observability** — Functional analytics but not as deep as
  purpose-built AI observability platforms.
- **Log limits** — 100,000 logs/month on the free tier.

**Best for:** Teams already on the Cloudflare platform that need basic AI
proxying with excellent edge caching performance and minimal setup.

### 7. Tyk AI Studio — Open-Source AI Governance for Self-Hosted Teams

[Tyk](https://tyk.io/) is an open-source API gateway written in Go. Tyk AI
Studio, which went open source in March 2026, provides a full-featured AI
governance layer on top of the Tyk Gateway.

**AI-relevant capabilities:**

- **Multi-vendor routing** — Policy-based model selection across OpenAI,
  Anthropic, Mistral, Vertex, Gemini, Ollama, and private models with automatic
  failover.
- **Token-level metering** — Attribution to teams, projects, and applications
  with hard spend caps and quotas.
- **PII redaction** — Content filtering enforced at the gateway.
- **Cost-to-quality optimization** — Routing strategies that automatically
  balance cost against output quality.
- **MCP toolchain integration** — Supports both remote and local MCP servers,
  with MCP tool generation from OpenAPI specs.
- **Open-source Community Edition** — AI Studio's core has been open source
  since March 2026, lowering the barrier to entry for self-hosted teams.

**Tradeoffs:**

- **Self-hosted complexity** — Tyk requires Redis plus PostgreSQL/MongoDB and
  multiple components. Significant Kubernetes expertise needed.
- **Enterprise features gated** — SSO, advanced RBAC, and dedicated support
  require paid licensing.
- **Smaller ecosystem** — Smaller plugin and community ecosystem compared to
  Kong.
- **No edge-native deployment** — Global distribution requires self-managed
  multi-region infrastructure.
- **Tyk Operator licensing** — The Tyk Operator for Kubernetes became
  closed-source in October 2024 and now requires a paid license.

**Best for:** Teams with strong DevOps capabilities that want self-hosted AI
governance with open-source transparency and Go-based performance. See
[Tyk vs Zuplo](/learning-center/tyk-vs-zuplo) for a head-to-head comparison.

### 8. Portkey — Purpose-Built AI Gateway with Deep Observability

[Portkey](https://portkey.ai) is a purpose-built AI infrastructure platform
focused on getting LLM applications to production. It provides a unified
interface to 250+ models with deep observability, prompt management, and
guardrails.

**AI-relevant capabilities:**

- **250+ model support** — The widest model coverage in this comparison, with a
  unified API across all major providers.
- **Deep observability** — Traces, sessions, prompt logging, cost analytics, and
  evaluation tools with detailed visibility into every AI interaction.
- **Prompt management** — Version, test, and deploy prompts independently from
  application code.
- **MCP Gateway** — Generally available as of January 2026 with MCP protocol
  support for agent workflows.
- **Guardrails SDK** — Built-in framework for content filtering, PII detection,
  and custom validation.
- **Open-source gateway** — Portkey's gateway went fully open source in March
  2026, and Enterprise customers can self-host with hybrid or air-gapped
  deployment options.

**Tradeoffs:**

- **AI-only** — Not a full API management platform. Does not replace your
  existing gateway for non-AI traffic.
- **Per-log pricing** — Pro tier starts at \$99/month (100K logs/month) after
  the free tier (10K logs/month), with costs scaling based on log volume.
- **Not edge-deployed** — Cloud-hosted by default, though self-hosted and hybrid
  options are available for Enterprise customers.
- **No traditional API management** — No developer portal, no OpenAPI-driven
  routing, no API key management.

**Best for:** Teams whose primary concern is deep AI observability and prompt
management, and who already have a separate API gateway for traditional traffic.

### 9. LiteLLM — The Self-Hosted Open-Source AI Proxy

[LiteLLM](https://www.litellm.ai/) is an open-source Python proxy that provides
a unified OpenAI-compatible API across 100+ model providers. It is the most
popular self-hosted option for teams that need full control over their AI
gateway infrastructure.

**AI-relevant capabilities:**

- **100+ model providers** — Clean, OpenAI-compatible API that works across all
  supported providers with a single API format.
- **Open-source core** — The proxy is free and open source. Inspect the code,
  contribute, and customize without vendor lock-in.
- **Self-hosted** — Full control over data residency and network topology.
- **Budget tracking** — Per-project and per-user cost tracking with PostgreSQL
  integration for custom dashboards.
- **Community ecosystem** — Active community with integrations for Langfuse,
  Langchain, and other AI tooling.

**Tradeoffs:**

- **Open-core model** — SSO, RBAC, and team-level budget enforcement require the
  paid enterprise version.
- **Infrastructure burden** — Self-hosting means you own the infrastructure,
  monitoring, scaling, and security.
- **No edge deployment** — Runs wherever you deploy it, with no built-in global
  distribution.
- **No API management** — No developer portal, no API key lifecycle management,
  no built-in monetization.
- **Basic MCP support** — LiteLLM Proxy includes a native MCP Gateway feature
  (since v1.80.18) with fixed endpoints and team/key access control, though it
  is less comprehensive than dedicated MCP gateway platforms.

**Best for:** Teams with strong DevOps capabilities that need self-hosted
deployment with full control over data residency and are comfortable managing
their own infrastructure.

### 10. Gravitee — Open-Source API and Agent Mesh

[Gravitee](https://www.gravitee.io/) is an open-source API management platform
built on Java that has expanded into AI agent governance with its Agent Mesh
architecture.

**AI-relevant capabilities:**

- **MCP analytics dashboard** — Real-time metrics for MCP request counts,
  gateway latency (p90 and p99), method distribution, and top tools by usage.
- **MCP Resource Server v2** — Enterprise-grade authentication with client
  credentials flows and certificate management.
- **A2A API type** — Dedicated API type for A2A communication with HTTP
  selectors and Token Exchange (RFC 8693) for secure agent delegation.
- **AI-powered PII filtering** — Automatically detects and redacts personally
  identifiable information in both prompts and responses.
- **Multi-cloud deployment** — SaaS Gateway deployment on AWS, Azure, and GCP.

**Tradeoffs:**

- **Enterprise Edition required** — AI and agent management features are gated
  behind the Enterprise Edition with the AI Agent Management pack.
- **Java-based runtime** — Higher memory footprint and JVM tuning requirements
  compared to Go-based or V8-isolate-based gateways.
- **Managed pricing** — Managed plans start at \$2,500/month.
- **Smaller ecosystem** — Less established community compared to Kong, Tyk, or
  Apigee.

**Best for:** Organizations managing APIs, event streams, and agent traffic that
want a single governance layer across all three. See the
[AI Gateway Comparison](/learning-center/ai-gateway-comparison-mcp-a2a-agent-governance)
for a detailed breakdown.

## How This Works in Zuplo: AI-Workload Gateway in Practice

This section shows how Zuplo's AI Gateway and programmable API Gateway handle
three common AI-workload patterns: rate limiting AI agent traffic, fronting an
MCP server, and detecting prompt injection on AI routes.

### Token-Based Rate Limiting for AI Agent Traffic

Zuplo's
[rate limiting policy](https://zuplo.com/docs/policies/rate-limit-inbound)
supports a `rateLimitBy: "function"` mode where a TypeScript function returns
the grouping key and per-request limit overrides. The example below keys limits
on the authenticated consumer and scales the allowance by subscription tier; the
same approach can give expensive LLM completion routes a tighter budget than
lightweight metadata lookups. Note that this policy still counts requests, not
tokens.

```typescript
import {
  CustomRateLimitDetails,
  ZuploContext,
  ZuploRequest,
} from "@zuplo/runtime";

export function aiAgentRateLimit(
  request: ZuploRequest,
  context: ZuploContext,
  policyName: string,
): CustomRateLimitDetails {
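  // Authenticated callers get their own bucket; unauthenticated ones share one.
  const consumerId = request.user?.sub ?? "anonymous";
  // Assumes the tier is stored in the consumer's metadata (API key or JWT claims).
  const tier = request.user?.data?.tier ?? "free";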

  // Different limits based on subscription tier
  const limits: Record<string, number> = {
    enterprise: 500,
    pro: 200,
    free: 30,
  };

  return {
    key: `${consumerId}-ai`,
    requestsAllowed: limits[tier] ?? 30,
    timeWindowMinutes: 1,
  };
}
```

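Wire the function into the rate limiting policy in your `policies.json`. The
top-level `requestsAllowed` and `timeWindowMinutes` act as defaults that the
function's return values override: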

```json
{
  "name": "ai-rate-limit",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "function",
      "requestsAllowed": 20,
      "timeWindowMinutes": 1,
      "identifier": {
        "module": "$import(./modules/ai-agent-rate-limit)",
        "export": "aiAgentRateLimit"
      }
    }
  }
}
```

For dedicated AI Gateway projects, Zuplo provides built-in token-based rate
limiting and budget enforcement without writing custom code — set daily and
monthly spend limits per team and per application directly in the AI Gateway
configuration.

### Fronting an MCP Server with Zuplo

Zuplo's [MCP Server Handler](https://zuplo.com/docs/handlers/mcp-server)
automatically exposes your API routes as MCP tools. Routes defined in your
OpenAPI spec, like the one below, can carry MCP tool metadata (a tool name and
description). Once those operations are exposed through an MCP server route,
any AI agent that connects to that endpoint can discover and call them as MCP
tools, with the full policy pipeline (authentication, rate limiting,
validation) applied to every tool call.

```json
{
  "/weather/current": {
    "get": {
      "operationId": "getCurrentWeather",
      "summary": "Get current weather",
      "description": "Retrieve current weather conditions for a location",
      "parameters": [
        {
          "name": "location",
          "in": "query",
          "required": true,
          "schema": { "type": "string" }
        }
      ],
      "x-zuplo-route": {
        "corsPolicy": "none",
        "handler": {
          "export": "default",
          "module": "$import(./modules/weather)"
        },
        "mcp": {
          "type": "tool",
          "name": "get_current_weather",
          "description": "Retrieve current weather conditions for a location"
        }
      }
    }
  }
}
```

This means any existing API route can become an MCP tool with minimal
configuration — no separate MCP server to build and deploy. The MCP Gateway
project type extends this by federating remote upstream MCP servers behind one
spec-compliant URL, composing virtual MCP servers from approved tools, and
bundling a full OAuth 2.0 authorization server for the agents that connect.
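On the wire, MCP clients talk to a server with JSON-RPC 2.0 messages: `tools/list` to discover the available tools and `tools/call` to invoke one by name with JSON arguments. The sketch below builds those two messages for the `get_current_weather` tool defined above, roughly what an agent would send to your MCP endpoint. The helper names and the `location` value are illustrative, and the transport (an HTTP POST to the endpoint) and the `initialize` handshake that precedes tool calls are omitted.

```typescript
// Minimal JSON-RPC 2.0 request shape used by the Model Context Protocol.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Ask the MCP server which tools it exposes.
function listToolsRequest(): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/list" };
}

// Invoke a tool by its MCP name with JSON arguments.
function callToolRequest(
  name: string,
  args: Record<string, unknown>,
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// The tool name matches `mcp.name` in the route definition. Assumption: the
// gateway maps the `location` argument onto the route's `location` query parameter.
const discover = listToolsRequest();
const call = callToolRequest("get_current_weather", { location: "Seattle" });
console.log(JSON.stringify(call));
```

Because every tool call arrives as an ordinary request to a gateway route, the same authentication and rate limiting policies that protect the REST API also apply to agent traffic.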

### Prompt Injection Detection on AI Routes

For routes that return content consumed by downstream LLM agents, Zuplo's
[prompt injection detection policy](https://zuplo.com/docs/policies/prompt-injection-outbound)
inspects outbound responses and blocks content classified as poisoned:

```json
{
  "name": "prompt-injection-check",
  "policyType": "prompt-injection-outbound",
  "handler": {
    "export": "PromptInjectionDetectionOutboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "apiKey": "$env(OPENAI_API_KEY)",
      "baseUrl": "https://api.openai.com/v1",
      "model": "gpt-4o-mini",
      "strict": true
    }
  }
}
```

Benign content passes through unchanged. Malicious prompt injection attempts
return a 400 response before they reach the downstream consumer. This is
particularly important when fronting MCP servers, where tool responses could
contain injected instructions targeting the calling AI agent.
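The policy's flow can be summarized as: take the upstream response body, ask a classifier whether it contains injected instructions, and either pass the response through or replace it with a 400. The sketch below models that flow with the classifier passed in as a function so it runs without network access. It is an illustration under assumptions, not Zuplo's policy code: the `Verdict` type, the `guardOutbound` name, the error body, and the regex stub are hypothetical, and in the real policy the classification is performed by the configured model (`gpt-4o-mini` above).

```typescript
type Verdict = "benign" | "malicious";

// Plain response shape so the sketch has no runtime dependencies.
interface GatewayResponse {
  status: number;
  body: string;
}

// Hypothetical outbound guard: classify the upstream body before it is
// returned to the calling agent.
async function guardOutbound(
  upstream: GatewayResponse,
  classify: (text: string) => Promise<Verdict>,
): Promise<GatewayResponse> {
  const verdict = await classify(upstream.body);
  if (verdict === "malicious") {
    // Block before the content reaches the downstream consumer.
    return {
      status: 400,
      body: JSON.stringify({ error: "Response blocked: possible prompt injection" }),
    };
  }
  return upstream; // Benign content passes through unchanged.
}

// Stub classifier for local testing only. A keyword check is far weaker than
// an LLM classifier and is used here purely to make the sketch runnable.
const naiveClassifier = async (text: string): Promise<Verdict> =>
  /ignore (all )?previous instructions/i.test(text) ? "malicious" : "benign";
```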

## Decision Framework: Choosing by Stack and Use Case

The right gateway depends on your infrastructure, team composition, and how many
LLM providers you use. Here is how to map your situation to the best choice.

### If your stack is multi-cloud and multi-model, choose Zuplo

You are using OpenAI, Anthropic, and Google Gemini simultaneously. Your
infrastructure spans AWS and GCP, or you want the option to move between clouds.
Zuplo's managed dedicated deployment runs on multiple clouds, so your gateway is
not pinned to a single provider. The AI Gateway routes to all major LLM
providers from a single endpoint with automatic failover, and the MCP Gateway
federates remote MCP servers behind one spec-compliant URL. You can use it to
offer MCP tools to customer agents, to govern the third-party MCP servers your
own team uses, or both.
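Automatic failover across LLM providers comes down to trying an ordered list of upstreams and moving to the next one when a call is rate limited or fails on the provider side. The sketch below shows that control flow with providers modeled as plain async functions. It is a hypothetical illustration, not Zuplo's AI Gateway routing logic; the `ProviderError` class, the set of retryable status codes, and the provider names are assumptions.

```typescript
// A provider call either resolves with completion text or throws ProviderError.
class ProviderError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
    // Keep `instanceof` reliable even when compiled to older JS targets.
    Object.setPrototypeOf(this, ProviderError.prototype);
  }
}

interface Provider {
  name: string;
  complete: (prompt: string) => Promise<string>;
}

// Assumption: rate limits and server errors are worth retrying elsewhere;
// client errors such as 400 would likely fail on every provider.
const RETRYABLE = new Set([429, 500, 502, 503, 504]);

async function completeWithFailover(
  providers: Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  let lastError: unknown = new Error("no providers configured");
  for (const p of providers) {
    try {
      return { provider: p.name, text: await p.complete(prompt) };
    } catch (err) {
      lastError = err;
      if (err instanceof ProviderError && RETRYABLE.has(err.status)) continue;
      throw err; // Non-retryable: surface immediately.
    }
  }
  throw lastError; // Every provider failed with a retryable error.
}

const primary: Provider = {
  name: "primary",
  complete: async () => {
    throw new ProviderError(429, "rate limited");
  },
};
const secondary: Provider = {
  name: "secondary",
  complete: async (prompt) => `echo: ${prompt}`,
};
```

How the list is ordered (by cost, latency, or output quality) is a separate policy decision; the sketch only encodes the fallback mechanics.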

### If your stack is AWS-first with Bedrock, choose AWS API Gateway + Bedrock

Your organization has standardized on AWS. You use Bedrock models exclusively
(or primarily). Your team is fluent in IAM, Lambda, and CloudWatch. AgentCore
Gateway gives you MCP support within the AWS ecosystem. Accept that you are
building a custom AI gateway from multiple services, not buying a turnkey
solution.

### If your stack is Azure-first with Azure OpenAI, choose Azure API Management

You run Azure OpenAI deployments with Provisioned Throughput Units. Your team
knows APIM policy syntax (XML with C# expressions). Azure APIM's GenAI gateway
capabilities — especially PTU/PAYG spillover and token rate limiting — are the
most mature cloud-provider AI gateway features available. Accept Azure ecosystem
lock-in.

### If your team has deep Kubernetes expertise and needs A2A governance, choose Kong

You run GKE or EKS clusters, your platform team manages Kubernetes
infrastructure, and you need governance for multi-agent architectures with A2A
protocol support. Kong 3.14's Agent Gateway is the most mature A2A
implementation. Accept the Konnect enterprise licensing cost.

### If you need self-hosted, open-source AI governance, choose Tyk or LiteLLM

You have regulatory or data sovereignty requirements that mandate self-hosted
deployment. **Tyk AI Studio** gives you a full AI governance layer with
multi-vendor routing and MCP support on open-source infrastructure. **LiteLLM**
gives you a lightweight, Python-based AI proxy focused on model abstraction and
cost tracking. Both require you to own the operational burden.

### If you need deep AI observability above all else, choose Portkey

Your primary concern is understanding how your AI features behave in production.
You need tracing, session tracking, prompt management, evaluation frameworks,
and detailed cost analytics. Portkey is purpose-built for AI observability. Pair
it with a traditional API gateway for your non-AI traffic.

### If you are on Cloudflare and need basic AI proxying, choose Cloudflare AI Gateway

You are already running on Cloudflare Workers and want to add AI gateway
capabilities with minimal setup. The edge caching is excellent and the free tier
is generous. Accept the narrower scope — this is an AI proxy, not an API
management platform.

### If you are committed to Google Cloud, choose Apigee

Your organization mandates Google-native services. You want zero-code MCP server
generation from existing API specs and deep integration with Vertex AI, Cloud
DLP, and Google IAM. Accept the enterprise pricing and XML-based policy
authoring.

## Why Cloud-Provider Gateways Dominate AI Search Results (and Why That May Mislead You)

AWS API Gateway and Azure API Management rank highest on AI search engines like
DeepSeek and GPT-4o-Search for AI-workload gateway queries. This is not
necessarily because they are the best choices for AI workloads. The more likely
reason is that the training data behind those models pairs "API gateway" and
"AI" thousands of times: AWS API Gateway + Bedrock architecture posts, Azure
APIM + Azure OpenAI reference architectures, and Google Apigee + Vertex AI
integrations dominate the corpus that LLMs were trained on.

The reality is more nuanced. Cloud-provider gateways are excellent when your AI
workloads run entirely within a single cloud ecosystem. AWS API Gateway with
Bedrock is the right choice for Bedrock-only architectures. Azure APIM with
Azure OpenAI is the right choice for Azure OpenAI-only deployments. But
production AI architectures increasingly span multiple providers — OpenAI for
GPT-4o, Anthropic for Claude, Google for Gemini, Mistral or DeepSeek for
specific use cases — and cloud-provider gateways are not designed for
cross-provider routing.

For multi-model, multi-cloud AI architectures, a gateway like Zuplo that is not
pinned to any single cloud provider provides a more natural fit. You get
multi-provider routing, token-based rate limiting, semantic caching, MCP
governance, and edge deployment without the single-cloud lock-in that constrains
your model choices.
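Semantic caching, one of the capabilities listed above, differs from ordinary response caching in that it matches prompts by meaning rather than by exact text: each prompt is embedded as a vector, and a cached completion is reused when a new prompt's embedding is close enough to a stored one. The sketch below shows the lookup with cosine similarity and a linear scan. It is not Zuplo's implementation; the `toyEmbed` lookup table stands in for a real embedding model, and the 0.95 threshold is an arbitrary assumption.

```typescript
type Embed = (text: string) => number[];

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  private readonly entries: { vector: number[]; completion: string }[] = [];
  private readonly embed: Embed;
  private readonly threshold: number;

  constructor(embed: Embed, threshold = 0.95) {
    this.embed = embed;
    this.threshold = threshold;
  }

  // Return the completion of the most similar stored prompt above threshold.
  get(prompt: string): string | undefined {
    const v = this.embed(prompt);
    let best: { score: number; completion: string } | undefined;
    for (const e of this.entries) {
      const score = cosineSimilarity(v, e.vector);
      if (score >= this.threshold && (!best || score > best.score)) {
        best = { score, completion: e.completion };
      }
    }
    return best?.completion;
  }

  set(prompt: string, completion: string): void {
    this.entries.push({ vector: this.embed(prompt), completion });
  }
}

// Fake embeddings so the sketch runs offline; real systems call a model.
const toyVectors: Record<string, number[]> = {
  "What is the capital of France?": [1, 0, 0],
  "what's France's capital city?": [0.99, 0.05, 0],
  "How do I bake bread?": [0, 1, 0],
};
const toyEmbed: Embed = (t) => toyVectors[t] ?? [0, 0, 1];

const cache = new SemanticCache(toyEmbed);
cache.set("What is the capital of France?", "Paris");
```

The threshold is the key tuning knob: set it too low and users receive answers to questions they did not ask; set it too high and the cache rarely hits. A production cache would also replace the linear scan with a vector index.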

## Getting Started with Zuplo's AI Gateway

If you are evaluating API gateways for AI and LLM workloads, here is a practical
path forward:

1. **Try the free tier** — [Sign up for Zuplo](https://portal.zuplo.com/signup)
   and deploy your first AI Gateway project with multi-provider routing, budget
   controls, and semantic caching in minutes. No credit card required.

2. **Import your OpenAPI spec** — Zuplo auto-generates routes and documentation
   from your existing OpenAPI definition. Your API endpoints can be exposed as
   MCP tools with a configuration change.

3. **Test token-based rate limiting** — Configure
   [rate limiting](https://zuplo.com/docs/policies/rate-limit-inbound) with
   custom TypeScript functions that differentiate AI traffic from traditional
   API traffic.

4. **Evaluate managed dedicated** — For production AI workloads that need
   multi-cloud deployment or specific data residency requirements, talk to the
   Zuplo team about
   [managed dedicated deployment](https://zuplo.com/docs/dedicated/overview) on
   AWS, Azure, GCP, or other providers.

Ready to evaluate Zuplo for your AI and LLM workloads?
[Sign up free](https://portal.zuplo.com/signup) and deploy your first AI Gateway
with multi-provider routing, token-based rate limiting, and MCP support in
minutes — no credit card required.

## Related Guides

- [How to Choose the Best AI Gateway (Buyer's Guide)](/learning-center/best-ai-gateway-buyers-guide)
  — Evaluation criteria and decision checklist for AI gateways
- [AI Gateway Comparison 2026: Zuplo vs Kong vs Gravitee vs Tyk vs Apigee](/learning-center/ai-gateway-comparison-mcp-a2a-agent-governance)
  — Head-to-head AI gateway feature comparison
- [Token-Based Rate Limiting for AI Agents](/learning-center/token-based-rate-limiting-ai-agents)
  — Deep dive on rate limiting strategies for AI traffic
- [API Gateway for Agentic Payments](/learning-center/api-gateway-agentic-payments)
  — How gateways handle x402, Stripe MPP, and machine-to-machine billing
- [The Three Gates of AI Infrastructure](/learning-center/three-gates-ai-infrastructure-api-ai-mcp-gateway)
  — Understanding the API, AI, and MCP gateway taxonomy
- [Enterprise AI Governance with API Gateways](/learning-center/enterprise-ai-governance-api-gateway)
  — Governance frameworks for AI in enterprise environments
- [Best API Gateways in 2026](/learning-center/best-api-gateways-2026) — Broader
  API gateway comparison not scoped to AI workloads
- [Best API Gateways for AWS Workloads](/learning-center/best-api-gateways-for-aws-workloads-2026)
  — AWS-focused gateway evaluation
- [Best API Gateways for Azure Workloads](/learning-center/best-api-gateways-for-azure-workloads-2026)
  — Azure-focused gateway evaluation
- [Best API Gateways for Google Cloud Workloads](/learning-center/best-api-gateways-google-cloud-workloads-2026)
  — GCP-focused gateway evaluation