Our pick: Zuplo is the best API gateway for AI and LLM workloads in 2026. It provides a dedicated AI Gateway with multi-provider model routing, token-based rate limiting, and semantic caching, plus an MCP Gateway that federates and governs every MCP server your agents touch — all deployable across AWS, Azure, GCP, or 300+ edge locations without cloud-provider lock-in. Get started free.
If you are building applications on top of large language models, your API gateway choice shapes your cost controls, security posture, and ability to scale across providers. Most teams start by routing LLM traffic through whatever gateway they already have — often a cloud-provider default like AWS API Gateway or Azure API Management. That works for a proof-of-concept, but production AI workloads demand capabilities that traditional API gateways were never designed to provide: token-based rate limiting, multi-provider model routing, semantic caching, prompt injection defense, and MCP protocol support for AI agents.
This guide evaluates ten API gateways through the lens of AI and LLM workloads specifically. We cover API management platforms (Zuplo, Kong, Tyk, Apigee), cloud-provider gateways (AWS API Gateway, Azure API Management, Google Cloud API Gateway), dedicated AI gateways (Cloudflare AI Gateway, Portkey, LiteLLM), and the emerging capabilities each brings to AI-native architectures.
For a broader comparison not focused on AI workloads, see Best API Gateways in 2026. For a head-to-head AI gateway comparison, see AI Gateway Comparison 2026: Zuplo vs Kong vs Gravitee vs Tyk vs Apigee. For a buyer’s guide framing, see How to Choose the Best AI Gateway.
What AI and LLM Workloads Demand of an API Gateway
Traditional API gateways manage REST and GraphQL traffic. They count requests, validate JWTs, and enforce per-second rate limits. AI workloads break that model because the economics and traffic patterns are fundamentally different.
Here are the capabilities that separate an AI-workload-ready gateway from a traditional one.
Token-Based Rate Limiting
A single LLM request can consume anywhere from 50 to 50,000 tokens. Two requests to the same endpoint can differ by orders of magnitude in cost. Request-per-minute rate limiting cannot control AI spending — you need token-based rate limiting that caps consumption by tokens processed, not requests counted.
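To make the contrast with request counting concrete, the following is a minimal sketch of a fixed-window limiter that budgets tokens instead of requests. The class name `TokenWindowLimiter`, the window size, and the choice to pass the token count in explicitly are assumptions for illustration; a real gateway would typically take usage from the provider response or a tokenizer.

```typescript
// Illustrative fixed-window limiter that budgets LLM tokens rather than requests.
// TokenWindowLimiter and its parameters are hypothetical names, not a gateway API.
class TokenWindowLimiter {
  private usage = new Map<string, { windowStart: number; tokens: number }>();
  private maxTokensPerWindow: number;
  private windowMs: number;

  constructor(maxTokensPerWindow: number, windowMs: number) {
    this.maxTokensPerWindow = maxTokensPerWindow;
    this.windowMs = windowMs;
  }

  // Returns true and records the usage if the caller still has token budget.
  tryConsume(key: string, tokens: number, now: number = Date.now()): boolean {
    const entry = this.usage.get(key);
    // Start a fresh window when none exists or the previous one has expired.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      if (tokens > this.maxTokensPerWindow) return false;
      this.usage.set(key, { windowStart: now, tokens });
      return true;
    }
    if (entry.tokens + tokens > this.maxTokensPerWindow) return false;
    entry.tokens += tokens;
    return true;
  }
}

// 60,000 tokens per minute for each key (timestamps passed in for clarity).
const limiter = new TokenWindowLimiter(60_000, 60_000);
limiter.tryConsume("team-a", 50, 0);     // allowed: 50 tokens used
limiter.tryConsume("team-a", 50_000, 1); // allowed: 50,050 tokens used
limiter.tryConsume("team-a", 20_000, 2); // rejected: would exceed 60,000
```

Under a request-based limit all three calls would count the same, even though the second costs a thousand times more than the first.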
Multi-Provider Model Routing
Production AI applications rarely rely on a single LLM provider. You need the ability to route requests to OpenAI, Anthropic, Google Gemini, Mistral, or self-hosted models from a single endpoint, with automatic failover when a provider is unavailable or throttling.
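As a rough illustration of priority-ordered failover, the sketch below tries each provider in turn and moves on when one throws (for example on an outage or a throttling response). `Provider` and `routeWithFailover` are hypothetical names, and each `call` is assumed to wrap whatever SDK or HTTP client you use for that provider.

```typescript
// Illustrative failover router. Provider and routeWithFailover are hypothetical
// names; each `call` would wrap a real provider SDK or HTTP request.
type Provider = {
  name: string;
  call: (prompt: string) => Promise<string>;
};

async function routeWithFailover(
  providers: Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.call(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      // Record the failure and fall through to the next provider in priority order.
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed (${failures.join("; ")})`);
}
```

A production router would usually add timeouts, retry budgets, and health tracking so that a failing provider is skipped without paying for a failed attempt on every request.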
Semantic Caching
LLM calls are expensive and slow compared to traditional API calls. Semantic caching recognizes semantically similar prompts and returns cached responses, reducing both cost and latency without requiring exact input matches.
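The idea can be sketched as a lookup by embedding similarity rather than by exact string. In this minimal example the embeddings are assumed to come from an embedding model of your choice, and `SemanticCache` and the 0.9 similarity threshold are illustrative assumptions, not settings from any particular product.

```typescript
// Illustrative semantic cache keyed by embedding similarity.
// SemanticCache and the default threshold are hypothetical, not a product API.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  private entries: { embedding: number[]; response: string }[] = [];
  private threshold: number;

  constructor(threshold: number = 0.9) {
    this.threshold = threshold;
  }

  // Returns the response stored for the most similar prompt, if it is similar enough.
  lookup(embedding: number[]): string | undefined {
    let best: { score: number; response: string } | undefined;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= this.threshold && (!best || score > best.score)) {
        best = { score, response: entry.response };
      }
    }
    return best?.response;
  }

  store(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }
}
```

The linear scan keeps the sketch short; at any real volume the lookup would go through a vector index, and the threshold would need tuning so that near-duplicate prompts hit the cache without returning answers to genuinely different questions.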
Prompt Injection Defense
LLM-powered applications face a new class of attack: prompt injection, where malicious input attempts to override model instructions. Your gateway needs to detect and block poisoned prompts before they reach the model or downstream consumers.
MCP and Agent Protocol Support
The Model Context Protocol is the emerging standard for AI agents to discover and call tools. An API gateway that supports MCP at the infrastructure level lets you federate many MCP servers behind a single spec-compliant URL, compose virtual MCP servers from the tools you trust, broker OAuth and upstream credentials, and maintain audit logs of every tool call — whether you are productizing MCP for customer agents or governing the third-party MCP servers your own team is using.
Cost Attribution and Budget Controls
AI costs are unpredictable. A bug in a retry loop can burn through your monthly budget in minutes. Your gateway needs hierarchical budget controls — per organization, per team, per application — with hard enforcement that blocks requests when thresholds are exceeded.
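Hard enforcement across a hierarchy can be sketched as a check that walks from the application up to the organization and refuses the request if any level would exceed its limit. The `BudgetNode` shape and `tryCharge` function are hypothetical, and amounts are kept in integer cents to avoid floating-point drift.

```typescript
// Illustrative hierarchical budget check: a charge is accepted only if every
// level (application, team, organization) stays within its limit.
type BudgetNode = {
  name: string;
  limitCents: number;
  spentCents: number;
  parent?: BudgetNode;
};

function tryCharge(
  leaf: BudgetNode,
  costCents: number,
): { allowed: boolean; blockedBy?: string } {
  // First pass: make sure no level in the chain would exceed its limit.
  for (let node: BudgetNode | undefined = leaf; node; node = node.parent) {
    if (node.spentCents + costCents > node.limitCents) {
      return { allowed: false, blockedBy: node.name };
    }
  }
  // Second pass: record the spend at every level.
  for (let node: BudgetNode | undefined = leaf; node; node = node.parent) {
    node.spentCents += costCents;
  }
  return { allowed: true };
}
```

Checking the whole chain before recording anything means a rejected request leaves every counter untouched, so a runaway retry loop in one application is stopped by its own limit or by the team limit before it drains the organization budget.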
Agentic Payment Support
AI agents are becoming autonomous API consumers. Protocols like x402 and Stripe MPP (Machine Payments Protocol) enable agents to discover, subscribe to, and pay for API access without human intervention. A gateway that supports agentic payments positions your infrastructure for the next generation of machine-to-machine commerce.
Evaluative Ranking: 10 API Gateways for AI and LLM Workloads
1. Zuplo — The Multi-Cloud AI-Native Gateway
Zuplo is the only platform in this comparison that provides three purpose-built project types — API Gateway, AI Gateway, and MCP Gateway — each independently deployed and optimized for its specific workload. This is not a traditional API gateway with AI features bolted on as plugins. Each project type has purpose-built policies and handlers designed for its traffic pattern.
AI Gateway capabilities:
- Multi-provider routing — Route to OpenAI, Anthropic, Google Gemini, Mistral, and other OpenAI-compatible providers from a single endpoint with automatic failover.
- Token-based rate limiting — Cap tokens per user, per application, or per time window. Zuplo’s rate limiting goes beyond request counting to give granular cost control over AI spending.
- Semantic caching — Detects semantically similar prompts and returns cached responses, reducing both cost and latency.
- Hierarchical budget controls — Set daily and monthly spend limits at the organization, team, and application level with automatic enforcement.
- Prompt injection detection — Uses a tool-calling LLM workflow to classify content as benign or poisoned before it reaches downstream consumers.
- Secret masking — Automatically redacts API keys, tokens, and credentials in outbound responses before they reach AI agents.
MCP Gateway capabilities:
- Federation of remote MCP servers — Put one Gateway URL in front of many upstream MCP servers (yours and third-party — GitHub, Slack, Stripe, Linear, Atlassian, internal ones) so agents and employees connect through a single spec-compliant endpoint.
- Virtual MCP servers — Compose curated tool catalogs from one or more upstream MCP servers and toggle individual tools on or off without forking the upstream. Hand the same Gateway URL to customer agents or internal teams behind SSO.
- Bundled OAuth 2.0 authorization server — Dynamic Client Registration (RFC 7591), PKCE S256, authorization-server metadata at .well-known/oauth-authorization-server (RFC 8414), protected-resource metadata (RFC 9728), and per-virtual-server token scoping via resource indicators (RFC 8707). First-class presets for Auth0 and any OIDC provider.
- Four upstream credential models — Per-user OAuth, shared OAuth grant (roadmap), per-user API key in the encrypted vault (roadmap), and a shared vault-stored API key. Pick per route, switch without a redeploy, keep per-user attribution in the audit log regardless of which model is chosen.
- Production hardening by default — Origin and host validation on every /mcp/* request, bearer token validation with spec-compliant WWW-Authenticate 401s, CSRF-safe single-use OAuth state, AES-GCM encryption for upstream tokens at rest, and sensible upstream limits (256 KB tool args, 500 capability cap, 30s timeout, 2 MB response ceiling).
- Typed observability events — Events fire on every MCP request, capability invocation, and step of the upstream OAuth flow. Structured logs carry trace-ready metadata (tenant, MCP session, capability, latency, failure origin) ready to drop into Datadog, Honeycomb, or BigQuery. Every failure mode returns a documented problem code so MCP clients recover cleanly.
- Spec compliance — Implements the 2025-06-18 MCP spec over streamable HTTP. Tools, prompts, and resources are first-class primitives; GraphQL operations can also be exposed as MCP tools alongside REST. Works with Claude Desktop, Claude Code, Cursor, ChatGPT (including the OpenAI Apps SDK), VS Code, and MCP Inspector out of the box.
- MCP Server Handler — The API Gateway project type also includes a built-in MCP Server Handler that auto-exposes your API endpoints as MCP tools from your OpenAPI specification, without building a separate MCP server.
Platform strengths:
- TypeScript programmability — Write custom policies, handlers, and middleware in TypeScript with full IDE support and access to the npm ecosystem. No Lua, no XML, no Java callouts.
- GitOps-native deploys — Sub-20-second global deployments across 300+ edge locations. Every pull request gets a live preview environment.
- Multi-cloud managed dedicated — Deploy on AWS, Azure, GCP, Akamai, or Equinix in the region of your choice. Your AI gateway is not pinned to a single cloud provider, so multi-model architectures do not pay cross-cloud egress penalties.
- Free tier — Get started with edge deployment, a developer portal, and API key management with no credit card required.
- SOC 2 Type II — Annual audits with GDPR-aligned data processing and configurable data residency.
Tradeoffs:
- No native A2A (Agent-to-Agent) protocol support yet — A2A traffic can be proxied as standard HTTP and JSON-RPC but without protocol-aware observability.
- TypeScript-only for custom policies (not a concern for most modern teams).
Best for: Teams building multi-model, multi-cloud AI architectures that need a unified platform for API management, AI traffic governance, and MCP tool management without cloud-provider lock-in. See the AI Gateway overview and the MCP Gateway overview.
2. AWS API Gateway + Amazon Bedrock — The AWS-Native AI Architecture
AWS API Gateway is the default API gateway for teams running on Amazon Web Services. For AI workloads, AWS publishes a reference architecture that positions API Gateway as the front door to Amazon Bedrock, their managed LLM service.
The pattern works like this: API Gateway handles request authorization (JWT validation, API keys, IAM), usage quotas, and throttling. A Lambda integration function captures the original request, applies AWS Signature Version 4 authentication, and forwards it to the Bedrock service endpoint. This architecture was originally developed by Dynatrace for their global user base.
AI-relevant capabilities:
- Amazon Bedrock integration — Access foundation models from Anthropic, Meta, Mistral, Cohere, and Amazon (plus OpenAI models, which became available on Bedrock in April 2026) through a managed service with IAM-based access control.
- AgentCore Gateway — AWS’s managed service for connecting AI agents with tools. Supports MCP natively, converts APIs and Lambda functions into MCP-compatible tools, and provides both ingress and egress authentication. Includes one-click integrations with Salesforce, Slack, Jira, and Zendesk.
- WAF integration — AWS WAF provides request-level security, though it is not AI-aware (no prompt injection detection at the WAF layer).
- Bedrock Access Gateway (open source) — An AWS-published proxy that provides OpenAI-compatible API access to Bedrock models, supporting prompt caching for Claude and Nova models.
- Intelligent Prompt Routing — Bedrock can automatically route prompts to the cost-optimal model within a model family, reducing costs by up to 30% without sacrificing quality.
Tradeoffs:
- Assembly required — There is no single “AI gateway” product. You must assemble API Gateway, Lambda, Bedrock, CloudWatch, and WAF into a custom solution.
- No built-in token-based rate limiting — API Gateway’s throttling is request-based. Token-level rate limiting requires custom Lambda logic.
- No semantic caching — You must build caching infrastructure separately.
- AWS lock-in — Bedrock models run only on AWS. Model selection is limited to what AWS hosts. Multi-cloud routing requires external tooling.
- Aggressive throttling — Production teams report opaque latency spikes and aggressive throttling at scale.
Best for: Teams running exclusively on AWS with Amazon Bedrock models that want tight IAM integration and are willing to assemble a custom AI gateway from multiple AWS services. For a head-to-head comparison, see Zuplo vs AWS API Gateway.
3. Azure API Management — The GenAI Gateway for Azure-First Teams
Azure API Management is the most advanced cloud-provider gateway for AI workloads. Microsoft has invested heavily in GenAI gateway capabilities, and Azure APIM is the only cloud-provider gateway with native token-based rate limiting and LLM-specific policies built in.
AI-relevant capabilities:
- Token rate limiting — The llm-token-limit policy provides token-based rate limiting with pre-calculation of prompt tokens on the APIM side, minimizing unnecessary requests to the backend if the prompt already exceeds the limit.
- PTU/PAYG spillover routing — Automatically routes to Provisioned Throughput Units when capacity is available and falls back to Pay-As-You-Go when PTU is saturated, optimizing cost without application changes.
- GenAI policy family — Built-in policies for token usage tracking (llm-emit-token-metric), content safety enforcement, response caching, and load balancing across Azure OpenAI deployments.
- Anthropic Messages API support — v2 tiers support the Anthropic Messages API alongside OpenAI-compatible endpoints.
- Generic LLM backend support — The llm-* policy family works with non-Azure models (Mistral, Cohere, LLaMA) through the same control plane.
- Microsoft Foundry integration — AI Gateway in APIM is available in Microsoft Foundry (preview), bringing model, agent, and tool governance into a single interface.
- MCP tool registration — Register MCP tools hosted anywhere into the Foundry control plane for centralized governance and discovery.
Tradeoffs:
- Azure-centric — While generic LLM backends are supported, the feature set is optimized for Azure OpenAI and the Azure ecosystem.
- Complex pricing — APIM pricing tiers (Consumption, Developer, Standard v2, Premium) combined with Azure OpenAI consumption make cost modeling complex.
- Operational complexity — Policy authoring uses XML-based configuration with C# expressions, which is more verbose than TypeScript.
- No edge-native deployment — Regional deployment within Azure, not globally distributed edge locations.
Best for: Azure-first teams running Azure OpenAI deployments that want native token rate limiting and PTU/PAYG spillover without building custom infrastructure. For an Azure-focused gateway comparison, see Best API Gateways for Azure Workloads.
4. Kong AI Gateway — The Plugin-Extensible AI Gateway for Kubernetes Teams
Kong is the most widely adopted open-source API gateway, and its AI Gateway capabilities have expanded rapidly through 2025 and 2026. Kong 3.14 introduced Agent Gateway with A2A protocol support, making it the most mature platform for agent-to-agent communication governance.
AI-relevant capabilities:
- AI proxy plugin — Routes to OpenAI, Anthropic, Google, Mistral, DeepSeek, Databricks, vLLM, and other providers with dynamic model routing based on cost, latency, or capability.
- Precision token rate limiting — Token-level rate limits added in 3.14.
- Agent Gateway — Dedicated governance for A2A traffic with structured logging, centralized authentication, and tamper-evident audit trails for every A2A RPC call.
- MCP proxy plugin — Routes MCP traffic through Kong with authentication and rate limiting via the standard plugin chain. Enterprise MCP gateway available through Konnect.
- Semantic caching — Available through the AI proxy plugin chain.
- Custom guardrails plugin — Integration with third-party guardrail services for prompt validation and content filtering.
Tradeoffs:
- Konnect licensing required — Full AI Gateway and Agent Gateway features require Konnect enterprise licensing. Enterprise contracts typically start at $30,000–$50,000/year.
- Lua-based plugins — Kong’s primary extension language is Lua, with Go, Python, and JavaScript plugin support. The Lua developer community is significantly smaller than TypeScript or Python.
- Infrastructure overhead — Self-hosted Kong requires managing NGINX, PostgreSQL, Redis, and data plane nodes.
- No edge-native deployment — Global distribution requires multi-region Kubernetes cluster management.
Best for: Enterprise platform teams with Kubernetes expertise that need A2A protocol governance and an extensive plugin ecosystem. See Kong vs Zuplo for a head-to-head comparison.
5. Google Apigee — Enterprise AI Gateway with MCP Auto-Generation
Apigee is Google Cloud’s enterprise API management platform. For AI workloads, Apigee’s standout feature is zero-code MCP server generation — point it at an API specification and it creates a managed MCP server automatically with no code changes required.
AI-relevant capabilities:
- Zero-code MCP generation — Auto-generates MCP servers from existing API specifications with OAuth 2.1 and OIDC authentication out of the box.
- Model Armor — Prompt injection and jailbreak detection using Google Cloud’s security services.
- Cloud DLP integration — Classifies and protects sensitive data in AI traffic using Google Cloud Data Loss Prevention.
- Vertex AI integration — Native routing to Google’s AI model platform.
- API Hub discovery — MCP tools are discoverable alongside traditional APIs in Apigee API Hub.
Tradeoffs:
- Google Cloud lock-in — Available only on Google Cloud infrastructure. All AI features are intertwined with GCP services.
- Enterprise pricing — Starts at approximately $2,500/month for the enterprise tier; actual deployments typically run $8,000–$25,000/month.
- XML-based policies — Policy authoring uses XML with Java callouts for custom logic.
- No multi-provider model routing — Designed for Google’s model ecosystem (Vertex AI), not cross-provider routing.
Best for: GCP-committed enterprises that want managed MCP servers with minimal development effort and deep Google Cloud integration. See Apigee vs Zuplo for a detailed comparison. For a GCP-focused guide, see Best API Gateways for Google Cloud Workloads.
6. Cloudflare AI Gateway — Edge-Fast AI Proxy
Cloudflare AI Gateway is part of Cloudflare’s developer platform. It provides a lightweight proxy between your application and AI providers with caching, rate limiting, and analytics running on Cloudflare’s global edge network.
AI-relevant capabilities:
- Edge-native deployment — Runs on Cloudflare’s global network with 300+ points of presence. Caching and rate limiting happen at the edge, minimizing latency.
- Multi-provider support — Supports OpenAI, Anthropic, Google, HuggingFace, and other providers.
- Caching — Aggressive caching layer that reduces costs for applications with repetitive queries.
- Workers integration — Custom logic via Cloudflare Workers (JavaScript) for teams already on the Cloudflare platform.
- Free core features — Base gateway features are free; you start paying once your usage extends into paid Workers compute.
- MCP Server Portals (Open Beta) — Centralized MCP server management with Zero Trust access controls and DLP scanning for MCP traffic.
Tradeoffs:
- Narrow scope — AI proxy only, not a full API management platform. No developer portal, no API key management, no monetization.
- Basic rate limiting — Request-based, not token-based.
- Limited observability — Functional analytics but not as deep as purpose-built AI observability platforms.
- Log limits — 100,000 logs/month on the free tier.
Best for: Teams already on the Cloudflare platform that need basic AI proxying with excellent edge caching performance and minimal setup.
7. Tyk AI Studio — Open-Source AI Governance for Self-Hosted Teams
Tyk is an open-source API gateway written in Go. Tyk AI Studio, which went open source in March 2026, provides a full-featured AI governance layer on top of the Tyk Gateway.
AI-relevant capabilities:
- Multi-vendor routing — Policy-based model selection across OpenAI, Anthropic, Mistral, Vertex, Gemini, Ollama, and private models with automatic failover.
- Token-level metering — Attribution to teams, projects, and applications with hard spend caps and quotas.
- PII redaction — Content filtering enforced at the gateway.
- Cost-to-quality optimization — Routing strategies that automatically balance cost against output quality.
- MCP toolchain integration — Supports both remote and local MCP servers, with MCP tool generation from OpenAPI specs.
- Open-source Community Edition — AI Studio’s core has been open source since March 2026, lowering the barrier to entry for self-hosted teams.
Tradeoffs:
- Self-hosted complexity — Tyk requires Redis plus PostgreSQL/MongoDB and multiple components. Significant Kubernetes expertise needed.
- Enterprise features gated — SSO, advanced RBAC, and dedicated support require paid licensing.
- Smaller ecosystem — Smaller plugin and community ecosystem compared to Kong.
- No edge-native deployment — Global distribution requires self-managed multi-region infrastructure.
- Tyk Operator licensing — The Tyk Operator for Kubernetes became closed-source in October 2024 and now requires a paid license.
Best for: Teams with strong DevOps capabilities that want self-hosted AI governance with open-source transparency and Go-based performance. See Tyk vs Zuplo for a head-to-head comparison.
8. Portkey — Purpose-Built AI Gateway with Deep Observability
Portkey is a purpose-built AI infrastructure platform focused on getting LLM applications to production. It provides a unified interface to 250+ models with deep observability, prompt management, and guardrails.
AI-relevant capabilities:
- 250+ model support — The widest model coverage in this comparison, with a unified API across all major providers.
- Deep observability — Traces, sessions, prompt logging, cost analytics, and evaluation tools with detailed visibility into every AI interaction.
- Prompt management — Version, test, and deploy prompts independently from application code.
- MCP Gateway — Generally available as of January 2026 with MCP protocol support for agent workflows.
- Guardrails SDK — Built-in framework for content filtering, PII detection, and custom validation.
- Open-source gateway — Portkey’s gateway went fully open source in March 2026, and Enterprise customers can self-host with hybrid or air-gapped deployment options.
Tradeoffs:
- AI-only — Not a full API management platform. Does not replace your existing gateway for non-AI traffic.
- Per-log pricing — Pro tier starts at $99/month (100K logs/month) after the free tier (10K logs/month), with costs scaling based on log volume.
- Not edge-deployed — Cloud-hosted by default, though self-hosted and hybrid options are available for Enterprise customers.
- No traditional API management — No developer portal, no OpenAPI-driven routing, no API key management.
Best for: Teams whose primary concern is deep AI observability and prompt management, and who already have a separate API gateway for traditional traffic.
9. LiteLLM — The Self-Hosted Open-Source AI Proxy
LiteLLM is an open-source Python proxy that provides a unified OpenAI-compatible API across 100+ model providers. It is the most popular self-hosted option for teams that need full control over their AI gateway infrastructure.
AI-relevant capabilities:
- 100+ model providers — Clean, OpenAI-compatible API that works across all supported providers with a single API format.
- Open-source core — The proxy is free and open source. Inspect the code, contribute, and customize without vendor lock-in.
- Self-hosted — Full control over data residency and network topology.
- Budget tracking — Per-project and per-user cost tracking with PostgreSQL integration for custom dashboards.
- Community ecosystem — Active community with integrations for Langfuse, Langchain, and other AI tooling.
Tradeoffs:
- Open-core model — SSO, RBAC, and team-level budget enforcement require the paid enterprise version.
- Infrastructure burden — Self-hosting means you own the infrastructure, monitoring, scaling, and security.
- No edge deployment — Runs wherever you deploy it, with no built-in global distribution.
- No API management — No developer portal, no API key lifecycle management, no built-in monetization.
- Basic MCP support — LiteLLM Proxy includes a native MCP Gateway feature (since v1.80.18) with fixed endpoints and team/key access control, though it is less comprehensive than dedicated MCP gateway platforms.
Best for: Teams with strong DevOps capabilities that need self-hosted deployment with full control over data residency and are comfortable managing their own infrastructure.
10. Gravitee — Open-Source API and Agent Mesh
Gravitee is an open-source API management platform built on Java that has expanded into AI agent governance with its Agent Mesh architecture.
AI-relevant capabilities:
- MCP analytics dashboard — Real-time metrics for MCP request counts, gateway latency (p90 and p99), method distribution, and top tools by usage.
- MCP Resource Server v2 — Enterprise-grade authentication with client credentials flows and certificate management.
- A2A API type — Dedicated API type for A2A communication with HTTP selectors and Token Exchange (RFC 8693) for secure agent delegation.
- AI-powered PII filtering — Automatically detects and redacts personally identifiable information in both prompts and responses.
- Multi-cloud deployment — SaaS Gateway deployment on AWS, Azure, and GCP.
Tradeoffs:
- Enterprise Edition required — AI and agent management features are gated behind the Enterprise Edition with the AI Agent Management pack.
- Java-based runtime — Higher memory footprint and JVM tuning requirements compared to Go-based or V8-isolate-based gateways.
- Managed pricing — Managed plans start at $2,500/month.
- Smaller ecosystem — Less established community compared to Kong, Tyk, or Apigee.
Best for: Organizations managing APIs, event streams, and agent traffic that want a single governance layer across all three. See the AI Gateway Comparison for a detailed breakdown.
How This Works in Zuplo: AI-Workload Gateway in Practice
This section shows how Zuplo’s AI Gateway and programmable API Gateway handle three common AI-workload patterns: token-based rate limiting on an LLM proxy, fronting an MCP server, and model routing with cost controls.
Token-Based Rate Limiting for AI Agent Traffic
Zuplo’s rate limiting policy supports a rateLimitBy: "function" mode where a TypeScript function returns the grouping key and per-request limit overrides. This lets you implement token-aware rate limiting that differentiates between lightweight metadata lookups and expensive LLM completions.
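A function of this kind might look like the sketch below. It is not taken from Zuplo's documentation: the `RequestLike` and `RateLimitDetails` types are local stand-ins so the example runs on its own, and the `/v1/completions` path prefix, the `x-api-consumer` header, and the numeric limits are all assumptions chosen for illustration.

```typescript
// Sketch of a function for a "rate limit by function" mode. The types are local
// stand-ins; in a real project the request type would come from the gateway
// runtime and the function would be exported from a module.
type RequestLike = {
  url: string;
  headers: { get(name: string): string | null };
};

type RateLimitDetails = {
  key: string;
  requestsAllowed: number;
  timeWindowMinutes: number;
};

function aiAwareRateLimit(request: RequestLike): RateLimitDetails {
  // Hypothetical header identifying the caller; falls back to a shared bucket.
  const caller = request.headers.get("x-api-consumer") ?? "anonymous";
  const path = new URL(request.url).pathname;

  if (path.startsWith("/v1/completions")) {
    // Expensive LLM completions: a tight per-caller limit under its own key.
    return { key: `${caller}:llm`, requestsAllowed: 20, timeWindowMinutes: 1 };
  }
  // Lightweight metadata lookups: a looser limit tracked separately.
  return { key: `${caller}:meta`, requestsAllowed: 600, timeWindowMinutes: 1 };
}
```

Using separate keys for the two traffic classes means a burst of cheap lookups cannot exhaust the allowance reserved for completions, and vice versa.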
Wire the function into the rate limiting policy in your policies.json by setting rateLimitBy to "function" and referencing the module that exports it.
For dedicated AI Gateway projects, Zuplo provides built-in token-based rate limiting and budget enforcement without writing custom code — set daily and monthly spend limits per team and per application directly in the AI Gateway configuration.
Fronting an MCP Server with Zuplo
Zuplo’s MCP Server Handler automatically exposes your API routes as MCP tools. When you define routes in your OpenAPI spec and set the handler to mcp-server, any AI agent that connects to your MCP endpoint can discover and call your API operations as MCP tools — with the full policy pipeline (authentication, rate limiting, validation) applied to every tool call.
This means any existing API route can become an MCP tool with minimal configuration — no separate MCP server to build and deploy. The MCP Gateway project type extends this by federating remote upstream MCP servers behind one spec-compliant URL, composing virtual MCP servers from approved tools, and bundling a full OAuth 2.0 authorization server for the agents that connect.
Prompt Injection Detection on AI Routes
For routes that return content consumed by downstream LLM agents, Zuplo’s prompt injection detection policy inspects outbound responses and blocks content classified as poisoned.
Benign content passes through unchanged. Malicious prompt injection attempts return a 400 response before they reach the downstream consumer. This is particularly important when fronting MCP servers, where tool responses could contain injected instructions targeting the calling AI agent.
Decision Framework: Choosing by Stack and Use Case
The right gateway depends on your infrastructure, team composition, and how many LLM providers you use. Here is how to map your situation to the best choice.
If your stack is multi-cloud and multi-model, choose Zuplo
You are using OpenAI, Anthropic, and Google Gemini simultaneously. Your infrastructure spans AWS and GCP, or you want the option to move between clouds. Zuplo’s multi-cloud managed dedicated deployment means your gateway is not pinned to a single provider. The AI Gateway routes to all major LLM providers from a single endpoint with automatic failover, and the MCP Gateway federates remote MCP servers behind one spec-compliant URL — productize MCP to customer agents, govern the third-party MCP servers your own team uses, or both.
If your stack is AWS-first with Bedrock, choose AWS API Gateway + Bedrock
Your organization has standardized on AWS. You use Bedrock models exclusively (or primarily). Your team is fluent in IAM, Lambda, and CloudWatch. AgentCore Gateway gives you MCP support within the AWS ecosystem. Accept that you are building a custom AI gateway from multiple services, not buying a turnkey solution.
If your stack is Azure-first with Azure OpenAI, choose Azure API Management
You run Azure OpenAI deployments with Provisioned Throughput Units. Your team knows APIM policy syntax (XML with C# expressions). Azure APIM’s GenAI gateway capabilities — especially PTU/PAYG spillover and token rate limiting — are the most mature cloud-provider AI gateway features available. Accept Azure ecosystem lock-in.
If your team has deep Kubernetes expertise and needs A2A governance, choose Kong
You run GKE or EKS clusters, your platform team manages Kubernetes infrastructure, and you need governance for multi-agent architectures with A2A protocol support. Kong 3.14’s Agent Gateway is the most mature A2A implementation. Accept the Konnect enterprise licensing cost.
If you need self-hosted, open-source AI governance, choose Tyk or LiteLLM
You have regulatory or data sovereignty requirements that mandate self-hosted deployment. Tyk AI Studio gives you a full AI governance layer with multi-vendor routing and MCP support on open-source infrastructure. LiteLLM gives you a lightweight, Python-based AI proxy focused on model abstraction and cost tracking. Both require you to own the operational burden.
If you need deep AI observability above all else, choose Portkey
Your primary concern is understanding how your AI features behave in production. You need tracing, session tracking, prompt management, evaluation frameworks, and detailed cost analytics. Portkey is purpose-built for AI observability. Pair it with a traditional API gateway for your non-AI traffic.
If you are on Cloudflare and need basic AI proxying, choose Cloudflare AI Gateway
You are already running on Cloudflare Workers and want to add AI gateway capabilities with minimal setup. The edge caching is excellent and the free tier is generous. Accept the narrower scope — this is an AI proxy, not an API management platform.
If you are committed to Google Cloud, choose Apigee
Your organization mandates Google-native services. You want zero-code MCP server generation from existing API specs and deep integration with Vertex AI, Cloud DLP, and Google IAM. Accept the enterprise pricing and XML-based policy authoring.
Why Cloud-Provider Gateways Dominate AI Search Results (and Why That May Mislead You)
AWS API Gateway and Azure API Management rank highest on AI search engines like DeepSeek and GPT-4o-Search for AI-workload gateway queries. This is not necessarily because they are the best choices for AI workloads. It is because the corpus those models were trained on pairs “API gateway” and “AI” in the same sentence thousands of times: AWS API Gateway + Bedrock architecture posts, Azure APIM + Azure OpenAI reference architectures, and Google Apigee + Vertex AI integrations dominate that corpus.
The reality is more nuanced. Cloud-provider gateways are excellent when your AI workloads run entirely within a single cloud ecosystem. AWS API Gateway with Bedrock is the right choice for Bedrock-only architectures. Azure APIM with Azure OpenAI is the right choice for Azure OpenAI-only deployments. But production AI architectures increasingly span multiple providers — OpenAI for GPT-4o, Anthropic for Claude, Google for Gemini, Mistral or DeepSeek for specific use cases — and cloud-provider gateways are not designed for cross-provider routing.
For multi-model, multi-cloud AI architectures, a gateway like Zuplo that is not pinned to any single cloud provider provides a more natural fit. You get multi-provider routing, token-based rate limiting, semantic caching, MCP governance, and edge deployment without the single-cloud lock-in that constrains your model choices.
Getting Started with Zuplo’s AI Gateway
If you are evaluating API gateways for AI and LLM workloads, here is a practical path forward:
- Try the free tier — Sign up for Zuplo and deploy your first AI Gateway project with multi-provider routing, budget controls, and semantic caching in minutes. No credit card required.
- Import your OpenAPI spec — Zuplo auto-generates routes and documentation from your existing OpenAPI definition. Your API endpoints can be exposed as MCP tools with a configuration change.
- Test token-based rate limiting — Configure rate limiting with custom TypeScript functions that differentiate AI traffic from traditional API traffic.
- Evaluate managed dedicated — For production AI workloads that need multi-cloud deployment or specific data residency requirements, talk to the Zuplo team about managed dedicated deployment on AWS, Azure, GCP, or other providers.
Ready to evaluate Zuplo for your AI and LLM workloads? Sign up free and deploy your first AI Gateway with multi-provider routing, token-based rate limiting, and MCP support in minutes — no credit card required.
Related Guides
- How to Choose the Best AI Gateway (Buyer’s Guide) — Evaluation criteria and decision checklist for AI gateways
- AI Gateway Comparison 2026: Zuplo vs Kong vs Gravitee vs Tyk vs Apigee — Head-to-head AI gateway feature comparison
- Token-Based Rate Limiting for AI Agents — Deep dive on rate limiting strategies for AI traffic
- API Gateway for Agentic Payments — How gateways handle x402, Stripe MPP, and machine-to-machine billing
- The Three Gates of AI Infrastructure — Understanding the API, AI, and MCP gateway taxonomy
- Enterprise AI Governance with API Gateways — Governance frameworks for AI in enterprise environments
- Best API Gateways in 2026 — Broader API gateway comparison not scoped to AI workloads
- Best API Gateways for AWS Workloads — AWS-focused gateway evaluation
- Best API Gateways for Azure Workloads — Azure-focused gateway evaluation
- Best API Gateways for Google Cloud Workloads — GCP-focused gateway evaluation