The Three Gates of AI Infrastructure: API, AI, and MCP Gateways Compared

Nate Totten
May 11, 2026
11 min read

Learn the three-gate model of AI infrastructure — API gateways, AI gateways, and MCP gateways — and when to unify or separate them.

A new taxonomy has taken hold in the AI infrastructure conversation. Over the past several months, blog posts, conference talks, and vendor launches have converged on a model that splits the gateway category into three distinct layers: the API gateway, the AI gateway, and the MCP gateway. Each handles a different traffic shape, protocol, and set of risks.

The framing is useful. It gives platform teams a shared vocabulary for the different kinds of traffic now flowing through their infrastructure. But the framing is also being weaponized by point-solution vendors who want you to believe that three traffic types require three separate products, three policy languages, and three billing relationships.

This article maps out what each gate actually does, where they overlap, where they genuinely diverge, and how to decide whether your organization should unify them or keep them separate.

What Each Gate Actually Does

Gate 1: The API Gateway

The API gateway is the most mature of the three. It sits between clients and your backend services, managing traditional HTTP traffic — REST, GraphQL, gRPC, and webhooks.

Traffic shape: Request-response over HTTP. Stateless. The client sends a request, the gateway applies policies, the backend returns a response.

Primary responsibilities:

  • Routing — Match incoming requests to backend services based on path, method, headers, and host.
  • Authentication and authorization — Validate API keys, JWTs, OAuth tokens, and mutual TLS certificates before requests reach your services.
  • Rate limiting — Enforce request-per-second or request-per-minute caps per consumer, per endpoint, or globally.
  • Request validation — Verify that request bodies, query parameters, and headers conform to your OpenAPI schema.
  • Observability — Log every request, track latency percentiles, and export metrics to your monitoring stack.
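The routing responsibility above can be sketched in a few lines: match an incoming method and path prefix against a route table, first match wins. The route table, service names, and matching rules here are illustrative, not a real gateway configuration.

```typescript
// Minimal path-and-method router: first matching rule wins.
// Backend names and routes are hypothetical examples.
type Route = { method: string; pathPrefix: string; backend: string };

const routes: Route[] = [
  { method: "GET", pathPrefix: "/users", backend: "user-service" },
  { method: "POST", pathPrefix: "/orders", backend: "order-service" },
];

function matchRoute(method: string, path: string): string | null {
  for (const r of routes) {
    if (r.method === method && path.startsWith(r.pathPrefix)) {
      return r.backend;
    }
  }
  return null; // no match -> the gateway returns a 404
}
```

Real gateways match on far more (headers, hosts, weights), but the core decision is this lookup, executed before any policy runs against the backend.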

Primary risks: Unauthorized access, DDoS, data exfiltration through over-fetching, and schema drift between what your API accepts and what your docs promise.

Key metrics: Requests per second, error rate, P95 latency.

The API gateway is well-understood technology. The patterns are established, the tooling is mature, and most engineering teams have operated one for years.

Gate 2: The AI Gateway

The AI gateway emerged as organizations moved from experimenting with LLMs to running them in production. It sits between your application and one or more LLM providers — OpenAI, Anthropic, Google, Mistral — managing the traffic that flows to and from language models.

Traffic shape: Request-response over HTTP, but with fundamentally different economics. A single LLM request can consume thousands of tokens and cost dollars instead of fractions of a cent. Responses stream incrementally via server-sent events. Latency is measured in seconds, not milliseconds.

Primary responsibilities:

  • Multi-provider routing — Send requests to different LLM providers based on cost, latency, capability, or availability, with automatic failover when a provider goes down.
  • Token-based rate limiting — Cap consumption by tokens, not just requests. A single request might consume 50 tokens or 50,000 — request counting alone cannot control cost.
  • Cost budgets — Set daily and monthly spending limits per team, per application, or per organization, with enforcement that blocks or warns when thresholds are hit.
  • Semantic caching — Detect semantically similar prompts and return cached responses, reducing both cost and latency without requiring exact input matches.
  • Security guardrails — Detect prompt injection attempts, mask secrets in responses, and filter PII before it reaches downstream consumers.
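The token-budget idea above can be sketched as a counter keyed by team rather than by request: record input plus output tokens after each call, and check an estimated cost against the remaining budget before admitting the next one. Budget numbers and team names below are made up for illustration.

```typescript
// Sketch of token-based budget enforcement: consumption is counted in
// tokens, not requests, against a per-team daily budget.
// Budget values and team names are hypothetical.
const dailyBudgets = new Map<string, number>([["search-team", 1_000_000]]);
const usedToday = new Map<string, number>();

// Record actual consumption after a completed LLM call.
function recordUsage(team: string, inputTokens: number, outputTokens: number): void {
  usedToday.set(team, (usedToday.get(team) ?? 0) + inputTokens + outputTokens);
}

// Admission check before the next call, using an estimated token count.
function allowRequest(team: string, estimatedTokens: number): boolean {
  const budget = dailyBudgets.get(team) ?? 0;
  const used = usedToday.get(team) ?? 0;
  return used + estimatedTokens <= budget;
}
```

This is why request counting fails for LLM traffic: two requests with identical counts can differ in cost by three orders of magnitude, so the counter must track tokens.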

Primary risks: Runaway costs from uncontrolled token consumption, prompt injection attacks, secret leakage in model responses, and vendor lock-in to a single LLM provider.

Key metrics: Tokens per second, cost per request, cache hit rate, time-to-first-token.

The AI gateway is where the economics of the request change the engineering. You cannot manage LLM traffic with the same tools you use for REST APIs, because the cost structure and failure modes are fundamentally different.

Gate 3: The MCP Gateway

The MCP gateway is the newest of the three. The Model Context Protocol standardizes how AI agents discover and invoke tools — reading a database, filing a ticket, querying a CRM. The MCP gateway governs that agent-to-tool interaction.

Traffic shape: JSON-RPC over HTTP with persistent sessions. Unlike stateless REST, MCP sessions maintain state across multiple tool calls. MCP servers push notifications to clients via server-sent events, making the communication bidirectional. Everything meaningful — the method type, the specific tool being called, the parameters — lives in the JSON-RPC body, not in HTTP path or headers.
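Because the routing decision lives in the body, an MCP gateway must parse JSON-RPC rather than match on path or headers. A minimal sketch of that parsing step, with the payload shape following JSON-RPC 2.0 and a hypothetical tool name:

```typescript
// Everything routable in MCP lives in the JSON-RPC body, so a gateway
// inspects the parsed message, not the URL. The tool name "query_crm"
// is a made-up example.
interface JsonRpcCall {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

function describeCall(raw: string): string {
  const msg = JSON.parse(raw) as JsonRpcCall;
  if (msg.method === "tools/call" && msg.params?.name) {
    return `tool call: ${msg.params.name}`; // policy decisions key off this
  }
  return `rpc method: ${msg.method}`;
}
```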

Primary responsibilities:

  • Tool discovery and governance — Maintain a catalog of approved MCP servers. Control which teams and agents can access which tools. Filter tools/list responses by role.
  • Credential brokering — Employees authenticate via SSO. The gateway brokers per-server credentials (service accounts or per-user OAuth tokens) so nobody pastes raw API tokens into their AI client configuration.
  • Audit logging — Record every tool call: who called it, with what inputs, through which AI client, and what the result was. Stream audit data to your SIEM.
  • Access control — Enforce least-privilege at the tool level. An agent that can connect to a server should not automatically have access to every tool that server exposes.
  • Virtual server composition — Expose a curated subset of tools from one or more approved servers as a single virtual MCP server (for example, a read-only financial tools view for your finance team).
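The tool-governance and virtual-server ideas above come down to one operation: filtering a tools/list response by the caller's role before the agent ever sees it. A minimal sketch, with hypothetical role and tool names:

```typescript
// Sketch of filtering a tools/list response by role: the gateway strips
// tools the caller's role is not entitled to. Role and tool names are
// hypothetical examples.
type Tool = { name: string; description: string };

const roleAllowlist: Record<string, Set<string>> = {
  "finance-readonly": new Set(["get_invoice", "list_transactions"]),
};

function filterToolsForRole(tools: Tool[], role: string): Tool[] {
  const allowed = roleAllowlist[role];
  if (!allowed) return []; // unknown role sees nothing: least privilege
  return tools.filter((t) => allowed.has(t.name));
}
```

A virtual server is this same filter applied across the combined tool lists of several upstream servers, presented to the client as one catalog.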

Primary risks: Shadow IT from unsanctioned MCP servers, credential sprawl (GitGuardian found over 24,000 secrets exposed in MCP configuration files in 2025), tool poisoning (malicious instructions embedded in tool metadata), and privilege escalation through unrestricted tool access.

Key metrics: Tool call success rate, tool calls per session, credential rotation frequency, percentage of traffic through governed servers.

The MCP gateway is where organizational trust boundaries meet autonomous agent behavior. The question it answers is not “can the agent reach this service?” but “should the agent be allowed to do this?”

The Overlap Matrix: Shared Controls vs. Genuine Divergence

The three gates share more infrastructure than the point-solution vendors would like you to believe. Here is where the controls overlap and where they genuinely diverge.

Where All Three Gates Share Controls

Authentication. Every gate needs to verify identity before processing a request. API keys, JWTs, OAuth tokens, and mutual TLS work across REST, LLM, and MCP traffic. The identity model — who is this consumer, and what are they allowed to do — is the same regardless of traffic type.

Rate limiting. All three gates need to protect backends from overuse. The mechanism differs (requests per second for REST, tokens per minute for LLM, calls per session for MCP), but the enforcement pattern is the same: identify the consumer, check the counter, allow or deny.

Observability. Every gate needs logging, metrics, and tracing. Request volume, latency, error rates, and per-consumer attribution matter equally for REST endpoints, LLM requests, and MCP tool calls. Exporting to the same observability backend (Datadog, Splunk, OpenTelemetry) simplifies correlation across traffic types.

Policy enforcement. A policy pipeline that intercepts requests, applies transformations, and makes allow/deny decisions is useful for all three gates. The policies differ (schema validation for REST, prompt injection detection for LLM, tool-level RBAC for MCP), but the execution model — inbound policy chain, handler, outbound policy chain — is the same.
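That shared execution model can be sketched directly: inbound policies run in order and any of them can short-circuit with a denial, then the handler runs, then outbound policies transform the response. All type and policy names here are illustrative, not a real gateway API.

```typescript
// Sketch of the shared policy execution model: inbound chain, handler,
// outbound chain. A non-null return from an inbound policy is a denial
// that short-circuits the pipeline (e.g. a 401 or 429).
type GwRequest = { consumer: string; path: string };
type GwResponse = { status: number; body: string };
type InboundPolicy = (req: GwRequest) => GwResponse | null;
type OutboundPolicy = (res: GwResponse) => GwResponse;

function runPipeline(
  req: GwRequest,
  inbound: InboundPolicy[],
  handler: (req: GwRequest) => GwResponse,
  outbound: OutboundPolicy[],
): GwResponse {
  for (const policy of inbound) {
    const denial = policy(req);
    if (denial) return denial; // denied before the handler runs
  }
  let res = handler(req);
  for (const policy of outbound) res = policy(res);
  return res;
}
```

The point of the overlap argument: only the policies plugged into this pipeline change across the three gates; the pipeline itself does not.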

Where the Gates Genuinely Diverge

Token accounting is specific to the AI gateway. REST APIs and MCP tool calls do not consume tokens. Only LLM traffic requires tracking input tokens, output tokens, and cost-per-request against a budget hierarchy. This is not a feature you can bolt onto a request counter.

Tool discovery and schema semantics are specific to the MCP gateway. An API gateway matches requests by URL pattern. An MCP gateway needs to understand tool schemas, filter tool lists by permission, and in some implementations support semantic search across tool definitions. The routing model is fundamentally different.

Session statefulness is specific to MCP. REST APIs are stateless by design. LLM requests are typically stateless at the transport layer (conversation state is managed by the application, not the gateway). MCP sessions persist across multiple tool calls and require session tracking at the gateway level.

Bidirectional communication is specific to MCP. API gateways and AI gateways handle request-response traffic. MCP servers push notifications to clients via SSE — resource updates, schema changes, and progress events. The gateway must handle traffic flowing in both directions.

Cost modeling differs across all three. REST traffic is typically priced by request count or data transfer. LLM traffic is priced by token. MCP traffic may be priced by tool call, by downstream API call, or not at all. A unified billing model must accommodate all three.

The Case for Unification

When you operate three separate gateway products, you operate three runtimes, three configuration languages, three deployment pipelines, and three audit trails. For many organizations, this fragmentation creates more risk than it mitigates.

Single source of truth. An OpenAPI specification can drive REST routing, request validation, developer portal generation, and MCP tool generation from the same file. When you change an API endpoint, the validation rules, the documentation, and the MCP tool definition all update automatically. Three separate products mean three separate configurations that must stay in sync manually.
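The single-source-of-truth claim rests on a mechanical mapping from OpenAPI operations to MCP tool definitions. A simplified sketch of that mapping (operationId to tool name, summary to description, parameter schemas to an input schema); real generators handle far more of the spec, and the shapes here are deliberately reduced:

```typescript
// Sketch of deriving an MCP tool definition from an OpenAPI operation.
// Both shapes are heavily simplified for illustration.
type OpenApiOperation = {
  operationId: string;
  summary: string;
  parameters: { name: string; schema: { type: string } }[];
};

type McpTool = {
  name: string;
  description: string;
  inputSchema: { type: "object"; properties: Record<string, { type: string }> };
};

function toMcpTool(op: OpenApiOperation): McpTool {
  const properties: Record<string, { type: string }> = {};
  for (const p of op.parameters) properties[p.name] = { type: p.schema.type };
  return {
    name: op.operationId,
    description: op.summary,
    inputSchema: { type: "object", properties },
  };
}
```

Because the tool definition is derived, not hand-written, a change to the OpenAPI operation propagates to the MCP catalog without a second configuration to update.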

Consistent auth model. When the same API key or JWT works across your REST APIs, your LLM proxy, and your MCP tools, you eliminate the credential sprawl that comes from managing three separate auth systems. A consumer authenticated at the API gateway does not need to re-authenticate at the AI gateway or the MCP gateway.

Cross-traffic correlation. When a single request from an AI agent triggers an MCP tool call, which calls a REST API, which queries an LLM — and all of that flows through the same policy pipeline — you get an end-to-end trace without stitching together logs from three different systems.

Fewer operational surfaces. One deployment pipeline. One policy language. One team that knows how to operate the infrastructure. The operational cost of running three separate gateways is not three times the cost of one — it is higher, because of the coordination overhead between them.

How this works in practice. Zuplo runs all three gate types on a single edge runtime. The same TypeScript policy pipeline that handles REST authentication and rate limiting also handles AI gateway token budgets and MCP server tool governance. A single OpenAPI specification drives REST routing, request validation, and MCP tool generation. The MCP Server Handler re-invokes the full policy pipeline for each tool call, so authentication, rate limiting, and observability apply to MCP traffic exactly as they do to REST traffic.

The Case for Separation

Unification is not always the right answer. There are legitimate scenarios where dedicated, specialized gateways make more sense.

Vendor-specific MCP tooling. If your MCP governance requirements include features like sandboxed execution environments, quarantine workflows for unapproved tools, or advanced semantic search across tool definitions, a purpose-built MCP gateway may offer deeper capabilities than a general-purpose gateway with MCP support.

Legacy LLM proxying. If your AI gateway is deeply integrated with a specific IDE or development environment, and switching would disrupt established workflows, maintaining a separate AI gateway can be the pragmatic choice.

Organizational boundaries. If your API gateway is managed by a platform team, your AI gateway by an ML engineering team, and your MCP governance by a security team — and those teams have different deployment cadences, compliance requirements, and operational preferences — separate products can reduce coordination overhead.

Protocol immaturity. MCP is evolving rapidly. A purpose-built MCP gateway can iterate on protocol-level features (session management, bidirectional SSE, tool schema negotiation) faster than a general-purpose gateway that must maintain stability across all three traffic types.

Compliance isolation. Some regulatory environments require that specific traffic types flow through audited, certified infrastructure. If your LLM traffic must be processed in a specific region by a certified provider, a separate AI gateway dedicated to that compliance boundary can be simpler than configuring region-specific routing within a unified platform.

Evaluation Checklist: Eight Questions for Scoping Any of the Three Gates

Whether you are evaluating a unified platform or three separate products, these questions will help you make the right architectural decision.

1. How many policy languages will your team need to learn? A unified platform means one policy model across all three traffic types. Three products mean three configuration systems to master and keep in sync.

2. Can you enforce authentication consistently across all three gates? Check whether the same API key, JWT, or OAuth token works across REST, LLM, and MCP traffic — or whether each gate requires its own auth configuration.

3. How does the platform handle token-based rate limiting? Request counting is not enough for LLM traffic. The AI gateway must track input and output tokens per request and enforce limits based on token consumption, not request count.

4. Can you generate MCP tools from your existing API definitions? If your APIs are defined with OpenAPI, check whether the platform can automatically expose those endpoints as MCP tools without building a separate MCP server.

5. Where does the gateway run? Edge deployment reduces latency for all three traffic types. Check whether the platform deploys globally or requires you to manage multi-region infrastructure yourself.

6. How does the platform handle MCP session state? MCP sessions persist across multiple tool calls. The gateway needs to track session state without breaking the stateless scaling model you rely on for REST traffic.

7. Can you get a single audit trail across all three traffic types? When an AI agent triggers an MCP tool call that invokes a REST API, can you trace that entire chain in one place? Or do you need to correlate logs across three separate systems?

8. What is the total cost of ownership — not just licensing? Three separate products mean three vendor relationships, three upgrade cycles, and three sets of operational knowledge. Factor in the coordination overhead, not just the line items on each invoice.

If you want to see how a single platform handles all three gate types, start a free Zuplo account and deploy API, AI, and MCP gateway policies on one edge runtime.

The Honest Map

The three-gate model is a useful mental model, not a purchasing guide. It describes three distinct traffic patterns that your infrastructure must handle. It does not prescribe three distinct products.

Some organizations will benefit from a unified platform where the same policy primitives — authentication, rate limiting, validation, observability — apply across API, AI, and MCP traffic on a single runtime. Others will benefit from specialized products that go deeper on a specific traffic type’s unique requirements.

The wrong answer is to ignore the question entirely. If you are running AI agents in production, you are already operating all three gates — whether you have named them or not. The traffic is flowing. The only question is whether you are governing it intentionally or discovering the gaps after an incident.

Zuplo handles all three gate types on one programmable edge runtime. The same TypeScript policies that secure your REST APIs also govern your AI gateway traffic and your MCP server tool calls. If you are evaluating how to unify your gateway infrastructure, start with a free account and see the three gates running on one platform.