Anthropic Just Made the Case for MCP Gateways

Anthropic just published the best argument for an MCP gateway I have read this year. It never uses the word “gateway” and endorses no vendor, us included.

“How we contain Claude across products” is a candid writeup of how Anthropic keeps claude.ai, Claude Code, and Cowork from doing real damage as the models grow more capable. It keeps returning to one question: how do you cap the blast radius?

Their answer splits in two. The host plane (files and processes on the machine) stays the job of sandboxes and VMs. The network, tool, and MCP plane (everything the agent reaches over the wire) needs a different control. If you run API infrastructure, that second half should look familiar.

Use this approach if:

You run or expose MCP servers to AI agents
You hand Claude Code, Cursor, or ChatGPT access to internal tools and APIs
You want to contain what an agent can reach, not just watch what it does

Two ways to cap an agent’s blast radius

Anthropic frames risk as likelihood times damage. Model training drives down likelihood; nothing drives down damage except containment. That leaves two levers: supervise behavior with a human in the loop, or contain capability with sandboxes, virtual machines, and egress controls. Anthropic chose the second: “rather than supervising what the agent does, we supervise what it’s able to do.”

Approval fatigue is why supervision can’t stand alone

The supervision lever decays. “Our telemetry showed users approved roughly 93% of permission prompts,” Anthropic reports, and the more prompts there are, the less attention each one gets. They call it approval fatigue, and it “appeared within weeks.” Trail of Bits described the same dynamic: human review reduced to a rubber stamp.

Their fix was not more prompts but a deterministic default: in the Claude Code sandbox, reads are allowed, writes are allowed inside the workspace, and network is denied. That single boundary cut permission prompts by 84%, and it holds whether or not anyone is watching.
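To make that default concrete, here is a minimal Python sketch of a deterministic policy with the same three rules. The `Action` type, its `kind` values, and the example paths are assumptions for illustration; a real sandbox enforces this in the kernel or hypervisor and resolves symlinks, whereas this check is purely lexical.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    kind: str    # hypothetical kinds: "read", "write", or "network"
    target: str  # a file path for read/write, a URL for network


def default_policy(action: Action, workspace: str) -> Decision:
    """Reads allowed, writes only inside the workspace, network denied."""
    if action.kind == "read":
        return Decision.ALLOW
    if action.kind == "write":
        root = PurePosixPath(workspace)
        path = PurePosixPath(action.target)
        # Reject relative paths and ".." segments rather than trying to normalize them.
        if path.is_absolute() and ".." not in path.parts and path.is_relative_to(root):
            return Decision.ALLOW
        return Decision.DENY
    # Network access, and any action kind we do not recognize, is denied.
    return Decision.DENY
```

Under this default, reading ~/.aws/credentials is still allowed; what stops the leak is that the outbound POST is denied, no matter who asked for it.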

The model layer cannot be that boundary on its own either. On Gray Swan’s agent red-teaming benchmark, Claude Opus 4.7 holds prompt-injection success to about 0.1% on a single attempt but 5 to 6% after a hundred adaptive attempts, and Claude Code’s auto mode still lets roughly 17% of overeager actions through. As Anthropic concludes, “protection in the model layer will never be 100% effective, which is why it can’t stand alone.” Our Q1 2026 API and agent security scorecard tracks how those miss rates play out in practice.

Both of Anthropic’s worst incidents were egress failures

The two most instructive failures are the same shape: data leaving through a permitted path.

In the first, a red team phished an Anthropic employee into launching Claude Code with a prompt telling Claude to read ~/.aws/credentials and POST them to an external endpoint. Across 25 attempts, it exfiltrated the credentials 24 times. The instructions arrived through the user, so the model layer had nothing anomalous to catch. Anthropic’s takeaway is that “the only defense that holds is the environment”: egress controls that block the POST regardless of intent.

In the second, Cowork’s egress allowlist passed traffic to api.anthropic.com. A malicious file in the workspace carried hidden instructions and an attacker’s API key. With arbitrary outbound traffic blocked, the instructions instead got Claude to call the Files API using that key; the egress proxy saw an allowed domain and waved the upload through to the attacker’s account. “The sandbox worked perfectly, and yet the data was exfiltrated.”

Both are the egress leg of Simon Willison’s lethal trifecta: private data, untrusted content, and a way to talk to the outside world. With all three present, the only reliable lever is cutting the third. As Anthropic puts it, “the deterministic boundary is what gets hit when everything probabilistic misses.”

An allowlist is a capability grant, not a filter

This is the key point. After the Cowork incident Anthropic reframed the problem: “the allowlist is not a destination filter, it is a capability grant.” Allowing api.anthropic.com allowed every function reachable there, including file uploads to arbitrary accounts.
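A minimal sketch of why a hostname allowlist behaves this way, assuming an egress proxy that can see the method, URL, and headers of each request. The paths and key values are illustrative placeholders, not a description of Anthropic’s proxy.

```python
from urllib.parse import urlsplit

# A destination filter: the only question it asks is "which host?"
ALLOWED_HOSTS = {"api.anthropic.com"}


def destination_filter(method: str, url: str, headers: dict[str, str]) -> bool:
    # Method, path, and headers (including whose API key is attached) are ignored.
    return urlsplit(url).hostname in ALLOWED_HOSTS


# Illustrative requests; the paths and keys are placeholders.
legit = ("POST", "https://api.anthropic.com/v1/messages", {"x-api-key": "sk-session-key"})
exfil = ("POST", "https://api.anthropic.com/v1/files", {"x-api-key": "sk-attacker-key"})
blocked = ("POST", "https://attacker.example/upload", {})

for method, url, headers in (legit, exfil, blocked):
    verdict = "allow" if destination_filter(method, url, headers) else "deny"
    print(method, url, "->", verdict)
```

The legitimate call and the upload carrying the attacker’s key get the same verdict, because nothing the filter inspects distinguishes them.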

That reframing changes what you build. “This agent may reach this server” is a blunt grant; the control you want is per request and identity-aware: this agent, acting for this user, may call this specific tool, right now.
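A capability grant in that sense can be sketched as a deny-by-default lookup keyed on agent, user, and server, returning the specific tools allowed and an expiry for the “right now” part. The grant table, agent and user names, tool names, and timestamps below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    agent: str   # the agent's own identity
    user: str    # the user the agent is acting for
    server: str  # the MCP server being called
    tool: str    # the specific tool on that server


# Hypothetical grants: (agent, user, server) -> (allowed tools, expiry as Unix time).
GRANTS: dict[tuple[str, str, str], tuple[frozenset[str], float]] = {
    ("support-bot", "alice", "stripe"): (
        frozenset({"list_customers", "retrieve_invoice"}),
        1_900_000_000.0,
    ),
}


def is_allowed(call: Call, now: float) -> bool:
    # No grant row means no capability, whatever hostname the server lives at.
    tools, expires_at = GRANTS.get((call.agent, call.user, call.server), (frozenset(), 0.0))
    return now < expires_at and call.tool in tools
```

Keying on both the agent and the user is one way to express the “blend” of identities Anthropic describes: the same agent gets different capabilities depending on whom it acts for.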

Anthropic asks the same question from the other side: should an agent have its own identity or inherit the user’s? They say “the answer may be a blend of the two.” A boundary that knows who is calling, and on whose behalf, is where you enforce that blend.

Common mistake:

Treat a domain allowlist as a destination filter and you have quietly granted every capability reachable at that domain. Scope by capability and identity, not by hostname.

For MCP, the boundary has two concrete jobs, both deterministic. The first curates which tools a server exposes: a gateway can publish a read-only view of an upstream, for example the Stripe server with its destructive tools filtered out. The second binds every token to the server it was minted for. With RFC 8707 resource indicators, a token issued for one virtual server is rejected at another, so a compromised server cannot replay a user’s token elsewhere. That closes the confused-deputy attack the MCP spec calls out.
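Both jobs reduce to small, deterministic checks. This sketch assumes token claims arrive already signature-, issuer-, and expiry-verified, and that the authorization server recorded the RFC 8707 `resource` value in the token’s `aud` claim (a common arrangement, not the only one). The tool names and gateway URLs are hypothetical.

```python
def curate(upstream_tools: list[str], allow: set[str]) -> list[str]:
    """Job 1: publish only the tools the operator opted in, in upstream order."""
    return [name for name in upstream_tools if name in allow]


def route_call(tool: str, published: list[str]) -> bool:
    # Hiding a tool from the listing is not enough: calls to unpublished tools fail too.
    return tool in published


def accept_token(claims: dict, this_server: str) -> bool:
    """Job 2: reject any token whose audience is not this virtual server."""
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return this_server in audiences
```

A token minted for a hypothetical `https://gw.example/mcp/linear` fails `accept_token` at `https://gw.example/mcp/stripe-readonly`, which is the replay protection described above.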

Don’t hand-roll the proxy

The most uncomfortable lesson is for anyone tempted to build their own boundary. The battle-tested isolation primitives held: hypervisors, syscall filters, and container runtimes like gVisor “survived more adversarial attention than anything you’ll build.” The custom glue around them is what failed. Anthropic says it twice: “the weakest layer is the one you built yourself” and “be wary of custom components.”

For the network and MCP plane, the custom glue is the OAuth server, token validation, resource binding, and credential brokering. Getting MCP authorization right means stacking overlapping specifications (OAuth 2.1, PKCE, dynamic client registration, protected-resource metadata, resource indicators) on top of a spec rule that is easy to violate: an MCP server must not accept a token that was not issued for it. Every re-implementation is a fresh chance to get it wrong. Implement it once, not once per server.
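To give a sense of the detail involved, here is one small piece of that stack: the PKCE `S256` method from RFC 7636, sketched in Python. It is a building block only; a real authorization server also has to bind the challenge to the authorization code, enforce single use, and handle every other spec in the list.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    # 32 random bytes -> 43 base64url characters, the minimum length RFC 7636 allows.
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")


def s256_challenge(verifier: str) -> str:
    # code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), with padding removed.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    # Server side: recompute at the token endpoint and compare in constant time.
    return secrets.compare_digest(s256_challenge(verifier), stored_challenge)
```

The client keeps the verifier, sends only the challenge with the authorization request, and reveals the verifier at the token endpoint, so an intercepted authorization code is useless on its own.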

One honest caveat. Anthropic’s fix for the approved-domain exfiltration was a proxy inside the VM, “because only the VM knows provenance.” An external gateway lacks that view, so it closes the gap from the other end: for any call routed through it, the gateway brokers the credential, so what reaches the upstream is the gateway’s own scoped token, not a key a poisoned file slipped in.
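Credential brokering can be sketched as a header rewrite at the gateway: strip every credential the caller supplied, then attach the gateway-held credential for that upstream. The header set, upstream name, and credential value are hypothetical; a real gateway would fetch the credential from encrypted storage rather than a module-level dict.

```python
# Hypothetical store of gateway-held, upstream-scoped credentials.
UPSTREAM_CREDENTIALS = {"stripe-readonly": "Bearer gateway-scoped-token-for-stripe"}

# Headers that can carry a caller-supplied credential (illustrative, not exhaustive).
CALLER_CREDENTIAL_HEADERS = {"authorization", "x-api-key", "cookie"}


def broker(headers: dict[str, str], upstream: str) -> dict[str, str]:
    # Drop every credential the caller sent, whether a user token or a smuggled key.
    outbound = {k: v for k, v in headers.items() if k.lower() not in CALLER_CREDENTIAL_HEADERS}
    # Attach the gateway's own credential; an unknown upstream raises KeyError (fails closed).
    outbound["Authorization"] = UPSTREAM_CREDENTIALS[upstream]
    return outbound
```

Because the incoming `Authorization` header is dropped too, this also gives you no token passthrough: the user’s token to the gateway never reaches the upstream.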

Where a gateway ends and the sandbox begins

Be explicit about scope so none of this reads as overreach. A gateway owns the network, tool, MCP, and API plane: who is calling, what they may invoke, which token is valid, what credential reaches the upstream, and a per-call audit of it. It does not own the host plane. Process sandboxing, VM isolation, filesystem boundaries, keeping ~/.aws out of reach: that is the sandbox’s job. Anthropic built gVisor containers and local VMs because they ship agents to millions of people. You probably do not, and you probably do not grant an agent that much local access either.

The two planes of agent containment: a sandbox or VM contains the host plane (filesystem, processes, local credentials), while an MCP gateway contains the network, tool, and MCP plane (OAuth and audience-bound tokens, tool curation, no token passthrough, per-call audit) between the agent and upstream MCP servers and APIs.

A second limit is worth naming: a boundary only contains traffic that routes through it. A gateway cannot stop a developer’s editor from connecting straight to an MCP server and bypassing it, so making the gateway the only path is a separate enforcement decision, made through network policy or by blocking direct MCP egress. Settle that question first, not last.

Every team can adopt the slice where Anthropic’s hardest incidents happened: the boundary in front of tools, APIs, and MCP servers. You do not need to be Anthropic to need it. You need it precisely because their report shows what gets through without it.

What Zuplo puts at the MCP boundary

We built this boundary for MCP, and we use it ourselves: our own team reaches the third-party MCP servers it depends on (Linear, Stripe, Notion, ClickHouse) through this gateway, including the read-only Stripe view described above. The Zuplo MCP Gateway, in public beta today, maps onto Anthropic’s argument point for point:

One OAuth-protected URL in front of every MCP server your agents touch, yours and third-party, instead of a long-lived token pasted into each editor.
Tool curation per route. Publish a read-only or hand-picked subset of an upstream’s tools, the capability grant made explicit rather than inherited.
A full OAuth authorization server to spec, with dynamic client registration, PKCE, and protected-resource metadata, plus RFC 8707 tokens bound to one virtual server so they cannot be replayed at another.
No token passthrough. The gateway holds the upstream credentials, encrypts them at rest, and attaches them server-side, so the agent never sees them.
Programmable policies that inspect a tool response before it re-enters the model’s context, so a poisoned payload can be redacted or blocked on the way back.
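Protected-resource metadata is how an MCP client discovers which authorization server guards a given virtual server. A minimal sketch following RFC 9728: the well-known path is inserted between the host and the resource’s path, and a 401 points the client at it through the `resource_metadata` parameter. The gateway URLs are hypothetical, and only a few of the optional metadata fields are shown.

```python
import json
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN = "/.well-known/oauth-protected-resource"


def metadata_url(resource: str) -> str:
    # RFC 9728: insert the well-known suffix between the host and any path component.
    parts = urlsplit(resource)
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN + parts.path.rstrip("/"), "", ""))


def protected_resource_metadata(resource: str, authorization_server: str) -> str:
    # JSON document served at metadata_url(resource).
    return json.dumps({
        "resource": resource,
        "authorization_servers": [authorization_server],
        "bearer_methods_supported": ["header"],
    })


def unauthorized_challenge(resource: str) -> str:
    # WWW-Authenticate value sent with a 401, pointing the client at the metadata.
    return f'Bearer resource_metadata="{metadata_url(resource)}"'
```

Each virtual server gets its own `resource` identifier, which is also the value the client later sends as its RFC 8707 resource indicator.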

That last item mirrors Anthropic’s own design: in Claude Code and Cowork, “tool calls route through proxies that enforce network and file policy and can inspect return values before they enter the model’s context,” and that classifier “can be a small, fast model.” It calls for the same honesty, though. The deterministic parts (authentication, token scope, resource binding, and tool curation) are the guarantee, while content inspection for prompt injection is defense in depth, and a detection rate is not a guarantee.

Inspection sits on the return path because injection in MCP flows backwards: the poisoned payload arrives in a tool response, so the boundary scans what comes back, not just what goes in.
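A minimal sketch of a response-inspection policy, assuming a gateway hook that receives a tool response as text before it is returned to the agent. A couple of regular expressions stand in for the small, fast classifier Anthropic describes; they are illustrative only and will miss paraphrased injections, which is exactly why this layer is defense in depth.

```python
import re
from dataclasses import dataclass

# Crude, illustrative stand-in for a classifier: phrases that often signal injected
# instructions or smuggled credentials. Paraphrases will slip past it.
SUSPICIOUS = re.compile(
    r"ignore (?:all )?(?:previous|prior) instructions|api[_-]?key\s*[:=]\s*\S+",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Verdict:
    action: str  # "pass", "redact", or "block"
    text: str    # what is allowed back into the model's context


def inspect_tool_response(text: str, block_threshold: int = 3) -> Verdict:
    # Redact each match; block outright if there are too many to trust the rest.
    redacted, hits = SUSPICIOUS.subn("[redacted]", text)
    if hits == 0:
        return Verdict("pass", text)
    if hits >= block_threshold:
        return Verdict("block", "")
    return Verdict("redact", redacted)
```

The threshold is an arbitrary illustration; the point is that the policy runs on what comes back from the upstream, not only on what the agent sends.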

The Zuplo MCP Gateway virtual-server wizard at the Tools step: Curate is selected over Passthrough, and each of the upstream’s tools, prompts, and resources sits behind a checkbox, so the operator exposes only the safe subset and switches the destructive tools off.


Anthropic’s report is the rare security writeup that argues for an entire category without selling anything. Take it at face value: contain at a deterministic boundary, treat every allowlist as a capability grant, and don’t hand-roll the proxy.

For the MCP plane, that boundary already exists off the shelf, so your engineering effort can go to the host-plane problems that are genuinely yours.

The Zuplo MCP Gateway is in public beta and available now on every plan, including the free one. Spin up a free project and point your first agent at a virtual MCP server today.