You shipped an MCP server. Or you’re federating other people’s. Either way, AI agents can discover tools, call them, and get structured responses. The integration works. Now what?
The question every team hits next is the same one they’ve answered for REST APIs a hundred times: how do I know what’s happening? But for Model Context Protocol (MCP) traffic, the familiar metrics — requests per second, p99 latency, 5xx error rate — don’t tell the full story. MCP traffic is tool-shaped, agent-shaped, and prompt-shaped in ways that REST traffic simply isn’t.
This guide defines the metric vocabulary for MCP observability at the gateway layer. You’ll learn which signals matter, what they look like in practice on Zuplo’s MCP Gateway dashboard, and how to capture the same signals on your own MCP server route if you’re rolling instrumentation yourself.
- Why MCP observability is different from REST observability
- What MCP analytics look like in practice
- The five signals to instrument
- Cost signals: token consumption and per-tool spend
- Two paths to MCP analytics
- Alerting and SLOs for agent-driven traffic
- Worked example: an MCP analytics dashboard
- Getting started: identity first, then signals
Why MCP observability is different from REST observability
REST API observability is endpoint-shaped. You monitor GET /users/{id} and
track its latency, error rate, and throughput. The URL path is the unit of
analysis.
MCP observability is tool-shaped. A single MCP server route — typically
POST /mcp — handles every interaction. All tool calls, prompt requests, and
resource reads flow through that one endpoint. If you only track HTTP-level
metrics, you see a single line on your dashboard: total requests to /mcp. That
tells you almost nothing about what’s actually happening.
The meaningful dimensions live inside the JSON-RPC payloads:
- Capability name: Which tool, prompt, or resource was called? search_products and delete_account have very different risk profiles and performance characteristics.
- Method type: Was it tools/call, tools/list, prompts/list, resources/list, initialize, or ping? Each method type has different latency expectations and failure semantics.
- Agent identity: Which AI agent or MCP client initiated the request? Claude Code, a Cursor plugin, MCP Inspector, and a production agentic workflow have different traffic patterns and SLA expectations.
- Prompt parameters: Which prompt template was invoked, and with what argument shapes? Prompt distribution reveals how agents are actually using your server.
Traditional API monitoring tools — Datadog, Grafana, New Relic — can absolutely handle MCP data. But they need the gateway layer to parse MCP payloads and emit these tool-level dimensions as structured metadata. Without that enrichment, your dashboards show HTTP-level metrics that miss the point.
What MCP analytics look like in practice
Before drilling into individual signals, it helps to see what a finished MCP analytics dashboard looks like. Zuplo’s MCP Gateway ships one in the Account Analytics section, and its structure is broadly applicable to any MCP analytics setup.
A real MCP dashboard answers four questions immediately:
- Are calls succeeding? Success rate, total events, total failures.
- Are calls fast? p95 latency, split into the part the gateway controls and the part the upstream controls.
- Where are failures originating? Client (bad request, missing token), auth (rejected credential), upstream (third-party service returned an error), or gateway (your own infrastructure).
- What’s actually being called? Top capabilities (tools, prompts, resources) by call count, error rate, and p95, plus the breakdown by MCP method (tools/call, tools/list, initialize, prompts/list, resources/list, notifications/initialized, ping, and so on).
The three event families
Zuplo’s MCP Gateway emits typed events that fall into three families, and the same three families are a useful organizing principle even if you’re rolling your own:
- Requests — The MCP request lifecycle: a request received, validated, completed, or rejected. This is your traffic volume and success-rate signal.
- Capabilities — Capability invocations (tool, prompt, or resource): listed, invoked, completed, or failed. This is your “what is the agent actually doing” signal.
- Auth — The full upstream OAuth flow: tokens validated, rejected, refreshed, or revoked. This is the signal that catches “agents are integrated but credentials are wrong.”
A healthy MCP server typically shows Requests as the bulk of events (around three-quarters), with Capabilities and Auth making up the rest. If Auth dominates, you have a credential problem. If Capabilities is unusually low relative to Requests, agents are connecting but not actually invoking tools.
Failure origins, not just failure rates
A 4% error rate on tools/call tells you something is wrong. A 4% error rate
broken down by origin tells you who needs to fix it:
- Client — The agent sent malformed arguments, a missing token, or an invalid request. The client owns the fix.
- Auth — The auth layer rejected the credential (invalid token, expired token, revoked token, missing scope). Either the agent’s credential needs renewal, or you have a misconfigured access policy.
- Upstream — The third-party MCP server or backend service returned an error. The upstream owns the fix; you can only retry or fall back.
- Gateway — Your own gateway returned the error. You own the fix.
Documented problem codes attached to each error — missing_token,
invalid_token, expired_token, revoked_token, upstream_error — let MCP
clients recover automatically rather than retrying blindly.
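One way a client might act on these codes is sketched below. The reason codes come from the list above; the recovery actions and their names are assumptions made for illustration, not documented gateway behavior.

```typescript
// Recovery actions are hypothetical; only the reason codes come from the text above.
type RecoveryAction =
  | "start_oauth_flow"
  | "refresh_token"
  | "reauthorize"
  | "retry_with_backoff"
  | "surface_to_operator";

const RECOVERY_BY_REASON: Record<string, RecoveryAction> = {
  missing_token: "start_oauth_flow", // no credential was sent at all
  expired_token: "refresh_token", // credential aged out; a refresh may be enough
  invalid_token: "reauthorize", // credential was not accepted as presented
  revoked_token: "reauthorize", // access was withdrawn, so refreshing will not help
  upstream_error: "retry_with_backoff", // the upstream may recover on its own
};

// Unknown codes go to a human instead of triggering blind retries.
function recoveryFor(reasonCode: string): RecoveryAction {
  const known = Object.prototype.hasOwnProperty.call(RECOVERY_BY_REASON, reasonCode);
  return known ? RECOVERY_BY_REASON[reasonCode] : "surface_to_operator";
}
```

The point of the lookup table is that each code maps to exactly one next step, so a client never has to guess whether a retry is worthwhile.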
Gateway vs upstream latency
The single most important latency view for federated MCP traffic is gateway p95 vs upstream p95, plotted on the same chart. The gateway is your overhead (typically single-digit milliseconds); the upstream is everything you don’t control. Splitting the two lets you set independent SLOs and immediately tell which side caused a latency regression.
If your gateway p95 is creeping up, you have an instrumentation or routing problem. If your upstream p95 spikes, it’s a third-party issue. Without the split, every latency conversation devolves into “is it us or them?”
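The "is it us or them?" question can be reduced to a small comparison once both p95 values are tracked separately. The sketch below assumes you already compute a gateway p95 and an upstream p95 per window; the type names and the default tolerance factor of 2 are illustrative choices, not recommended values.

```typescript
interface P95Pair {
  gatewayP95Ms: number; // overhead added by the gateway itself
  upstreamP95Ms: number; // time spent waiting on the upstream MCP server
}

// Per-request split: whatever the upstream did not account for is gateway time.
function gatewayMs(totalMs: number, upstreamMs: number): number {
  return Math.max(totalMs - upstreamMs, 0);
}

type RegressionOwner = "gateway" | "upstream" | "both" | "none";

// Compare the current window with a baseline window. The default factor is an
// assumed tolerance: a side "regressed" when its p95 more than doubles.
function whoRegressed(
  baseline: P95Pair,
  current: P95Pair,
  factor: number = 2
): RegressionOwner {
  const gatewayUp = current.gatewayP95Ms > baseline.gatewayP95Ms * factor;
  const upstreamUp = current.upstreamP95Ms > baseline.upstreamP95Ms * factor;
  if (gatewayUp && upstreamUp) return "both";
  if (gatewayUp) return "gateway";
  if (upstreamUp) return "upstream";
  return "none";
}
```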
The five signals to instrument
These are the five core signals that give you meaningful visibility into MCP traffic. They’re ordered from most immediately useful to most strategically valuable.
1. Tool call volume
What it measures: How many times each capability is invoked over a given time window.
Why it matters: Tool call volume is the MCP equivalent of
requests-per-second by endpoint. It tells you which capabilities are hot, which
are unused, and whether traffic patterns match your expectations. A sudden spike
in delete_user calls is a very different signal than a spike in
search_products.
How to capture it: Log the method and capability name from each
tools/call, prompts/get, or resources/read request. Aggregate by
capability name per time bucket (1-minute or 5-minute windows work well for most
workloads).
What to watch for:
- Capabilities with zero calls over 7+ days (candidates for deprecation)
- Unexpected volume spikes on destructive tools
- Skewed distribution where one tool accounts for 80%+ of all calls
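A minimal sketch of the aggregation described above, assuming each logged call has already been reduced to a capability name and a timestamp. The record shape, the function names, and the key format are hypothetical.

```typescript
interface CapabilityCall {
  capability: string; // tool, prompt, or resource name taken from the request
  timestampMs: number; // epoch milliseconds when the gateway saw the call
}

// Count calls per capability per fixed window.
// Keys look like "<bucketStartMs>|<capability>".
function bucketCalls(
  calls: CapabilityCall[],
  windowMs: number = 60000
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const call of calls) {
    const bucketStart = Math.floor(call.timestampMs / windowMs) * windowMs;
    const key = bucketStart + "|" + call.capability;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Fraction of all calls taken by the busiest capability (0 when there are no calls).
function topCapabilityShare(calls: CapabilityCall[]): number {
  const totals: Record<string, number> = Object.create(null);
  let max = 0;
  for (const call of calls) {
    totals[call.capability] = (totals[call.capability] || 0) + 1;
    max = Math.max(max, totals[call.capability]);
  }
  return calls.length === 0 ? 0 : max / calls.length;
}
```

A share above 0.8 from topCapabilityShare corresponds to the skewed-distribution case in the list above.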
2. Latency per tool
What it measures: Response time for each capability individually, not aggregated across the MCP endpoint.
Why it matters: Aggregate latency for POST /mcp is meaningless when your
server exposes 15 capabilities. A get_status tool that returns cached data in
20ms and a generate_report tool that calls three upstream APIs in 1200ms will
average out to a number that describes neither tool accurately.
How to capture it: Measure the time from when the gateway receives the
tools/call request to when it sends the response, broken down by capability
name. Track p50, p95, and p99 percentiles. If you’re federating, split each
measurement into gateway-side latency and upstream-side latency so you can tell
who owns a regression.
What to watch for:
- Individual capabilities degrading while the aggregate looks healthy
- Latency variance (a tool that’s usually 50ms but occasionally spikes to 5s)
- Correlation between latency spikes and specific agent consumers
- Gateway p95 staying flat while upstream p95 climbs (third-party degradation)
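To see why the aggregate hides per-tool behavior, the sketch below groups latency samples by capability and computes p50, p95, and p99 for each. It uses the nearest-rank percentile definition, which is one of several common definitions; the record and function names are assumptions.

```typescript
interface LatencyRecord {
  capability: string;
  latencyMs: number;
}

interface LatencySummary {
  count: number;
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile: the smallest sample with at least p percent of
// samples at or below it. Expects input sorted in ascending order.
function percentile(sortedAscending: number[], p: number): number {
  if (sortedAscending.length === 0) return 0;
  const rank = Math.max(Math.ceil((p / 100) * sortedAscending.length), 1);
  return sortedAscending[rank - 1];
}

function latencyByCapability(
  records: LatencyRecord[]
): Record<string, LatencySummary> {
  const grouped: Record<string, number[]> = Object.create(null);
  for (const r of records) {
    (grouped[r.capability] = grouped[r.capability] || []).push(r.latencyMs);
  }
  const summary: Record<string, LatencySummary> = Object.create(null);
  for (const capability of Object.keys(grouped)) {
    const sorted = grouped[capability].slice().sort((a, b) => a - b);
    summary[capability] = {
      count: sorted.length,
      p50: percentile(sorted, 50),
      p95: percentile(sorted, 95),
      p99: percentile(sorted, 99),
    };
  }
  return summary;
}
```

With only a handful of samples, p95 and p99 collapse to the maximum, so these figures become meaningful once each capability has a reasonable number of calls per window.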
3. Prompt-shape distribution
What it measures: Which prompt templates agents request and how frequently, along with the argument patterns they use.
Why it matters: MCP servers can expose prompts — predefined templates that agents use to structure their interactions. Tracking which prompts are popular, which are ignored, and what argument combinations agents pass reveals how your server is actually being used versus how you designed it to be used.
How to capture it: Log the prompt name from prompts/get requests and the
prompt count from prompts/list. For deeper analysis, log the argument keys
(not values — those may contain sensitive data) to understand usage patterns.
What to watch for:
- Prompts with zero usage (are they discoverable?)
- Unexpected argument combinations (agents using prompts in ways you didn’t intend)
- Prompt popularity shifts over time (as agents learn and adapt)
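Logging argument keys without values can be as simple as reducing each prompts/get request to a signature string. In the sketch below, the function names and the signature format are assumptions; the input mirrors the name and arguments fields of a prompts/get request.

```typescript
// Value-free signature for a prompts/get request:
// the prompt name plus its sorted argument keys.
function promptShape(
  promptName: string,
  args?: Record<string, unknown>
): string {
  const keys = args ? Object.keys(args).sort() : [];
  return promptName + "(" + keys.join(",") + ")";
}

// Tally how often each shape appears so unusual combinations stand out.
function countShapes(
  requests: Array<{ name: string; arguments?: Record<string, unknown> }>
): Record<string, number> {
  const counts: Record<string, number> = Object.create(null);
  for (const req of requests) {
    const shape = promptShape(req.name, req.arguments);
    counts[shape] = (counts[shape] || 0) + 1;
  }
  return counts;
}
```

Sorting the keys makes the signature independent of the order in which an agent happens to serialize its arguments.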
4. Agent identity breakdown
What it measures: MCP traffic segmented by the AI agent, MCP client, or consumer that initiated it.
Why it matters: Without agent-level breakdown, you can’t answer basic operations questions: Which agent is responsible for this traffic spike? Is Agent A’s error rate higher than Agent B’s? Which customer’s agent is consuming the most resources?
Agent identity is the precondition for everything else — billing, abuse detection, SLA enforcement, and capacity planning all depend on knowing who is making the calls.
How to capture it: Two layers. Client identity comes from the MCP client
identifier (Claude Code, Cursor, MCP Inspector, custom agents) reported during
initialize. Tenant identity comes from authentication — authenticated MCP
traffic on Zuplo’s MCP Gateway is tied to the OAuth tenant; on your own MCP
server route, require API key authentication so the gateway resolves the key to
a consumer identity.
What to watch for:
- Single tenants dominating traffic (noisy neighbor problem)
- Tenants with abnormally high error rates (integration issues)
- New clients appearing without corresponding onboarding (unauthorized access)
- A single MCP client (e.g., MCP Inspector probes) dominating low-value traffic
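The two identity layers can be combined into one record per session. The sketch below assumes the client name is read from the clientInfo object that MCP clients send in their initialize request and that the tenant comes from whatever subject your auth layer resolved; the type name, function names, and fallback strings are hypothetical.

```typescript
interface AgentIdentity {
  client: string; // MCP client name reported during initialize
  clientVersion: string;
  tenant: string; // consumer resolved by authentication
}

function asRecord(value: unknown): Record<string, unknown> | undefined {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : undefined;
}

// Combine the self-reported client with the authenticated tenant. The client
// name is informational only; the tenant is the layer backed by a credential.
function identify(
  initializeParams: unknown,
  authenticatedSub?: string
): AgentIdentity {
  const params = asRecord(initializeParams);
  const clientInfo = asRecord(params ? params.clientInfo : undefined);
  const name = clientInfo ? clientInfo.name : undefined;
  const version = clientInfo ? clientInfo.version : undefined;
  return {
    client: typeof name === "string" ? name : "unknown",
    clientVersion: typeof version === "string" ? version : "unknown",
    tenant: authenticatedSub || "anonymous",
  };
}
```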
5. Failure modes
What it measures: Why MCP calls fail, categorized by failure type and failure origin.
Why it matters: A 4% error rate on tools/call tells you something is
wrong. But MCP failures have distinct categories that require different
responses — and crucially, distinct origins that tell you who owns the fix:
- Validation errors (client origin): The agent sent malformed arguments. This is a client problem, not a server problem.
- Missing or invalid token (auth origin): The agent’s credential is missing, expired, or revoked. Surfaced via documented reason codes like missing_token, invalid_token, expired_token, revoked_token.
- Upstream timeouts or errors (upstream origin): A federated MCP server or its backend dependency is slow or failing. Surfaced as upstream_error.
- Permission denials (auth origin): The agent’s token doesn’t have access to this capability. Either a configuration issue or an attempted privilege escalation.
- Tool not found (client or gateway origin): The agent requested a capability that doesn’t exist, possibly because it’s caching a stale tool list.
- Rate limit exceeded (gateway origin): The agent hit its quota.
How to capture it: Log the error type, failure origin, and reason code alongside the capability name and agent identity for every failed request.
What to watch for:
- Rising validation errors after a schema change (did you break a contract?)
- upstream_error clusters on a specific virtual server (upstream dependency degrading)
- missing_token spikes from a single tenant (misconfiguration or probing)
- Auth failures dominating overall errors (likely a credential rollout problem, not a code problem)
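Once each failed request is logged with its origin and reason code, the "who owns the fix" view is a simple tally. In this sketch the record shape and function names are hypothetical, and the 0.5 cut-off for "dominating" is an assumed threshold.

```typescript
type FailureOrigin = "client" | "auth" | "upstream" | "gateway";

interface FailureRecord {
  capability: string;
  agentId: string;
  origin: FailureOrigin;
  reasonCode: string; // for example missing_token or upstream_error
}

function countByOrigin(failures: FailureRecord[]): Record<FailureOrigin, number> {
  const counts: Record<FailureOrigin, number> = {
    client: 0,
    auth: 0,
    upstream: 0,
    gateway: 0,
  };
  for (const f of failures) counts[f.origin] += 1;
  return counts;
}

// Returns the origin behind more than `share` of all failures, or null if
// no single origin dominates. The 0.5 default is an assumed cut-off.
function dominantOrigin(
  failures: FailureRecord[],
  share: number = 0.5
): FailureOrigin | null {
  const counts = countByOrigin(failures);
  const origins: FailureOrigin[] = ["client", "auth", "upstream", "gateway"];
  for (const origin of origins) {
    if (failures.length > 0 && counts[origin] / failures.length > share) {
      return origin;
    }
  }
  return null;
}
```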
Cost signals: token consumption and per-tool spend
Beyond operational signals, MCP traffic carries cost implications that REST APIs typically don’t. When AI agents call your tools, the responses they receive influence token consumption in the LLM context window. Larger responses mean more tokens, and more tokens mean higher costs for the agent operator.
If you’re providing MCP tools to external consumers — or even internal teams with chargeback models — you need cost visibility at the tool level.
Key cost metrics to track:
- Response size per capability — Measure the content length of each tool’s response. Tools that return large datasets drive disproportionate token costs.
- Calls per agent per billing period — Essential for usage-based pricing. Track how many capability invocations each consumer makes.
- Cost-per-tool estimates — If you know the upstream compute cost of each tool (API calls, database queries, external service fees), attach a cost marker to each invocation. Even rough estimates enable meaningful budget conversations.
You don’t need a sophisticated metering system to start. Logging response content length and capability name for every call gives you enough data to build cost attribution dashboards in your existing analytics platform.
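As a starting point, the sketch below turns logged response sizes into a per-capability cost table. The record shape is hypothetical, and the conversion of roughly four bytes per token is a loose rule of thumb that varies by tokenizer and content, so treat the token column as an estimate only.

```typescript
interface CallRecord {
  capability: string;
  consumer: string;
  responseBytes: number;
}

interface CostRow {
  calls: number;
  totalBytes: number;
  avgBytes: number;
  estimatedTokens: number;
}

// Assumption: about 4 bytes of English text per token.
const BYTES_PER_TOKEN = 4;

function costByCapability(records: CallRecord[]): Record<string, CostRow> {
  const rows: Record<string, CostRow> = Object.create(null);
  for (const r of records) {
    const row =
      rows[r.capability] ||
      (rows[r.capability] = { calls: 0, totalBytes: 0, avgBytes: 0, estimatedTokens: 0 });
    row.calls += 1;
    row.totalBytes += r.responseBytes;
  }
  for (const capability of Object.keys(rows)) {
    const row = rows[capability];
    row.avgBytes = row.totalBytes / row.calls;
    row.estimatedTokens = Math.ceil(row.totalBytes / BYTES_PER_TOKEN);
  }
  return rows;
}
```

Grouping by consumer instead of capability, using the same records, gives the calls-per-agent view needed for usage-based pricing.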
Two paths to MCP analytics
Once you know which signals you need, you have two ways to capture them: route traffic through a gateway that emits them automatically, or instrument your own MCP server route. Most teams converge on the first when they’re running more than one MCP server; the second is appropriate when you’re shipping a single MCP server and want full control of the pipeline.
Option A: MCP Gateway with built-in analytics
Zuplo’s MCP Gateway emits the typed events described above automatically. There’s no instrumentation to write: route MCP traffic through the gateway, federate any upstream MCP servers behind it, and the dashboard populates with:
- Three event families (Requests, Capabilities, Auth) plotted over time, broken down by event type (capability invoked, capability completed, capability failed, capability listed, request received, request completed, request rejected, downstream token validated, downstream token rejected, initialize negotiated).
- Top-line metrics: total events, success rate, p95 latency, failure origin count.
- Gateway vs upstream p95 latency plotted on the same chart, with per-event-family breakdowns.
- Top capabilities: per-capability call count, error count, error rate, and p95 latency, attributed to the virtual server they live on.
- Top virtual servers and top upstream servers: call volume and error rate per federated MCP server.
- Top users and top clients: traffic by tenant and by MCP client name (Claude Code, Cursor, MCP Inspector, custom).
- MCP method distribution: tools/call, tools/list, initialize, prompts/list, resources/list, notifications/initialized, ping, notifications/cancelled, logging/setLevel, and so on, with raw counts.
- JSON-RPC error codes: surfaced from the underlying JSON-RPC protocol so you can correlate a 401 at the HTTP layer with the actual MCP-level rejection.
- Failure origins: client, auth, upstream, gateway, none.
- Top reason codes: missing_token, invalid_token, expired_token, revoked_token, upstream_error, and other documented problem codes.
The same events are also streamed out as structured logs. Every log row carries trace-ready metadata — tenant, MCP session, capability, latency, failure origin — ready to drop into Datadog, Honeycomb, or BigQuery. Every failure mode returns a documented problem code so MCP clients know exactly what went wrong and how to recover. No opaque 500s.
This is the right path if you’re federating multiple MCP servers (yours and third-party), need OAuth 2.0 across all of them, or just don’t want to maintain the analytics pipeline yourself.
Option B: Instrument your own MCP server route
If you’re shipping a single MCP server and prefer to own the analytics pipeline end to end, the gateway is still where you instrument. Every request flows through it, so you don’t need to modify your backend services or add SDK instrumentation to each tool implementation.
Here’s how to wire it up on an OpenAPI-driven gateway like Zuplo.
Per-key attribution
The foundation of MCP analytics is knowing who made each request. Without per-consumer identity, your metrics are just aggregates.
With Zuplo’s API Key Authentication policy, each agent or consumer authenticates with a unique API key. The gateway resolves the key to a consumer identity, and that identity is available in the request context for every downstream policy and log entry.
Configure the API key policy on your MCP server route. Once authentication is in place, request.user.sub contains the consumer identity, and every log entry, metric, and trace can be tagged with that value.
Log enrichment with custom policies
The real power of MCP analytics comes from parsing the MCP payload and emitting structured metadata. On a programmable gateway, you can write a TypeScript policy that inspects the incoming JSON-RPC request and attaches tool-level dimensions to the log context.
Here’s an example inbound policy that extracts MCP-specific fields:
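A minimal sketch of the extraction step follows, written as a pure function so it runs without any gateway runtime. The field names mcpMethod, mcpToolName, and agentId match the ones described in this section; the helper names, the lookup table, and the "anonymous" fallback are assumptions. Inside a real policy, the returned object would be passed to context.log.setLogProperties, with the consumer id taken from request.user.sub.

```typescript
interface McpLogFields {
  mcpMethod: string;
  mcpToolName: string | null; // capability name, when the method targets one
  agentId: string;
}

function asRecord(value: unknown): Record<string, unknown> | undefined {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : undefined;
}

// Methods that address a single capability, and the params field that names it.
const NAME_FIELD: Record<string, string> = {
  "tools/call": "name",
  "prompts/get": "name",
  "resources/read": "uri",
};

// Returns null when the body is not a single JSON-RPC request object.
function mcpLogFields(body: unknown, consumerSub?: string): McpLogFields | null {
  const message = asRecord(body);
  const method = message ? message.method : undefined;
  if (typeof method !== "string") return null;
  const params = asRecord(message ? message.params : undefined);
  const field = Object.prototype.hasOwnProperty.call(NAME_FIELD, method)
    ? NAME_FIELD[method]
    : undefined;
  const rawName = params && field ? params[field] : undefined;
  return {
    mcpMethod: method,
    mcpToolName: typeof rawName === "string" ? rawName : null,
    agentId: consumerSub || "anonymous",
  };
}
```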
This policy runs before the
MCP Server Handler processes the
request. Every log entry for this request — including those from downstream
policies — will carry the mcpMethod, mcpToolName, and agentId fields.
For outbound enrichment, add a response policy that captures latency and response size:
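The outbound side can be sketched the same way, as a pure function that a response policy could call once the response is available. The field names mcpLatencyMs and mcpResponseBytes are hypothetical, as is the choice to trust a numeric Content-Length header before measuring the body.

```typescript
interface ResponseLogFields {
  mcpLatencyMs: number;
  mcpResponseBytes: number;
}

// Prefer the Content-Length header when it is a plain integer;
// otherwise measure the body as UTF-8.
function responseBytes(contentLength: string | null, bodyText: string): number {
  if (contentLength !== null && /^\d+$/.test(contentLength)) {
    return parseInt(contentLength, 10);
  }
  return new TextEncoder().encode(bodyText).length;
}

function responseLogFields(
  startedAtMs: number,
  finishedAtMs: number,
  contentLength: string | null,
  bodyText: string
): ResponseLogFields {
  return {
    mcpLatencyMs: Math.max(finishedAtMs - startedAtMs, 0),
    mcpResponseBytes: responseBytes(contentLength, bodyText),
  };
}
```

Recording the start time in the inbound policy and reading it back in the outbound one is what makes the latency figure cover the whole request, including any upstream call.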
Exporting to your observability stack
With MCP-specific fields in your logs, you need to get them into your observability platform. Zuplo supports log integrations with Datadog, New Relic, Dynatrace, Splunk, Loki, Google Cloud Logging, AWS CloudWatch, and Sumo Logic. For distributed tracing, the OpenTelemetry plugin exports traces with full span instrumentation across the policy pipeline.
The custom log properties you set with context.log.setLogProperties are
included in every log entry sent to your configured logging provider. In
Datadog, for example, you can create facets on mcpToolName and agentId and
build dashboards that slice MCP traffic by those dimensions.
For metrics, Zuplo’s metrics plugins send request latency, request content length, and response content length to platforms like Datadog. Combined with the custom log properties, you can correlate latency spikes with specific capabilities and agents.
Alerting and SLOs for agent-driven traffic
Agent traffic behaves differently from human-driven API traffic. Agents don’t gradually ramp up — they burst. They don’t read error messages — they retry. They don’t stop at rate limits — they queue and hammer. Your alerting strategy needs to account for these patterns.
Per-tool SLOs
Define SLOs at the capability level, not the endpoint level. Suggested starting points:
- Lightweight tools (data lookups, status checks): p95 latency < 500ms, 99.5% success rate
- Medium tools (CRUD operations, simple transformations): p95 latency < 1s, 99% success rate
- Heavy tools (multi-step workflows, external API calls): p95 latency < 3s, 98% success rate
If you’re federating, set independent SLOs for gateway p95 (your overhead) and upstream p95 (the third-party service). The MCP Gateway dashboard reports both on the same chart, so a regression on one side jumps out immediately.
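The three tiers above translate directly into a lookup table and a check. The thresholds are the suggested starting points from the list; the type and function names are hypothetical.

```typescript
type ToolTier = "lightweight" | "medium" | "heavy";

interface SloTarget {
  maxP95Ms: number; // p95 must stay strictly below this
  minSuccessRate: number; // fraction of calls that must succeed
}

const SLO_TARGETS: Record<ToolTier, SloTarget> = {
  lightweight: { maxP95Ms: 500, minSuccessRate: 0.995 },
  medium: { maxP95Ms: 1000, minSuccessRate: 0.99 },
  heavy: { maxP95Ms: 3000, minSuccessRate: 0.98 },
};

// Latency and success are reported separately so an alert can say which one failed.
function checkSlo(
  tier: ToolTier,
  p95Ms: number,
  successes: number,
  total: number
): { latencyOk: boolean; successOk: boolean } {
  const target = SLO_TARGETS[tier];
  const successRate = total === 0 ? 1 : successes / total;
  return {
    latencyOk: p95Ms < target.maxP95Ms,
    successOk: successRate >= target.minSuccessRate,
  };
}
```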
Alert on deviation, not threshold
Static threshold alerts are noisy for MCP traffic because agent usage patterns shift frequently. Instead, alert on deviation from baseline:
- Tool call volume 3x above its 7-day rolling average
- p95 latency 2x above its trailing 24-hour baseline (either gateway-side or upstream-side)
- Error rate for any single capability exceeding 5% over a 15-minute window
- missing_token or invalid_token reason codes spiking on a tenant that previously had clean auth (likely a credential rotation issue)
- Any destructive capability (delete_*, update_*) called by an unrecognized agent
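Several of these rules can be evaluated from one per-capability snapshot. The sketch below covers the volume, latency, error-rate, and destructive-call rules using the multipliers from the list; the snapshot shape, the alert names, and the idea of a known-agents allowlist are assumptions.

```typescript
interface CapabilityWindow {
  capability: string;
  agentId: string;
  calls: number; // calls in the current window
  baselineCalls: number; // 7-day rolling average for a window of the same length
  p95Ms: number;
  baselineP95Ms: number; // trailing 24-hour p95
  errors: number; // failed calls in the last 15 minutes
  total: number; // all calls in the last 15 minutes
}

function isDestructive(capability: string): boolean {
  return capability.indexOf("delete_") === 0 || capability.indexOf("update_") === 0;
}

function deviationAlerts(w: CapabilityWindow, knownAgents: string[]): string[] {
  const alerts: string[] = [];
  if (w.baselineCalls > 0 && w.calls > 3 * w.baselineCalls) {
    alerts.push("volume_spike");
  }
  if (w.baselineP95Ms > 0 && w.p95Ms > 2 * w.baselineP95Ms) {
    alerts.push("latency_regression");
  }
  if (w.total > 0 && w.errors / w.total > 0.05) {
    alerts.push("error_rate");
  }
  if (isDestructive(w.capability) && knownAgents.indexOf(w.agentId) === -1) {
    alerts.push("destructive_call_by_unknown_agent");
  }
  return alerts;
}
```

Skipping the volume and latency rules when no baseline exists avoids paging on a capability's first day in production.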
Rate limiting as enforcement
Rate limiting is the enforcement counterpart to the metrics you’re measuring. Once you have per-tool and per-agent visibility, you can set limits based on observed usage instead of guesses.
For more granular control, use a custom rate-limiting function that sets different limits per capability or per consumer tier.
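The decision logic inside such a function might look like the sketch below. The tier names, the numeric limits, and the shape of the returned object are all hypothetical; check your gateway's documentation for the signature its custom rate-limit hook actually expects.

```typescript
type ConsumerTier = "free" | "pro";

// Hypothetical return shape for a custom rate-limit function.
interface RateLimitDecision {
  key: string; // counter identity: one bucket per consumer and capability
  requestsAllowed: number;
  timeWindowMinutes: number;
}

// Illustrative per-minute allowances, not recommended values.
const BASE_LIMIT: Record<ConsumerTier, number> = { free: 60, pro: 600 };

function limitFor(
  consumerSub: string,
  tier: ConsumerTier,
  capability: string
): RateLimitDecision {
  const destructive =
    capability.indexOf("delete_") === 0 || capability.indexOf("update_") === 0;
  // Destructive tools get a tenth of the tier's allowance, with a floor of one call.
  const allowed = destructive
    ? Math.max(Math.floor(BASE_LIMIT[tier] / 10), 1)
    : BASE_LIMIT[tier];
  return {
    key: consumerSub + ":" + capability,
    requestsAllowed: allowed,
    timeWindowMinutes: 1,
  };
}
```

Keying the counter on both consumer and capability means a burst of search calls cannot exhaust the budget for an unrelated tool.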
Worked example: an MCP analytics dashboard
Here’s what a practical MCP analytics dashboard looks like — modeled on the MCP Gateway dashboard so the panels mirror real production analytics, but applicable to any setup.
Panel 1: MCP events over time (time series)
A stacked area chart showing events per minute, broken down by event type across the three families: capability invoked, capability completed, capability failed, capability listed, request received, request completed, request rejected, downstream token validated, downstream token rejected, initialize negotiated. This is your primary traffic overview. You should be able to spot immediately which event types are active and whether any are spiking.
Panel 2: Latency — gateway vs upstream (split line chart)
A single chart with three series: total p95, gateway p95, upstream p95. When they move together, your gateway and the upstream are in lock-step; when they diverge, you can tell at a glance which side regressed. Pair this with a per-capability heatmap (p50 and p95 by capability over time) for the more granular view.
Panel 3: Top users and top clients (tables)
Side-by-side tables for tenant identity (who is paying for this traffic) and MCP client identity (Claude Code, Cursor, MCP Inspector, custom). Each row includes total events and error count, so integration issues surface immediately. Whether a single tenant is dominating, or a single client (e.g., unauthenticated MCP Inspector probes) is generating noise, both surface here.
Panel 4: Failure analysis (origin + reason code)
A two-level view. The top level breaks failures by origin: client, auth,
upstream, gateway, none. The second level surfaces the specific reason codes
behind them — missing_token, invalid_token, expired_token,
revoked_token, upstream_error. This is what tells you whose problem each
failure is. For example, a 90% concentration of missing_token reason codes on
the client origin typically means agents are integrated but haven’t completed
OAuth yet.
Panel 5: Method distribution and event families (donut charts)
Two donuts side by side. The first is the higher-level event families
breakdown (Requests, Capabilities, Auth). The second is the per-MCP-method
breakdown: tools/call, tools/list, initialize, prompts/list,
resources/list, notifications/initialized, ping,
notifications/cancelled, logging/setLevel. A healthy MCP server typically
shows a high ratio of tools/call to tools/list. If tools/list dominates,
agents may be re-discovering tools too frequently — a sign that caching or
session handling needs attention.
Panel 6: Top capabilities, virtual servers, and upstream servers (tables)
When you’re federating, the per-server view is essential: which virtual MCP servers are getting traffic, and which upstream MCP servers (Linear, GitHub, Stripe, Notion, internal ones) are answering for them. Three tables solve this: Top Capabilities (per-capability call count, error count, error rate, p95), Top Virtual Servers (the namespaces your customers connect to), and Top Upstream Servers (the third-party MCP servers you’re federating). A capability with a 50% error rate against a low-error upstream usually means a tool wiring problem; a capability with a low error rate fronting a high-error-rate upstream tells you the upstream is unstable.
Panel 7: Cost attribution (table)
A table showing estimated cost per capability per tenant over the selected time period. Columns: capability name, total calls, average response size, estimated cost. Sort by cost descending to identify the most expensive capability-tenant combinations.
Building this in practice
Every panel above can be built either by routing MCP traffic through the MCP
Gateway (no instrumentation needed) or from the structured logs your gateway
emits if you’re rolling your own. If you’re using Datadog, create facets on
mcpToolName, agentId, and mcpMethod. In Grafana with Loki, use LogQL label
filters on the same fields. The MCP Gateway emits all of these as structured
metadata by default; if you’re instrumenting yourself, your custom inbound
policy emits them.
Getting started: identity first, then signals
You have two starting points depending on whether you’re shipping one MCP server or federating many.
If you’re federating multiple MCP servers (yours and third-party), route them through MCP Gateway. The dashboard populates with the signals above as soon as traffic flows. From there, configure the log integration for Datadog, New Relic, Honeycomb, or BigQuery so the typed events stream into your existing observability stack.
If you’re shipping a single MCP server and want full control of the pipeline, start with the foundation:
- Add API key authentication to your MCP server route so you know who is making each call.
- Add the MCP analytics inbound policy to extract capability names and method types into structured logs.
- Configure a logging plugin to forward enriched logs to your observability platform.
- Build your first dashboard with tool call volume and agent breakdown.
- Add latency tracking and cost attribution once you understand your traffic patterns.
The gateway is where these signals converge — either way. It sees every MCP request before your tools execute, and every response before the agent receives it. Instrument it once (or let the MCP Gateway instrument it for you), and every capability you add automatically inherits the same observability.
Ready to add analytics to your MCP setup?
- Spin up MCP Gateway if you’re federating multiple MCP servers and want analytics out of the box — across three event families, with trace-ready structured logs that drop into Datadog, Honeycomb, or BigQuery.
- Deploy an MCP server if you’re building a single MCP server and want full control of the analytics pipeline.
Sign up for free and have your first instrumented MCP server running in minutes.