What MCP Analytics Should You Track at the Gateway? Tool Usage, Latency-Per-Tool, and Prompt Patterns Explained

Nate Totten
May 19, 2026
18 min read

Learn the essential MCP analytics signals — tool usage, latency-per-tool, prompt patterns, agent breakdown, and failure modes — captured by Zuplo's MCP Gateway dashboard or instrumented on your own MCP server.

You shipped an MCP server. Or you’re federating other people’s. Either way, AI agents can discover tools, call them, and get structured responses. The integration works. Now what?

The question every team hits next is the same one they’ve answered for REST APIs a hundred times: how do I know what’s happening? But for Model Context Protocol (MCP) traffic, the familiar metrics — requests per second, p99 latency, 5xx error rate — don’t tell the full story. MCP traffic is tool-shaped, agent-shaped, and prompt-shaped in ways that REST traffic simply isn’t.

This guide defines the metric vocabulary for MCP observability at the gateway layer. You’ll learn which signals matter, what they look like in practice on Zuplo’s MCP Gateway dashboard, and how to capture the same signals on your own MCP server route if you’re rolling instrumentation yourself.

Why MCP observability is different from REST observability

REST API observability is endpoint-shaped. You monitor GET /users/{id} and track its latency, error rate, and throughput. The URL path is the unit of analysis.

MCP observability is tool-shaped. A single MCP server route — typically POST /mcp — handles every interaction. All tool calls, prompt requests, and resource reads flow through that one endpoint. If you only track HTTP-level metrics, you see a single line on your dashboard: total requests to /mcp. That tells you almost nothing about what’s actually happening.

The meaningful dimensions live inside the JSON-RPC payloads:

  • Capability name — Which tool, prompt, or resource was called? search_products and delete_account have very different risk profiles and performance characteristics.
  • Method type — Was it tools/call, tools/list, prompts/list, resources/list, initialize, ping? Each method type has different latency expectations and failure semantics.
  • Agent identity — Which AI agent or MCP client initiated the request? Claude Code, a Cursor plugin, MCP Inspector, and a production agentic workflow have different traffic patterns and SLA expectations.
  • Prompt parameters — Which prompt template was invoked, and with what argument shapes? Prompt distribution reveals how agents are actually using your server.

Traditional API monitoring tools — Datadog, Grafana, New Relic — can absolutely handle MCP data. But they need the gateway layer to parse MCP payloads and emit these tool-level dimensions as structured metadata. Without that enrichment, your dashboards show HTTP-level metrics that miss the point.
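As a rough illustration of that enrichment step, here is a minimal sketch that maps one parsed JSON-RPC message to the dimensions listed above. The function and type names are hypothetical; the field locations (params.name for tools/call and prompts/get, params.uri for resources/read) follow the MCP message shapes.

```typescript
// Dimensions worth emitting for one JSON-RPC message arriving on POST /mcp.
interface McpDimensions {
  method: string;
  capabilityKind: "tool" | "prompt" | "resource" | "none";
  capabilityName: string | null;
}

// Hypothetical helper: maps a parsed JSON-RPC message to log dimensions.
function extractMcpDimensions(message: unknown): McpDimensions {
  const msg = (message ?? {}) as { method?: unknown; params?: Record<string, unknown> };
  const method = typeof msg.method === "string" ? msg.method : "unknown";
  const params: Record<string, unknown> = msg.params ?? {};
  const asString = (v: unknown): string | null => (typeof v === "string" ? v : null);

  switch (method) {
    case "tools/call":
      return { method, capabilityKind: "tool", capabilityName: asString(params.name) };
    case "prompts/get":
      return { method, capabilityKind: "prompt", capabilityName: asString(params.name) };
    case "resources/read":
      // Resources are addressed by URI rather than by name.
      return { method, capabilityKind: "resource", capabilityName: asString(params.uri) };
    default:
      // tools/list, initialize, ping, notifications, and so on carry no capability.
      return { method, capabilityKind: "none", capabilityName: null };
  }
}
```

Agent identity is deliberately left out of this helper, since it comes from authentication and from the initialize handshake rather than from each individual message.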

What MCP analytics look like in practice

Before drilling into individual signals, it helps to see what a finished MCP analytics dashboard looks like. Zuplo’s MCP Gateway ships one in the Account Analytics section, and its structure is broadly applicable to any MCP analytics setup.

A real MCP dashboard answers four questions immediately:

  • Are calls succeeding? Success rate, total events, total failures.
  • Are calls fast? p95 latency, split into the part the gateway controls and the part the upstream controls.
  • Where are failures originating? Client (bad request, missing token), auth (rejected credential), upstream (third-party service returned an error), or gateway (your own infrastructure).
  • What’s actually being called? Top capabilities (tools, prompts, resources) by call count, error rate, and p95, plus the breakdown by MCP method (tools/call, tools/list, initialize, prompts/list, resources/list, notifications/initialized, ping, and so on).

The three event families

Zuplo’s MCP Gateway emits typed events that fall into three families, and the same three families are a useful organizing principle even if you’re rolling your own:

  • Requests — The MCP request lifecycle: a request received, validated, completed, or rejected. This is your traffic volume and success-rate signal.
  • Capabilities — Capability invocations (tool, prompt, or resource): listed, invoked, completed, or failed. This is your “what is the agent actually doing” signal.
  • Auth — The full upstream OAuth flow: tokens validated, rejected, refreshed, or revoked. This is the signal that catches “agents are integrated but credentials are wrong.”

A healthy MCP server typically shows Requests as the bulk of events (around three-quarters), with Capabilities and Auth making up the rest. If Auth dominates, you have a credential problem. If Capabilities is unusually low relative to Requests, agents are connecting but not actually invoking tools.

Failure origins, not just failure rates

A 4% error rate on tools/call tells you something is wrong. A 4% error rate broken down by origin tells you who needs to fix it:

  • Client — The agent sent malformed arguments, a missing token, or an invalid request. The client owns the fix.
  • Auth — The auth layer rejected the credential (invalid token, expired token, revoked token, missing scope). Either the agent’s credential needs renewal, or you have a misconfigured access policy.
  • Upstream — The third-party MCP server or backend service returned an error. The upstream owns the fix; you can only retry or fall back.
  • Gateway — Your own gateway returned the error. You own the fix.

Documented problem codes attached to each error — missing_token, invalid_token, expired_token, revoked_token, upstream_error — let MCP clients recover automatically rather than retrying blindly.
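To make "recover automatically" concrete, here is a small client-side sketch that picks a next step from the reason code. The reason codes are the ones named above; the action names and the mapping itself are assumptions for illustration, not documented gateway behavior.

```typescript
type RecoveryAction =
  | "start_oauth"
  | "refresh_token"
  | "reauthorize"
  | "retry_with_backoff"
  | "surface_error";

// Assumed mapping for illustration only; check the gateway's documentation
// for the recommended recovery behavior per problem code.
const RECOVERY_BY_REASON = new Map<string, RecoveryAction>([
  ["missing_token", "start_oauth"],
  ["invalid_token", "reauthorize"],
  ["expired_token", "refresh_token"],
  ["revoked_token", "reauthorize"],
  ["upstream_error", "retry_with_backoff"],
]);

// Unknown codes are surfaced rather than retried blindly.
function recoveryFor(reasonCode: string): RecoveryAction {
  return RECOVERY_BY_REASON.get(reasonCode) ?? "surface_error";
}
```

The point of the lookup is that only upstream_error leads to a retry; every token-related code leads to a credential action instead.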

Gateway vs upstream latency

The single most important latency view for federated MCP traffic is gateway p95 vs upstream p95, plotted on the same chart. The gateway is your overhead (typically single-digit milliseconds); the upstream is everything you don’t control. Splitting the two lets you set independent SLOs and immediately tell which side caused a latency regression.

If your gateway p95 is creeping up, you have an instrumentation or routing problem. If your upstream p95 spikes, it’s a third-party issue. Without the split, every latency conversation devolves into “is it us or them?”

The five signals to instrument

These are the five core signals that give you meaningful visibility into MCP traffic. They’re ordered from most immediately useful to most strategically valuable.

1. Tool call volume

What it measures: How many times each capability is invoked over a given time window.

Why it matters: Tool call volume is the MCP equivalent of requests-per-second by endpoint. It tells you which capabilities are hot, which are unused, and whether traffic patterns match your expectations. A sudden spike in delete_user calls is a very different signal than a spike in search_products.

How to capture it: Log the method and capability name from each tools/call, prompts/get, or resources/read request. Aggregate by capability name per time bucket (1-minute or 5-minute windows work well for most workloads).

What to watch for:

  • Capabilities with zero calls over 7+ days (candidates for deprecation)
  • Unexpected volume spikes on destructive tools
  • Skewed distribution where one tool accounts for 80%+ of all calls
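A minimal sketch of that aggregation, assuming you already have one event per capability invocation. The CallEvent shape and both function names are hypothetical; the 80% threshold mirrors the skew check above.

```typescript
interface CallEvent {
  capability: string;
  timestampMs: number;
}

// Counts calls per capability per fixed-size time bucket (1 minute by default).
function bucketCalls(events: CallEvent[], bucketMs = 60000): Map<string, Map<number, number>> {
  const counts = new Map<string, Map<number, number>>();
  for (const e of events) {
    const bucketStart = Math.floor(e.timestampMs / bucketMs) * bucketMs;
    const perBucket = counts.get(e.capability) ?? new Map<number, number>();
    perBucket.set(bucketStart, (perBucket.get(bucketStart) ?? 0) + 1);
    counts.set(e.capability, perBucket);
  }
  return counts;
}

// Returns capabilities whose share of all calls meets or exceeds the threshold.
function dominantCapabilities(events: CallEvent[], threshold = 0.8): string[] {
  const totals = new Map<string, number>();
  for (const e of events) {
    totals.set(e.capability, (totals.get(e.capability) ?? 0) + 1);
  }
  return Array.from(totals.entries())
    .filter(([, count]) => count / events.length >= threshold)
    .map(([name]) => name);
}
```

Capabilities with zero calls never appear in either result, so the deprecation check needs the full list of registered capabilities to compare against.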

2. Latency per tool

What it measures: Response time for each capability individually, not aggregated across the MCP endpoint.

Why it matters: Aggregate latency for POST /mcp is meaningless when your server exposes 15 capabilities. A get_status tool that returns cached data in 20ms and a generate_report tool that calls three upstream APIs in 1200ms will average out to a number that describes neither tool accurately.

How to capture it: Measure the time from when the gateway receives the tools/call request to when it sends the response, broken down by capability name. Track p50, p95, and p99 percentiles. If you’re federating, split each measurement into gateway-side latency and upstream-side latency so you can tell who owns a regression.

What to watch for:

  • Individual capabilities degrading while the aggregate looks healthy
  • Latency variance (a tool that’s usually 50ms but occasionally spikes to 5s)
  • Correlation between latency spikes and specific agent consumers
  • Gateway p95 staying flat while upstream p95 climbs (third-party degradation)
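Here is one way the per-capability percentiles could be computed from raw samples, including the gateway and upstream split. This is a sketch using the nearest-rank percentile method on in-memory data; the LatencySample shape is hypothetical, and a real pipeline would more likely compute percentiles in the analytics platform.

```typescript
interface LatencySample {
  capability: string;
  gatewayMs: number;  // time spent inside the gateway
  upstreamMs: number; // time spent waiting on the upstream
}

// Nearest-rank percentile; p is in (0, 100]. Returns NaN for an empty input.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Groups samples by capability and reports total and split percentiles.
function latencyByCapability(samples: LatencySample[]) {
  const groups = new Map<string, LatencySample[]>();
  for (const s of samples) {
    const group = groups.get(s.capability) ?? [];
    group.push(s);
    groups.set(s.capability, group);
  }
  return Array.from(groups.entries()).map(([capability, group]) => {
    const total = group.map((s) => s.gatewayMs + s.upstreamMs);
    return {
      capability,
      p50: percentile(total, 50),
      p95: percentile(total, 95),
      p99: percentile(total, 99),
      gatewayP95: percentile(group.map((s) => s.gatewayMs), 95),
      upstreamP95: percentile(group.map((s) => s.upstreamMs), 95),
    };
  });
}
```

A capability whose p50 looks healthy while its p95 is far higher is showing the variance pattern described above, and comparing gatewayP95 with upstreamP95 shows which side the tail comes from.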

3. Prompt-shape distribution

What it measures: Which prompt templates agents request and how frequently, along with the argument patterns they use.

Why it matters: MCP servers can expose prompts — predefined templates that agents use to structure their interactions. Tracking which prompts are popular, which are ignored, and what argument combinations agents pass reveals how your server is actually being used versus how you designed it to be used.

How to capture it: Log the prompt name from prompts/get requests and the prompt count from prompts/list. For deeper analysis, log the argument keys (not values — those may contain sensitive data) to understand usage patterns.

What to watch for:

  • Prompts with zero usage (are they discoverable?)
  • Unexpected argument combinations (agents using prompts in ways you didn’t intend)
  • Prompt popularity shifts over time (as agents learn and adapt)
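One way to log argument keys without values is to reduce each prompts/get request to a sorted key signature and count those. The sketch below assumes the request params carry name and arguments fields, as in the MCP prompts/get message; the function names are hypothetical.

```typescript
// Builds a value-free signature such as "summarize_ticket(language,ticket_id)".
function promptShape(promptName: string, args: Record<string, unknown> | undefined): string {
  const keys = Object.keys(args ?? {}).sort();
  return `${promptName}(${keys.join(",")})`;
}

// Counts how often each prompt shape appears across a batch of prompts/get params.
function shapeDistribution(
  requests: { name: string; arguments?: Record<string, unknown> }[],
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of requests) {
    const shape = promptShape(r.name, r.arguments);
    counts.set(shape, (counts.get(shape) ?? 0) + 1);
  }
  return counts;
}
```

Sorting the keys means two agents that pass the same arguments in a different order land in the same bucket, and no argument value ever reaches the log.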

4. Agent identity breakdown

What it measures: MCP traffic segmented by the AI agent, MCP client, or consumer that initiated it.

Why it matters: Without agent-level breakdown, you can’t answer basic operations questions: Which agent is responsible for this traffic spike? Is Agent A’s error rate higher than Agent B’s? Which customer’s agent is consuming the most resources?

Agent identity is the precondition for everything else — billing, abuse detection, SLA enforcement, and capacity planning all depend on knowing who is making the calls.

How to capture it: Two layers. Client identity comes from the MCP client identifier (Claude Code, Cursor, MCP Inspector, custom agents) reported during initialize. Tenant identity comes from authentication — authenticated MCP traffic on Zuplo’s MCP Gateway is tied to the OAuth tenant; on your own MCP server route, require API key authentication so the gateway resolves the key to a consumer identity.

What to watch for:

  • Single tenants dominating traffic (noisy neighbor problem)
  • Tenants with abnormally high error rates (integration issues)
  • New clients appearing without corresponding onboarding (unauthorized access)
  • A single MCP client (e.g., MCP Inspector probes) dominating low-value traffic
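The two identity layers can be sketched as a small registry: remember the client name reported in initialize (the clientInfo.name field) per session, then combine it with the authenticated tenant on later requests. The class below is hypothetical and keeps state in memory, which would not survive across gateway instances; treat it as an illustration of the lookup, not a deployment pattern.

```typescript
interface Identity {
  tenant: string; // from authentication, for example request.user.sub
  client: string; // from the MCP initialize handshake
}

class SessionIdentityRegistry {
  private clients = new Map<string, string>();

  // Call when an initialize request is observed for a session.
  recordInitialize(sessionId: string, initializeParams: { clientInfo?: { name?: string } }): void {
    this.clients.set(sessionId, initializeParams.clientInfo?.name ?? "unknown-client");
  }

  // Combines the authenticated tenant with the client recorded for the session.
  resolve(sessionId: string | null, authenticatedSub: string | undefined): Identity {
    return {
      tenant: authenticatedSub ?? "anonymous",
      client: (sessionId && this.clients.get(sessionId)) || "unknown-client",
    };
  }
}
```

Tagging every log entry with both fields is what lets you separate "one tenant is noisy" from "one client type is noisy".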

5. Failure modes

What it measures: Why MCP calls fail, categorized by failure type and failure origin.

Why it matters: As with the failure origins described earlier, an aggregate error rate on tools/call only tells you that something is wrong. MCP failures fall into distinct categories that require different responses, and each category has an origin that tells you who owns the fix:

  • Validation errors (client origin) — The agent sent malformed arguments. This is a client problem, not a server problem.
  • Missing or rejected token (client origin when the token is absent, auth origin when it is invalid, expired, or revoked): surfaced via documented reason codes like missing_token, invalid_token, expired_token, revoked_token.
  • Upstream timeouts or errors (upstream origin) — A federated MCP server or its backend dependency is slow or failing. Surfaced as upstream_error.
  • Permission denials (auth origin) — The agent’s token doesn’t have access to this capability. Either a configuration issue or an attempted privilege escalation.
  • Tool not found (client or gateway origin) — The agent requested a capability that doesn’t exist, possibly because it’s caching a stale tool list.
  • Rate limit exceeded (gateway origin) — The agent hit its quota.

How to capture it: Log the error type, failure origin, and reason code alongside the capability name and agent identity for every failed request.

What to watch for:

  • Rising validation errors after a schema change (did you break a contract?)
  • upstream_error clusters on a specific virtual server (upstream dependency degrading)
  • missing_token spikes from a single tenant (misconfiguration or probing)
  • Auth failures dominating overall errors (likely a credential rollout problem, not a code problem)
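A sketch of the failure record and a simple rollup by origin and reason code follows. The FailureRecord shape and function name are hypothetical; the origins and reason codes are the ones used throughout this guide.

```typescript
type FailureOrigin = "client" | "auth" | "upstream" | "gateway";

// One structured log record per failed request.
interface FailureRecord {
  capability: string;
  agentId: string;
  errorType: string; // e.g., "validation", "permission_denied", "rate_limited"
  origin: FailureOrigin;
  reasonCode: string | null; // e.g., "missing_token", "upstream_error"
}

// Rolls failures up by origin (who owns the fix) and by reason code (why).
function summarizeFailures(records: FailureRecord[]) {
  const byOrigin: Record<FailureOrigin, number> = { client: 0, auth: 0, upstream: 0, gateway: 0 };
  const byReason = new Map<string, number>();
  for (const r of records) {
    byOrigin[r.origin] += 1;
    if (r.reasonCode) {
      byReason.set(r.reasonCode, (byReason.get(r.reasonCode) ?? 0) + 1);
    }
  }
  return { total: records.length, byOrigin, byReason };
}
```

Because capability and agentId are on every record, the same data can be regrouped per tenant or per virtual server to check the patterns in the watch list above.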

Cost signals: token consumption and per-tool spend

Beyond operational signals, MCP traffic carries cost implications that REST APIs typically don’t. When AI agents call your tools, the responses they receive influence token consumption in the LLM context window. Larger responses mean more tokens, and more tokens mean higher costs for the agent operator.

If you’re providing MCP tools to external consumers — or even internal teams with chargeback models — you need cost visibility at the tool level.

Key cost metrics to track:

  • Response size per capability — Measure the content length of each tool’s response. Tools that return large datasets drive disproportionate token costs.
  • Calls per agent per billing period — Essential for usage-based pricing. Track how many capability invocations each consumer makes.
  • Cost-per-tool estimates — If you know the upstream compute cost of each tool (API calls, database queries, external service fees), attach a cost marker to each invocation. Even rough estimates enable meaningful budget conversations.

You don’t need a sophisticated metering system to start. Logging response content length and capability name for every call gives you enough data to build cost attribution dashboards in your existing analytics platform.
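As a starting point, the logged response sizes can be turned into a rough cost table like this. Everything numeric here is an assumption: the four-bytes-per-token ratio is a common rule of thumb for English text, not a measured value, and the per-call upstream costs are placeholders you would supply yourself.

```typescript
interface InvocationLog {
  capability: string;
  responseBytes: number;
}

// Assumption: roughly 4 bytes of text per token. Calibrate against your own tokenizer.
const ASSUMED_BYTES_PER_TOKEN = 4;

// Aggregates calls, response size, estimated tokens, and estimated upstream cost per capability.
function costByCapability(logs: InvocationLog[], upstreamCostPerCall: Record<string, number>) {
  const grouped = new Map<string, { calls: number; bytes: number }>();
  for (const log of logs) {
    const g = grouped.get(log.capability) ?? { calls: 0, bytes: 0 };
    g.calls += 1;
    g.bytes += log.responseBytes;
    grouped.set(log.capability, g);
  }
  return Array.from(grouped.entries())
    .map(([capability, g]) => ({
      capability,
      calls: g.calls,
      avgResponseBytes: g.bytes / g.calls,
      estimatedTokens: Math.round(g.bytes / ASSUMED_BYTES_PER_TOKEN),
      estimatedUpstreamCost: g.calls * (upstreamCostPerCall[capability] ?? 0),
    }))
    .sort((a, b) => b.estimatedUpstreamCost - a.estimatedUpstreamCost);
}
```

Sorting by estimated cost puts the most expensive capabilities first, which is usually where the budget conversation starts.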

Two paths to MCP analytics

Once you know which signals you need, you have two ways to capture them: route traffic through a gateway that emits them automatically, or instrument your own MCP server route. Most teams converge on the first when they’re running more than one MCP server; the second is appropriate when you’re shipping a single MCP server and want full control of the pipeline.

Option A: MCP Gateway with built-in analytics

Zuplo’s MCP Gateway emits the typed events described above automatically. There’s no instrumentation to write: route MCP traffic through the gateway, federate any upstream MCP servers behind it, and the dashboard populates with:

  • Three event families (Requests, Capabilities, Auth) plotted over time, broken down by event type (capability invoked, capability completed, capability failed, capability listed, request received, request completed, request rejected, downstream token validated, downstream token rejected, initialize negotiated).
  • Top-line metrics — total events, success rate, p95 latency, failure origin count.
  • Gateway vs upstream p95 latency plotted on the same chart, with per-event-family breakdowns.
  • Top capabilities — per-capability call count, error count, error rate, and p95 latency, attributed to the virtual server they live on.
  • Top virtual servers and top upstream servers — call volume and error rate per federated MCP server.
  • Top users and top clients — traffic by tenant and by MCP client name (Claude Code, Cursor, MCP Inspector, custom).
  • MCP method distribution: tools/call, tools/list, initialize, prompts/list, resources/list, notifications/initialized, ping, notifications/cancelled, logging/setLevel, and so on, with raw counts.
  • JSON-RPC error codes — surfaced from the underlying JSON-RPC protocol so you can correlate a 401 at the HTTP layer with the actual MCP-level rejection.
  • Failure origins — client, auth, upstream, gateway, none.
  • Top reason codes: missing_token, invalid_token, expired_token, revoked_token, upstream_error, and other documented problem codes.

The same events are also streamed out as structured logs. Every log row carries trace-ready metadata — tenant, MCP session, capability, latency, failure origin — ready to drop into Datadog, Honeycomb, or BigQuery. Every failure mode returns a documented problem code so MCP clients know exactly what went wrong and how to recover. No opaque 500s.

This is the right path if you’re federating multiple MCP servers (yours and third-party), need OAuth 2.0 across all of them, or just don’t want to maintain the analytics pipeline yourself.

Option B: Instrument your own MCP server route

If you’re shipping a single MCP server and prefer to own the analytics pipeline end to end, the gateway is still where you instrument. Every request flows through it, so you don’t need to modify your backend services or add SDK instrumentation to each tool implementation.

Here’s how to wire it up on an OpenAPI-driven gateway like Zuplo.

Per-key attribution

The foundation of MCP analytics is knowing who made each request. Without per-consumer identity, your metrics are just aggregates.

With Zuplo’s API Key Authentication policy, each agent or consumer authenticates with a unique API key. The gateway resolves the key to a consumer identity, and that identity is available in the request context for every downstream policy and log entry.

Configure the API key policy on your MCP server route:

JSON
{
  "name": "mcp-api-key-auth",
  "policyType": "api-key-inbound",
  "handler": {
    "export": "ApiKeyInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {}
  }
}

Once authentication is in place, request.user.sub contains the consumer identity. Every log entry, metric, and trace can be tagged with this value.

Log enrichment with custom policies

The real power of MCP analytics comes from parsing the MCP payload and emitting structured metadata. On a programmable gateway, you can write a TypeScript policy that inspects the incoming JSON-RPC request and attaches tool-level dimensions to the log context.

Here’s an example inbound policy that extracts MCP-specific fields:

TypeScript
import type { ZuploContext, ZuploRequest } from "@zuplo/runtime";

export default async function mcpAnalyticsPolicy(
  request: ZuploRequest,
  context: ZuploContext,
) {
  // Clone the request so the body remains available downstream
  const clone = request.clone();
  const body = await clone.json();

  // Extract MCP-specific fields from the JSON-RPC payload
  const mcpMethod = body?.method; // e.g., "tools/call", "tools/list"
  const toolName = body?.params?.name ?? body?.params?.uri; // tool or prompt name (e.g., "search_products"), or the URI for resources/read
  const agentId = request.user?.sub ?? "anonymous";

  // Attach as structured log properties for all subsequent log entries
  context.log.setLogProperties!({
    mcpMethod,
    mcpToolName: toolName ?? "none",
    agentId,
  });

  context.log.info(
    {
      mcpMethod,
      tool: toolName,
      agent: agentId,
    },
    "MCP tool invocation",
  );

  return request;
}

This policy runs before the MCP Server Handler processes the request. Every log entry for this request — including those from downstream policies — will carry the mcpMethod, mcpToolName, and agentId fields.

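For outbound enrichment, add a response policy that captures the response status and size. (This example does not time the request; per-request latency comes from the metrics plugins described below.)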

TypeScript
import type { ZuploContext, ZuploRequest, ZuploResponse } from "@zuplo/runtime";

export default async function mcpResponseAnalytics(
  response: ZuploResponse,
  request: ZuploRequest,
  context: ZuploContext,
) {
  const responseSize = response.headers.get("content-length") ?? "unknown";

  context.log.info(
    {
      status: response.status,
      responseSize,
    },
    "MCP tool response",
  );

  return response;
}

Exporting to your observability stack

With MCP-specific fields in your logs, you need to get them into your observability platform. Zuplo supports log integrations with Datadog, New Relic, Dynatrace, Splunk, Loki, Google Cloud Logging, AWS CloudWatch, and Sumo Logic. For distributed tracing, the OpenTelemetry plugin exports traces with full span instrumentation across the policy pipeline.

The custom log properties you set with context.log.setLogProperties are included in every log entry sent to your configured logging provider. In Datadog, for example, you can create facets on mcpToolName and agentId and build dashboards that slice MCP traffic by those dimensions.

For metrics, Zuplo’s metrics plugins send request latency, request content length, and response content length to platforms like Datadog. Combined with the custom log properties, you can correlate latency spikes with specific capabilities and agents.

Alerting and SLOs for agent-driven traffic

Agent traffic behaves differently from human-driven API traffic. Agents don’t gradually ramp up — they burst. They don’t read error messages — they retry. They don’t stop at rate limits — they queue and hammer. Your alerting strategy needs to account for these patterns.

Per-tool SLOs

Define SLOs at the capability level, not the endpoint level. Suggested starting points:

  • Lightweight tools (data lookups, status checks): p95 latency < 500ms, 99.5% success rate
  • Medium tools (CRUD operations, simple transformations): p95 latency < 1s, 99% success rate
  • Heavy tools (multi-step workflows, external API calls): p95 latency < 3s, 98% success rate

If you’re federating, set independent SLOs for gateway p95 (your overhead) and upstream p95 (the third-party service). The MCP Gateway dashboard reports both on the same chart, so a regression on one side jumps out immediately.
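The tiers above translate directly into a small lookup and check. This sketch uses the suggested starting thresholds as-is; the type and function names are hypothetical, and which tier a capability belongs to is something you would assign yourself.

```typescript
type SloTier = "lightweight" | "medium" | "heavy";

// Suggested starting points from the list above.
const SLO_TARGETS: Record<SloTier, { p95Ms: number; successRate: number }> = {
  lightweight: { p95Ms: 500, successRate: 0.995 },
  medium: { p95Ms: 1000, successRate: 0.99 },
  heavy: { p95Ms: 3000, successRate: 0.98 },
};

// Compares one capability's observed window against its tier's targets.
function checkSlo(tier: SloTier, observed: { p95Ms: number; calls: number; failures: number }) {
  const target = SLO_TARGETS[tier];
  // A window with no calls is treated as meeting the success target.
  const successRate = observed.calls === 0 ? 1 : 1 - observed.failures / observed.calls;
  return {
    latencyOk: observed.p95Ms < target.p95Ms,
    successOk: successRate >= target.successRate,
  };
}
```

For federated traffic, the same check can be run twice per capability, once with gateway p95 and once with upstream p95, against separate targets.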

Alert on deviation, not threshold

Static threshold alerts are noisy for MCP traffic because agent usage patterns shift frequently. Instead, alert on deviation from baseline:

  • Tool call volume 3x above its 7-day rolling average
  • p95 latency 2x above its trailing 24-hour baseline (either gateway-side or upstream-side)
  • Error rate for any single capability exceeding 5% over a 15-minute window
  • missing_token or invalid_token reason codes spiking on a tenant that previously had clean auth (likely a credential rotation issue)
  • Any destructive capability (delete_*, update_*) called by an unrecognized agent
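The first three rules are simple ratio checks once you have a baseline for each capability. The sketch below assumes a hypothetical snapshot shape in which the baselines have already been computed elsewhere; the multipliers and the 5% threshold come from the list above.

```typescript
interface CapabilitySnapshot {
  capability: string;
  currentVolume: number;    // calls in the current window
  baselineVolume7d: number; // 7-day rolling average for the same window size
  currentP95Ms: number;
  baselineP95Ms24h: number; // trailing 24-hour p95
  errors15m: number;
  calls15m: number;
}

type AlertKind = "volume_spike" | "latency_regression" | "error_rate";

// Returns the deviation alerts that fire for one capability.
function evaluateAlerts(s: CapabilitySnapshot): AlertKind[] {
  const alerts: AlertKind[] = [];
  // Skip ratio checks when there is no baseline yet (for example, a new capability).
  if (s.baselineVolume7d > 0 && s.currentVolume > 3 * s.baselineVolume7d) {
    alerts.push("volume_spike");
  }
  if (s.baselineP95Ms24h > 0 && s.currentP95Ms > 2 * s.baselineP95Ms24h) {
    alerts.push("latency_regression");
  }
  if (s.calls15m > 0 && s.errors15m / s.calls15m > 0.05) {
    alerts.push("error_rate");
  }
  return alerts;
}
```

The reason-code and unrecognized-agent rules need identity data as well, so they are better expressed as queries over the enriched logs than as per-capability ratios.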

Rate limiting as enforcement

Rate limiting is the enforcement counterpart to the metrics you’re measuring. Once you have per-tool and per-agent visibility, you can set intelligent limits:

JSON
{
  "name": "mcp-rate-limit",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "user",
      "requestsAllowed": 100,
      "timeWindowMinutes": 1
    }
  }
}

For more granular control, use a custom rate-limiting function that sets different limits per capability or per consumer tier.
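As a sketch of what such a function might decide, the example below picks a limit from a consumer tier and tightens it for destructive capabilities. The tier names, base limits, and the one-tenth ratio are all assumptions, and the returned shape simply mirrors the requestsAllowed and timeWindowMinutes options shown above; how a function like this is wired into the rate limit policy should be checked against the gateway’s documentation.

```typescript
type ConsumerTier = "free" | "pro";

// Hypothetical decision shape, mirroring the policy options above.
interface RateLimitDecision {
  key: string; // the bucket this request counts against
  requestsAllowed: number;
  timeWindowMinutes: number;
}

// Assumed per-minute base limits per tier.
const BASE_LIMITS: Record<ConsumerTier, number> = { free: 60, pro: 600 };

function limitFor(consumer: string, tier: ConsumerTier, capability: string | null): RateLimitDecision {
  const destructive = capability !== null && /^(delete_|update_)/.test(capability);
  // Destructive capabilities get their own, much smaller bucket.
  const requestsAllowed = destructive ? Math.ceil(BASE_LIMITS[tier] / 10) : BASE_LIMITS[tier];
  const bucket = destructive ? capability : "default";
  return { key: `${consumer}:${bucket}`, requestsAllowed, timeWindowMinutes: 1 };
}
```

Giving destructive capabilities a separate key means a burst of search_products calls cannot exhaust the budget for delete_user, and the reverse.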

Worked example: an MCP analytics dashboard

Here’s what a practical MCP analytics dashboard looks like — modeled on the MCP Gateway dashboard so the panels mirror real production analytics, but applicable to any setup.

Panel 1: MCP events over time (time series)

A stacked area chart showing events per minute, broken down by event type across the three families: capability invoked, capability completed, capability failed, capability listed, request received, request completed, request rejected, downstream token validated, downstream token rejected, initialize negotiated. This is your primary traffic overview. You should be able to spot immediately which event types are active and whether any are spiking.

Panel 2: Latency — gateway vs upstream (split line chart)

A single chart with three series: total p95, gateway p95, upstream p95. When they move together, your gateway and the upstream are in lock-step; when they diverge, you can tell at a glance which side regressed. Pair this with a per-capability heatmap (p50 and p95 by capability over time) for the more granular view.

Panel 3: Top users and top clients (tables)

Side-by-side tables for tenant identity (who is paying for this traffic) and MCP client identity (Claude Code, Cursor, MCP Inspector, custom). Each row includes total events and error count, so integration issues surface immediately. The same tables show when a single tenant is dominating traffic or when a single client (e.g., unauthenticated MCP Inspector probes) is generating noise.

Panel 4: Failure analysis (origin + reason code)

A two-level view. The top level breaks failures by origin: client, auth, upstream, gateway, none. The second level surfaces the specific reason codes behind them — missing_token, invalid_token, expired_token, revoked_token, upstream_error. This is what tells you whose problem each failure is. For example, a 90% concentration of missing_token reason codes on the client origin typically means agents are integrated but haven’t completed OAuth yet.

Panel 5: Method distribution and event families (donut charts)

Two donuts side by side. The first is the higher-level event families breakdown (Requests, Capabilities, Auth). The second is the per-MCP-method breakdown: tools/call, tools/list, initialize, prompts/list, resources/list, notifications/initialized, ping, notifications/cancelled, logging/setLevel. A healthy MCP server typically shows a high ratio of tools/call to tools/list. If tools/list dominates, agents may be re-discovering tools too frequently — a sign that caching or session handling needs attention.

Panel 6: Top capabilities, virtual servers, and upstream servers (tables)

When you’re federating, the per-server view is essential: which virtual MCP servers are getting traffic, and which upstream MCP servers (Linear, GitHub, Stripe, Notion, internal ones) are answering for them. Three tables solve this: Top Capabilities (per-capability call count, error count, error rate, p95), Top Virtual Servers (the namespaces your customers connect to), and Top Upstream Servers (the third-party MCP servers you’re federating). A capability with a 50% error rate against a low-error upstream usually means a tool wiring problem; a capability with a low error rate fronting a high-error-rate upstream tells you the upstream is unstable.

Panel 7: Cost attribution (table)

A table showing estimated cost per capability per tenant over the selected time period. Columns: capability name, total calls, average response size, estimated cost. Sort by cost descending to identify the most expensive capability-tenant combinations.

Building this in practice

Every panel above can be built either by routing MCP traffic through the MCP Gateway (no instrumentation needed) or from the structured logs your gateway emits if you’re rolling your own. If you’re using Datadog, create facets on mcpToolName, agentId, and mcpMethod. In Grafana with Loki, use LogQL label filters on the same fields. The MCP Gateway emits all of these as structured metadata by default; if you’re instrumenting yourself, your custom inbound policy emits them.

Getting started: identity first, then signals

You have two starting points depending on whether you’re shipping one MCP server or federating many.

If you’re federating multiple MCP servers (yours and third-party), route them through MCP Gateway. The dashboard populates with the signals above as soon as traffic flows. From there, configure the log integration for Datadog, New Relic, Honeycomb, or BigQuery so the typed events stream into your existing observability stack.

If you’re shipping a single MCP server and want full control of the pipeline, start with the foundation:

  1. Add API key authentication to your MCP server route so you know who is making each call.
  2. Add the MCP analytics inbound policy to extract capability names and method types into structured logs.
  3. Configure a logging plugin to forward enriched logs to your observability platform.
  4. Build your first dashboard with tool call volume and agent breakdown.
  5. Add latency tracking and cost attribution once you understand your traffic patterns.

Either way, the gateway is where these signals converge. It sees every MCP request before your tools execute, and every response before the agent receives it. Instrument it once (or let the MCP Gateway instrument it for you), and every capability you add automatically inherits the same observability.


Ready to add analytics to your MCP setup?

  • Spin up MCP Gateway if you’re federating multiple MCP servers and want analytics out of the box — across three event families, with trace-ready structured logs that drop into Datadog, Honeycomb, or BigQuery.
  • Deploy an MCP server if you’re building a single MCP server and want full control of the analytics pipeline.

Sign up for free and have your first instrumented MCP server running in minutes.