---
title: "What MCP Analytics Should You Track at the Gateway? Tool Usage, Latency-Per-Tool, and Prompt Patterns Explained"
description: "Learn the essential MCP analytics signals — tool usage, latency-per-tool, prompt patterns, agent breakdown, and failure modes — captured by Zuplo's MCP Gateway dashboard or instrumented on your own MCP server."
canonicalUrl: "https://zuplo.com/learning-center/mcp-analytics-at-the-gateway"
pageType: "learning-center"
authors: "nate"
tags: "Model Context Protocol, API Analytics"
image: "https://zuplo.com/og?text=What%20MCP%20Analytics%20Should%20You%20Track%20at%20the%20Gateway"
---
You shipped an MCP server. Or you're federating other people's. Either way, AI
agents can discover tools, call them, and get structured responses. The
integration works. Now what?

The question every team hits next is the same one they've answered for REST APIs
a hundred times: _how do I know what's happening?_ But for
[Model Context Protocol (MCP)](https://modelcontextprotocol.io) traffic, the
familiar metrics — requests per second, p99 latency, 5xx error rate — don't tell
the full story. MCP traffic is tool-shaped, agent-shaped, and prompt-shaped in
ways that REST traffic simply isn't.

This guide defines the metric vocabulary for MCP observability at the gateway
layer. You'll learn which signals matter, what they look like in practice on
Zuplo's [MCP Gateway](/mcp-gateway) dashboard, and how to capture the same
signals on your own MCP server route if you're rolling instrumentation yourself.

- [Why MCP observability is different from REST observability](#why-mcp-observability-is-different-from-rest-observability)
- [What MCP analytics look like in practice](#what-mcp-analytics-look-like-in-practice)
  - [The three event families](#the-three-event-families)
  - [Failure origins, not just failure rates](#failure-origins-not-just-failure-rates)
  - [Gateway vs upstream latency](#gateway-vs-upstream-latency)
- [The five signals to instrument](#the-five-signals-to-instrument)
  - [Tool call volume](#1-tool-call-volume)
  - [Latency per tool](#2-latency-per-tool)
  - [Prompt-shape distribution](#3-prompt-shape-distribution)
  - [Agent identity breakdown](#4-agent-identity-breakdown)
  - [Failure modes](#5-failure-modes)
- [Cost signals: token consumption and per-tool spend](#cost-signals-token-consumption-and-per-tool-spend)
- [Two paths to MCP analytics](#two-paths-to-mcp-analytics)
  - [Option A: MCP Gateway with built-in analytics](#option-a-mcp-gateway-with-built-in-analytics)
  - [Option B: Instrument your own MCP server route](#option-b-instrument-your-own-mcp-server-route)
- [Alerting and SLOs for agent-driven traffic](#alerting-and-slos-for-agent-driven-traffic)
- [Worked example: an MCP analytics dashboard](#worked-example-an-mcp-analytics-dashboard)
- [Getting started: identity first, then signals](#getting-started-identity-first-then-signals)

## Why MCP observability is different from REST observability

REST API observability is endpoint-shaped. You monitor `GET /users/{id}` and
track its latency, error rate, and throughput. The URL path _is_ the unit of
analysis.

MCP observability is tool-shaped. A single MCP server route — typically
`POST /mcp` — handles every interaction. All tool calls, prompt requests, and
resource reads flow through that one endpoint. If you only track HTTP-level
metrics, you see a single line on your dashboard: total requests to `/mcp`. That
tells you almost nothing about what's actually happening.

The meaningful dimensions live inside the JSON-RPC payloads:

- **Capability name** — Which tool, prompt, or resource was called?
  `search_products` and `delete_account` have very different risk profiles and
  performance characteristics.
- **Method type** — Was it `tools/call`, `tools/list`, `prompts/list`,
  `resources/list`, `initialize`, `ping`? Each method type has different latency
  expectations and failure semantics.
- **Agent identity** — Which AI agent or MCP client initiated the request?
  Claude Code, a Cursor plugin, MCP Inspector, and a production agentic workflow
  have different traffic patterns and SLA expectations.
- **Prompt parameters** — Which prompt template was invoked, and with what
  argument shapes? Prompt distribution reveals how agents are actually using
  your server.
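Here is a minimal TypeScript sketch that pulls these dimensions out of a single JSON-RPC message. The helper name `extractMcpDimensions` is hypothetical, and the sketch assumes one request object per POST, with no batching. The field locations follow the MCP specification: `tools/call` and `prompts/get` carry `params.name`, `resources/read` carries `params.uri`, and `initialize` carries `params.clientInfo`.

```typescript
// Minimal JSON-RPC 2.0 request shape as used by MCP.
// Assumption: single messages only; batched arrays are not handled.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id?: string | number;
  method: string;
  params?: Record<string, unknown>;
}

interface McpDimensions {
  method: string; // e.g. "tools/call", "tools/list", "initialize"
  capability: string | null; // tool or prompt name, or resource URI
  clientName: string | null; // only reported on "initialize"
}

// Hypothetical helper: derive analytics dimensions from one MCP request.
function extractMcpDimensions(msg: JsonRpcRequest): McpDimensions {
  const params: Record<string, unknown> = msg.params ?? {};

  // Tools and prompts are identified by name, resources by URI
  let capability: string | null = null;
  if (msg.method === "tools/call" || msg.method === "prompts/get") {
    capability = typeof params.name === "string" ? params.name : null;
  } else if (msg.method === "resources/read") {
    capability = typeof params.uri === "string" ? params.uri : null;
  }

  // The client announces itself only once, during initialize
  let clientName: string | null = null;
  if (msg.method === "initialize") {
    const info = params.clientInfo as { name?: unknown } | undefined;
    clientName = typeof info?.name === "string" ? info.name : null;
  }

  return { method: msg.method, capability, clientName };
}
```

Every other signal in this guide is an aggregation over records shaped like `McpDimensions`.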

Traditional API monitoring tools — Datadog, Grafana, New Relic — can absolutely
handle MCP data. But they need the gateway layer to parse MCP payloads and emit
these tool-level dimensions as structured metadata. Without that enrichment,
your dashboards show HTTP-level metrics that miss the point.

## What MCP analytics look like in practice

Before drilling into individual signals, it helps to see what a finished MCP
analytics dashboard looks like. Zuplo's MCP Gateway ships one in the Account
Analytics section, and its structure is broadly applicable to any MCP analytics
setup.

A real MCP dashboard answers four questions immediately:

- **Are calls succeeding?** Success rate, total events, total failures.
- **Are calls fast?** p95 latency, split into the part the gateway controls and
  the part the upstream controls.
- **Where are failures originating?** Client (bad request, missing token), auth
  (rejected credential), upstream (third-party service returned an error), or
  gateway (your own infrastructure).
- **What's actually being called?** Top capabilities (tools, prompts, resources)
  by call count, error rate, and p95, plus the breakdown by MCP method
  (`tools/call`, `tools/list`, `initialize`, `prompts/list`, `resources/list`,
  `notifications/initialized`, `ping`, and so on).

### The three event families

Zuplo's MCP Gateway emits typed events that fall into three families, and the
same three families are a useful organizing principle even if you're rolling
your own:

- **Requests** — The MCP request lifecycle: a request received, validated,
  completed, or rejected. This is your traffic volume and success-rate signal.
- **Capabilities** — Capability invocations (tool, prompt, or resource): listed,
  invoked, completed, or failed. This is your "what is the agent actually doing"
  signal.
- **Auth** — The full upstream OAuth flow: tokens validated, rejected,
  refreshed, or revoked. This is the signal that catches "agents are integrated
  but credentials are wrong."

A healthy MCP server typically shows Requests as the bulk of events (around
three-quarters), with Capabilities and Auth making up the rest. If Auth
dominates, you have a credential problem. If Capabilities is unusually low
relative to Requests, agents are connecting but not actually invoking tools.
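The two rules of thumb above can be written as a simple check over event counts. This is a sketch only: the 50% Auth share and the 10% Capabilities-to-Requests ratio are illustrative thresholds chosen for the example, not documented defaults.

```typescript
interface FamilyCounts {
  requests: number;
  capabilities: number;
  auth: number;
}

// Illustrative thresholds (assumptions), not product defaults
const AUTH_DOMINANT_SHARE = 0.5;
const MIN_CAPABILITY_TO_REQUEST_RATIO = 0.1;

function diagnoseFamilies(c: FamilyCounts): string[] {
  const total = c.requests + c.capabilities + c.auth;
  if (total === 0) return ["no traffic"];

  const findings: string[] = [];
  // Auth events outweighing everything else points at credentials
  if (c.auth / total > AUTH_DOMINANT_SHARE) {
    findings.push("auth dominates: likely a credential problem");
  }
  // Lots of requests but few capability events: agents connect, then stop
  if (
    c.requests > 0 &&
    c.capabilities / c.requests < MIN_CAPABILITY_TO_REQUEST_RATIO
  ) {
    findings.push("agents connect but rarely invoke capabilities");
  }
  return findings;
}
```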

### Failure origins, not just failure rates

A 4% error rate on `tools/call` tells you something is wrong. A 4% error rate
broken down by **origin** tells you who needs to fix it:

- **Client** — The agent sent malformed arguments, a missing token, or an
  invalid request. The client owns the fix.
- **Auth** — The auth layer rejected the credential (invalid token, expired
  token, revoked token, missing scope). Either the agent's credential needs
  renewal, or you have a misconfigured access policy.
- **Upstream** — The third-party MCP server or backend service returned an
  error. The upstream owns the fix; you can only retry or fall back.
- **Gateway** — Your own gateway returned the error. You own the fix.

Documented problem codes attached to each error — `missing_token`,
`invalid_token`, `expired_token`, `revoked_token`, `upstream_error` — let MCP
clients recover automatically rather than retrying blindly.
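As an illustration of what recovering automatically can mean, the sketch below maps each reason code to a client-side action. The action names and the mapping itself are assumptions about sensible client behavior, not a documented Zuplo or MCP API.

```typescript
type ReasonCode =
  | "missing_token"
  | "invalid_token"
  | "expired_token"
  | "revoked_token"
  | "upstream_error";

// Hypothetical recovery actions a client might implement
type RecoveryAction =
  | "start_oauth_flow"
  | "refresh_token"
  | "reauthorize"
  | "retry_with_backoff";

function recoveryFor(code: ReasonCode): RecoveryAction {
  switch (code) {
    case "missing_token":
      return "start_oauth_flow"; // the client never authenticated
    case "expired_token":
      return "refresh_token"; // a refresh token may still be valid
    case "invalid_token":
    case "revoked_token":
      return "reauthorize"; // resending the same credential cannot succeed
    case "upstream_error":
      return "retry_with_backoff"; // may be transient on the upstream side
  }
}
```

The design point is that only `upstream_error` leads to a retry of the same request; every auth code leads to a different credential first.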

### Gateway vs upstream latency

The single most important latency view for federated MCP traffic is **gateway
p95 vs upstream p95**, plotted on the same chart. The gateway is your overhead
(typically single-digit milliseconds); the upstream is everything you don't
control. Splitting the two lets you set independent SLOs and immediately tell
which side caused a latency regression.

If gateway p95 creeps up, the regression is in your own layer: policy overhead,
routing, or instrumentation. If upstream p95 spikes, the third-party server is
the likely cause. Without the split, every latency conversation devolves into
"is it us or them?"
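If your logs record both the total time spent at the gateway and the time spent waiting on the upstream, the split falls out of a subtraction. The sketch below assumes those two fields exist (`totalMs` and `upstreamMs` are hypothetical names), uses nearest-rank percentiles, and treats a side as regressed when its p95 more than doubles against a baseline, which is an arbitrary starting threshold.

```typescript
interface LatencySample {
  totalMs: number; // gateway receives request -> gateway sends response
  upstreamMs: number; // portion spent waiting on the upstream server
}

interface SplitP95 {
  gatewayP95: number;
  upstreamP95: number;
}

// Nearest-rank percentile over an unsorted, non-empty array
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

function splitP95(samples: LatencySample[]): SplitP95 {
  return {
    // Gateway overhead is whatever was not spent waiting on the upstream
    gatewayP95: percentile(samples.map((s) => s.totalMs - s.upstreamMs), 95),
    upstreamP95: percentile(samples.map((s) => s.upstreamMs), 95),
  };
}

// Hypothetical rule: a side regressed if its p95 exceeds factor x baseline
function whoRegressed(
  current: SplitP95,
  baseline: SplitP95,
  factor = 2,
): "gateway" | "upstream" | "both" | "neither" {
  const gw = current.gatewayP95 > baseline.gatewayP95 * factor;
  const up = current.upstreamP95 > baseline.upstreamP95 * factor;
  if (gw && up) return "both";
  if (gw) return "gateway";
  if (up) return "upstream";
  return "neither";
}
```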

## The five signals to instrument

These are the five core signals that give you meaningful visibility into MCP
traffic. They're ordered from most immediately useful to most strategically
valuable.

### 1. Tool call volume

**What it measures:** How many times each capability is invoked over a given
time window.

**Why it matters:** Tool call volume is the MCP equivalent of
requests-per-second by endpoint. It tells you which capabilities are hot, which
are unused, and whether traffic patterns match your expectations. A sudden spike
in `delete_user` calls is a very different signal than a spike in
`search_products`.

**How to capture it:** Log the `method` and capability identifier from each
`tools/call`, `prompts/get`, or `resources/read` request: `params.name` for
tools and prompts, `params.uri` for resources. Aggregate by capability per time
bucket (1-minute or 5-minute windows work well for most workloads).
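A minimal sketch of that aggregation, assuming each log entry has already been reduced to a timestamp and a capability identifier (the `CallEvent` shape is hypothetical):

```typescript
interface CallEvent {
  timestamp: number; // epoch milliseconds
  capability: string; // tool or prompt name, or resource URI
}

// Count calls per capability per fixed window (default: 1 minute).
// Returns Map<bucketStartMs, Map<capability, count>>.
function bucketVolume(
  events: CallEvent[],
  windowMs = 60000,
): Map<number, Map<string, number>> {
  const buckets = new Map<number, Map<string, number>>();
  for (const e of events) {
    // Align each event to the start of its window
    const start = Math.floor(e.timestamp / windowMs) * windowMs;
    let counts = buckets.get(start);
    if (!counts) {
      counts = new Map<string, number>();
      buckets.set(start, counts);
    }
    counts.set(e.capability, (counts.get(e.capability) ?? 0) + 1);
  }
  return buckets;
}
```

Pass `300000` as `windowMs` for 5-minute buckets.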

**What to watch for:**

- Capabilities with zero calls over 7+ days (candidates for deprecation)
- Unexpected volume spikes on destructive tools
- Skewed distribution where one tool accounts for 80%+ of all calls

### 2. Latency per tool

**What it measures:** Response time for each capability individually, not
aggregated across the MCP endpoint.

**Why it matters:** Aggregate latency for `POST /mcp` is meaningless when your
server exposes 15 capabilities. A `get_status` tool that returns cached data in
20ms and a `generate_report` tool that calls three upstream APIs in 1200ms will
average out to a number that describes neither tool accurately.

**How to capture it:** Measure the time from when the gateway receives the
`tools/call` request to when it sends the response, broken down by capability
name. Track p50, p95, and p99 percentiles. If you're federating, split each
measurement into gateway-side latency and upstream-side latency so you can tell
who owns a regression.
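Grouping by capability before computing percentiles is the key step. The sketch below applies the nearest-rank method to hypothetical `{ capability, durationMs }` samples. Production systems usually rely on histograms or sketches in the metrics backend rather than sorting raw values, so treat this as a reference for the semantics only.

```typescript
interface ToolLatencySample {
  capability: string;
  durationMs: number;
}

interface ToolLatencyStats {
  count: number;
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile; `sorted` must be ascending and non-empty
function nearestRank(sorted: number[], p: number): number {
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

function latencyByTool(
  samples: ToolLatencySample[],
): Record<string, ToolLatencyStats> {
  // Group first so fast and slow tools never blend into one distribution
  const byTool = new Map<string, number[]>();
  for (const s of samples) {
    const list = byTool.get(s.capability) ?? [];
    list.push(s.durationMs);
    byTool.set(s.capability, list);
  }

  const result: Record<string, ToolLatencyStats> = {};
  for (const [tool, durations] of byTool) {
    durations.sort((a, b) => a - b);
    result[tool] = {
      count: durations.length,
      p50: nearestRank(durations, 50),
      p95: nearestRank(durations, 95),
      p99: nearestRank(durations, 99),
    };
  }
  return result;
}
```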

**What to watch for:**

- Individual capabilities degrading while the aggregate looks healthy
- Latency variance (a tool that's usually 50ms but occasionally spikes to 5s)
- Correlation between latency spikes and specific agent consumers
- Gateway p95 staying flat while upstream p95 climbs (third-party degradation)

### 3. Prompt-shape distribution

**What it measures:** Which prompt templates agents request and how frequently,
along with the argument patterns they use.

**Why it matters:** MCP servers can expose
[prompts](https://zuplo.com/docs/mcp-server/prompts) — predefined templates that
agents use to structure their interactions. Tracking which prompts are popular,
which are ignored, and what argument combinations agents pass reveals how your
server is actually being used versus how you designed it to be used.

**How to capture it:** Log the prompt `name` from each `prompts/get` request,
and the number of prompts returned in `prompts/list` responses. For deeper
analysis, log the argument keys (not values, which may contain sensitive data)
to understand usage patterns.
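One way to log argument shape without values is to reduce each `prompts/get` call to its prompt name plus its sorted argument keys. The `name(key1,key2)` format and the helper names below are hypothetical. MCP prompt arguments are string key/value pairs, which is what the `PromptsGetParams` type assumes.

```typescript
interface PromptsGetParams {
  name: string;
  arguments?: Record<string, string>;
}

// Reduce a prompts/get call to "name(sorted,arg,keys)". Argument values
// are deliberately dropped because they may contain user data.
function promptShape(params: PromptsGetParams): string {
  const keys = Object.keys(params.arguments ?? {}).sort();
  return `${params.name}(${keys.join(",")})`;
}

// Count how often each shape occurs across logged calls
function countShapes(calls: PromptsGetParams[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const call of calls) {
    const shape = promptShape(call);
    counts.set(shape, (counts.get(shape) ?? 0) + 1);
  }
  return counts;
}
```

Sorting the keys makes `{ tone, ticket_id }` and `{ ticket_id, tone }` count as the same shape, so the distribution reflects usage rather than client serialization order.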

**What to watch for:**

- Prompts with zero usage (are they discoverable?)
- Unexpected argument combinations (agents using prompts in ways you didn't
  intend)
- Prompt popularity shifts over time (as agents learn and adapt)

### 4. Agent identity breakdown

**What it measures:** MCP traffic segmented by the AI agent, MCP client, or
consumer that initiated it.

**Why it matters:** Without agent-level breakdown, you can't answer basic
operations questions: _Which agent is responsible for this traffic spike? Is
Agent A's error rate higher than Agent B's? Which customer's agent is consuming
the most resources?_

Agent identity is the precondition for everything else — billing, abuse
detection, SLA enforcement, and capacity planning all depend on knowing who is
making the calls.

**How to capture it:** Two layers. **Client identity** comes from the
`clientInfo` name (Claude Code, Cursor, MCP Inspector, custom agents) that the
MCP client sends in its `initialize` request. **Tenant identity** comes from
authentication: authenticated MCP traffic on Zuplo's MCP Gateway is tied to the
OAuth tenant; on your own MCP server route, require API key authentication so
the gateway resolves the key to a consumer identity.
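On the Streamable HTTP transport, `clientInfo` appears only on `initialize`, while later requests carry an `Mcp-Session-Id` header that the server issued in its `initialize` response. The sketch below joins the two. It assumes your server issues session IDs (stateless servers may not), and it keeps state in a single in-memory map, which would not work for a gateway running on many instances. The class name is hypothetical.

```typescript
interface Attribution {
  tenant: string;
  client: string;
}

// In-memory only: a multi-instance gateway would need a shared store
class SessionAttribution {
  private clients = new Map<string, string>();

  // Call when an initialize response returns an Mcp-Session-Id header;
  // clientName comes from params.clientInfo.name on that initialize request
  recordInitialize(sessionId: string, clientName: string): void {
    this.clients.set(sessionId, clientName);
  }

  // Attribute a later request from its Mcp-Session-Id header plus the
  // tenant resolved by authentication (e.g. request.user.sub)
  attribute(sessionId: string | null, tenant: string | undefined): Attribution {
    const client = sessionId ? this.clients.get(sessionId) : undefined;
    return {
      tenant: tenant ?? "anonymous",
      client: client ?? "unknown",
    };
  }
}
```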

**What to watch for:**

- Single tenants dominating traffic (noisy neighbor problem)
- Tenants with abnormally high error rates (integration issues)
- New clients appearing without corresponding onboarding (unauthorized access)
- A single MCP client (e.g., MCP Inspector probes) dominating low-value traffic

### 5. Failure modes

**What it measures:** Why MCP calls fail, categorized by failure type and
failure origin.

**Why it matters:** An overall error rate only tells you that something is
wrong. MCP failures fall into distinct categories that require different
responses and, crucially, have distinct origins that tell you who owns the fix:

- **Validation errors (client origin)** — The agent sent malformed arguments.
  This is a client problem, not a server problem.
- **Missing token (client origin)** — The agent sent no credential at all.
  Surfaced as `missing_token`.
- **Invalid, expired, or revoked token (auth origin)** — The agent's
  credential was rejected. Surfaced via documented reason codes like
  `invalid_token`, `expired_token`, `revoked_token`.
- **Upstream timeouts or errors (upstream origin)** — A federated MCP server or
  its backend dependency is slow or failing. Surfaced as `upstream_error`.
- **Permission denials (auth origin)** — The agent's token doesn't have access
  to this capability. Either a configuration issue or an attempted privilege
  escalation.
- **Tool not found (client or gateway origin)** — The agent requested a
  capability that doesn't exist, possibly because it's caching a stale tool
  list.
- **Rate limit exceeded (gateway origin)** — The agent hit its quota.

**How to capture it:** Log the error type, failure origin, and reason code
alongside the capability name and agent identity for every failed request.
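To keep those log fields consistent, you can derive the error type and origin from the HTTP status and JSON-RPC error code. The mapping below is an assumption that mirrors the categories in this section, not a documented table. It treats 502/503/504 as upstream failures, which only holds if your gateway uses those statuses for upstream problems. The MCP specification's own example reports an unknown tool with `-32602` (invalid params), so tool-not-found cannot always be separated from validation errors by code alone.

```typescript
type FailureOrigin = "client" | "auth" | "upstream" | "gateway" | "none";

interface FailureRecord {
  capability: string;
  agentId: string;
  origin: FailureOrigin;
  errorType: string;
  reasonCode?: string; // e.g. "expired_token", when the gateway supplies one
}

// Assumed mapping; adjust to how your gateway actually reports errors
function classify(
  httpStatus: number,
  jsonRpcCode?: number,
): { origin: FailureOrigin; errorType: string } {
  // JSON-RPC 2.0 reserved codes: -32700 parse error, -32600 invalid
  // request, -32602 invalid params
  if (jsonRpcCode === -32700 || jsonRpcCode === -32600 || jsonRpcCode === -32602) {
    return { origin: "client", errorType: "validation" };
  }
  if (jsonRpcCode === -32601) return { origin: "client", errorType: "method_not_found" };
  if (httpStatus === 401) return { origin: "auth", errorType: "unauthenticated" };
  if (httpStatus === 403) return { origin: "auth", errorType: "permission_denied" };
  if (httpStatus === 429) return { origin: "gateway", errorType: "rate_limited" };
  if (httpStatus === 502 || httpStatus === 503 || httpStatus === 504) {
    return { origin: "upstream", errorType: "upstream_error" };
  }
  if (httpStatus >= 500) return { origin: "gateway", errorType: "internal" };
  if (httpStatus >= 400) return { origin: "client", errorType: "bad_request" };
  // Any other JSON-RPC error on a 2xx response: assume the MCP server
  // behind the gateway produced it
  if (jsonRpcCode !== undefined) return { origin: "upstream", errorType: "server_error" };
  return { origin: "none", errorType: "none" };
}

function toFailureRecord(
  capability: string,
  agentId: string,
  httpStatus: number,
  jsonRpcCode?: number,
  reasonCode?: string,
): FailureRecord {
  return { capability, agentId, ...classify(httpStatus, jsonRpcCode), reasonCode };
}
```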

**What to watch for:**

- Rising validation errors after a schema change (did you break a contract?)
- `upstream_error` clusters on a specific virtual server (upstream dependency
  degrading)
- `missing_token` spikes from a single tenant (misconfiguration or probing)
- Auth failures dominating overall errors (likely a credential rollout problem,
  not a code problem)

## Cost signals: token consumption and per-tool spend

Beyond operational signals, MCP traffic carries cost implications that REST APIs
typically don't. When AI agents call your tools, the responses they receive
influence token consumption in the LLM context window. Larger responses mean
more tokens, and more tokens mean higher costs for the agent operator.

If you're providing MCP tools to external consumers — or even internal teams
with chargeback models — you need cost visibility at the tool level.

**Key cost metrics to track:**

- **Response size per capability** — Measure the content length of each tool's
  response. Tools that return large datasets drive disproportionate token costs.
- **Calls per agent per billing period** — Essential for usage-based pricing.
  Track how many capability invocations each consumer makes.
- **Cost-per-tool estimates** — If you know the upstream compute cost of each
  tool (API calls, database queries, external service fees), attach a cost
  marker to each invocation. Even rough estimates enable meaningful budget
  conversations.

You don't need a sophisticated metering system to start. Logging response
content length and capability name for every call gives you enough data to build
cost attribution dashboards in your existing analytics platform.
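A sketch of that attribution over logged rows, producing one row per capability and tenant. The four-bytes-per-token ratio is a rough rule of thumb for English and JSON text, not a property of any specific tokenizer. The per-call costs are hypothetical inputs you would supply yourself.

```typescript
interface CallLog {
  capability: string;
  tenant: string;
  responseBytes: number; // from content-length, when present
}

interface CostRow {
  capability: string;
  tenant: string;
  calls: number;
  avgResponseBytes: number;
  estTokens: number;
  estCost: number;
}

// Rough heuristic (assumption): ~4 bytes of text per token
const BYTES_PER_TOKEN = 4;

// Hypothetical per-invocation upstream cost, in your currency of choice
type UnitCosts = Record<string, number>;

function costByCapabilityAndTenant(logs: CallLog[], unitCost: UnitCosts): CostRow[] {
  const groups = new Map<string, { capability: string; tenant: string; calls: number; totalBytes: number }>();
  for (const log of logs) {
    const key = `${log.capability}|${log.tenant}`;
    const g = groups.get(key) ?? { capability: log.capability, tenant: log.tenant, calls: 0, totalBytes: 0 };
    g.calls += 1;
    g.totalBytes += log.responseBytes;
    groups.set(key, g);
  }

  return [...groups.values()]
    .map((g) => ({
      capability: g.capability,
      tenant: g.tenant,
      calls: g.calls,
      avgResponseBytes: g.totalBytes / g.calls,
      estTokens: Math.ceil(g.totalBytes / BYTES_PER_TOKEN),
      estCost: g.calls * (unitCost[g.capability] ?? 0),
    }))
    .sort((a, b) => b.estCost - a.estCost); // most expensive first
}
```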

## Two paths to MCP analytics

Once you know which signals you need, you have two ways to capture them: route
traffic through a gateway that emits them automatically, or instrument your own
MCP server route. Most teams converge on the first when they're running more
than one MCP server; the second is appropriate when you're shipping a single MCP
server and want full control of the pipeline.

### Option A: MCP Gateway with built-in analytics

Zuplo's [MCP Gateway](/mcp-gateway) emits the typed events described above
automatically. There's no instrumentation to write: route MCP traffic through
the gateway, federate any upstream MCP servers behind it, and the dashboard
populates with:

- **Three event families** (Requests, Capabilities, Auth) plotted over time,
  broken down by event type (capability invoked, capability completed,
  capability failed, capability listed, request received, request completed,
  request rejected, downstream token validated, downstream token rejected,
  initialize negotiated).
- **Top-line metrics** — total events, success rate, p95 latency, failure origin
  count.
- **Gateway vs upstream p95 latency** plotted on the same chart, with
  per-event-family breakdowns.
- **Top capabilities** — per-capability call count, error count, error rate, and
  p95 latency, attributed to the virtual server they live on.
- **Top virtual servers and top upstream servers** — call volume and error rate
  per federated MCP server.
- **Top users and top clients** — traffic by tenant and by MCP client name
  (Claude Code, Cursor, MCP Inspector, custom).
- **MCP method distribution** — `tools/call`, `tools/list`, `initialize`,
  `prompts/list`, `resources/list`, `notifications/initialized`, `ping`,
  `notifications/cancelled`, `logging/setLevel`, and so on, with raw counts.
- **JSON-RPC error codes** — surfaced from the underlying JSON-RPC protocol so
  you can correlate a 401 at the HTTP layer with the actual MCP-level rejection.
- **Failure origins** — client, auth, upstream, gateway, none.
- **Top reason codes** — `missing_token`, `invalid_token`, `expired_token`,
  `revoked_token`, `upstream_error`, and other documented problem codes.

The same events are also streamed out as
[structured logs](https://zuplo.com/docs/articles/logging). Every log row
carries trace-ready metadata — tenant, MCP session, capability, latency, failure
origin — ready to drop into Datadog, Honeycomb, or BigQuery. Every failure mode
returns a documented problem code so MCP clients know exactly what went wrong
and how to recover. No opaque 500s.

This is the right path if you're federating multiple MCP servers (yours and
third-party), need OAuth 2.0 across all of them, or just don't want to maintain
the analytics pipeline yourself.

### Option B: Instrument your own MCP server route

If you're shipping a single MCP server and prefer to own the analytics pipeline
end to end, the gateway is still where you instrument. Every request flows
through it, so you don't need to modify your backend services or add SDK
instrumentation to each tool implementation.

Here's how to wire it up on an OpenAPI-driven gateway like Zuplo.

#### Per-key attribution

The foundation of MCP analytics is knowing _who_ made each request. Without
per-consumer identity, your metrics are just aggregates.

With Zuplo's
[API Key Authentication](https://zuplo.com/docs/policies/api-key-inbound)
policy, each agent or consumer authenticates with a unique API key. The gateway
resolves the key to a consumer identity, and that identity is available in the
request context for every downstream policy and log entry.

Configure the API key policy on your MCP server route:

```json
{
  "name": "mcp-api-key-auth",
  "policyType": "api-key-inbound",
  "handler": {
    "export": "ApiKeyInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {}
  }
}
```

Once authentication is in place, `request.user.sub` contains the consumer
identity. Every log entry, metric, and trace can be tagged with this value.

#### Log enrichment with custom policies

The real power of MCP analytics comes from parsing the MCP payload and emitting
structured metadata. On a
[programmable gateway](https://zuplo.com/docs/policies/custom-code-inbound), you
can write a TypeScript policy that inspects the incoming JSON-RPC request and
attaches tool-level dimensions to the log context.

Here's an example inbound policy that extracts MCP-specific fields:

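```typescript
import type { ZuploContext, ZuploRequest } from "@zuplo/runtime";

export default async function mcpAnalyticsPolicy(
  request: ZuploRequest,
  context: ZuploContext,
) {
  // Only POST requests carry a JSON-RPC message. Anything else (for example,
  // a GET that opens an SSE stream) has no body to parse, so pass it through.
  if (request.method !== "POST") {
    return request;
  }

  // Parse a clone so the original body stays readable for the
  // MCP Server Handler downstream
  let body: any;
  try {
    body = await request.clone().json();
  } catch {
    context.log.warn("MCP request body is not valid JSON");
    return request;
  }

  // Extract MCP-specific fields from the JSON-RPC payload
  const mcpMethod: string = body?.method ?? "unknown"; // e.g., "tools/call"
  // tools/call and prompts/get identify the capability by params.name;
  // resources/read identifies it by params.uri
  const capability: string = body?.params?.name ?? body?.params?.uri ?? "none";
  const agentId: string = request.user?.sub ?? "anonymous";

  // Attach as structured log properties for all subsequent log entries
  context.log.setLogProperties!({
    mcpMethod,
    mcpToolName: capability,
    agentId,
  });

  context.log.info({ mcpMethod, capability, agent: agentId }, "MCP request");

  return request;
}
```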

This policy runs before the
[MCP Server Handler](https://zuplo.com/docs/handlers/mcp-server) processes the
request. Every log entry for this request — including those from downstream
policies — will carry the `mcpMethod`, `mcpToolName`, and `agentId` fields.

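For outbound enrichment, add a response policy that records the status code and
response size. Request latency comes from the metrics plugins described below,
so this policy doesn't need to time the request itself: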

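```typescript
import type { ZuploContext, ZuploRequest } from "@zuplo/runtime";

export default async function mcpResponseAnalytics(
  response: Response,
  request: ZuploRequest,
  context: ZuploContext,
) {
  // content-length is absent on streamed (SSE) or chunked responses
  const contentLength = response.headers.get("content-length");
  const responseSize =
    contentLength !== null ? Number(contentLength) : "unknown";

  // The log properties set by the inbound policy are attached automatically
  context.log.info(
    {
      status: response.status,
      responseSize,
      contentType: response.headers.get("content-type") ?? "unknown",
    },
    "MCP response",
  );

  return response;
}
```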

#### Exporting to your observability stack

With MCP-specific fields in your logs, you need to get them into your
observability platform. Zuplo supports
[log integrations](https://zuplo.com/docs/articles/logging) with Datadog, New
Relic, Dynatrace, Splunk, Loki, Google Cloud Logging, AWS CloudWatch, and Sumo
Logic. For distributed tracing, the
[OpenTelemetry plugin](https://zuplo.com/docs/articles/opentelemetry) exports
traces with full span instrumentation across the policy pipeline.

The custom log properties you set with `context.log.setLogProperties` are
included in every log entry sent to your configured logging provider. In
Datadog, for example, you can create facets on `mcpToolName` and `agentId` and
build dashboards that slice MCP traffic by those dimensions.

For metrics, Zuplo's
[metrics plugins](https://zuplo.com/docs/articles/metrics-plugins) send request
latency, request content length, and response content length to platforms like
Datadog. Combined with the custom log properties, you can correlate latency
spikes with specific capabilities and agents.

## Alerting and SLOs for agent-driven traffic

Agent traffic behaves differently from human-driven API traffic. Agents don't
gradually ramp up — they burst. They don't read error messages — they retry.
They don't stop at rate limits — they queue and hammer. Your alerting strategy
needs to account for these patterns.

### Per-tool SLOs

Define SLOs at the capability level, not the endpoint level. Suggested starting
points:

- **Lightweight tools** (data lookups, status checks): p95 latency < 500ms,
  99.5% success rate
- **Medium tools** (CRUD operations, simple transformations): p95 latency < 1s,
  99% success rate
- **Heavy tools** (multi-step workflows, external API calls): p95 latency < 3s,
  98% success rate

If you're federating, set independent SLOs for **gateway p95** (your overhead)
and **upstream p95** (the third-party service). The MCP Gateway dashboard
reports both on the same chart, so a regression on one side jumps out
immediately.
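The tiers above translate into a lookup table. Which tier each tool belongs to is your decision; the sketch below only encodes the suggested starting targets and checks observed values against them. The function and type names are hypothetical.

```typescript
type ToolTier = "lightweight" | "medium" | "heavy";

interface SloTarget {
  p95Ms: number; // p95 must stay strictly below this
  successRate: number; // success ratio must be at least this
}

// Starting-point targets from the tiers listed above
const SLO_TARGETS: Record<ToolTier, SloTarget> = {
  lightweight: { p95Ms: 500, successRate: 0.995 },
  medium: { p95Ms: 1000, successRate: 0.99 },
  heavy: { p95Ms: 3000, successRate: 0.98 },
};

function checkSlo(
  tier: ToolTier,
  observed: { p95Ms: number; successes: number; total: number },
) {
  const target = SLO_TARGETS[tier];
  // No traffic in the window counts as meeting the success target
  const successRate = observed.total === 0 ? 1 : observed.successes / observed.total;
  return {
    latencyOk: observed.p95Ms < target.p95Ms,
    successOk: successRate >= target.successRate,
    successRate,
  };
}
```

Run the same check twice for federated capabilities, once with gateway p95 and once with upstream p95, to keep the two SLOs independent.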

### Alert on deviation, not threshold

Static threshold alerts are noisy for MCP traffic because agent usage patterns
shift frequently. Instead, alert on deviation from baseline:

- Tool call volume 3x above its 7-day rolling average
- p95 latency 2x above its trailing 24-hour baseline (either gateway-side or
  upstream-side)
- Error rate for any single capability exceeding 5% over a 15-minute window
- `missing_token` or `invalid_token` reason codes spiking on a tenant that
  previously had clean auth (likely a credential rotation issue)
- Any destructive capability (`delete_*`, `update_*`) called by an unrecognized
  agent
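The deviation rules above can be evaluated per capability on each window. This sketch covers four of them (volume, latency, error rate, and destructive calls by unknown agents). The one-hour volume window, the field names, and the `delete_`/`update_` prefix match are assumptions for illustration.

```typescript
interface CapabilityWindow {
  capability: string;
  callsLastHour: number;
  avgCallsPerHour7d: number; // 7-day rolling average, same window size
  p95Ms: number;
  baselineP95Ms24h: number; // trailing 24-hour p95
  calls15m: number;
  errors15m: number;
  agentIds: string[]; // agents that called this capability in the window
}

type AlertKind =
  | "volume_spike"
  | "latency_regression"
  | "error_rate"
  | "destructive_unknown_agent";

const DESTRUCTIVE = /^(delete|update)_/;

function evaluateAlerts(w: CapabilityWindow, knownAgents: Set<string>): AlertKind[] {
  const alerts: AlertKind[] = [];
  // 3x above the 7-day rolling average
  if (w.avgCallsPerHour7d > 0 && w.callsLastHour > 3 * w.avgCallsPerHour7d) {
    alerts.push("volume_spike");
  }
  // 2x above the trailing 24-hour p95 baseline
  if (w.baselineP95Ms24h > 0 && w.p95Ms > 2 * w.baselineP95Ms24h) {
    alerts.push("latency_regression");
  }
  // More than 5% errors over a 15-minute window
  if (w.calls15m > 0 && w.errors15m / w.calls15m > 0.05) {
    alerts.push("error_rate");
  }
  // Destructive capability touched by any agent not on the allow list
  if (DESTRUCTIVE.test(w.capability) && w.agentIds.some((id) => !knownAgents.has(id))) {
    alerts.push("destructive_unknown_agent");
  }
  return alerts;
}
```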

### Rate limiting as enforcement

[Rate limiting](https://zuplo.com/docs/policies/rate-limit-inbound) is the
enforcement counterpart to the metrics you're measuring. Once you have per-tool
and per-agent visibility, you can set intelligent limits:

```json
{
  "name": "mcp-rate-limit",
  "policyType": "rate-limit-inbound",
  "handler": {
    "export": "RateLimitInboundPolicy",
    "module": "$import(@zuplo/runtime)",
    "options": {
      "rateLimitBy": "user",
      "requestsAllowed": 100,
      "timeWindowMinutes": 1
    }
  }
}
```

For more granular control, use a
[custom rate-limiting function](https://zuplo.com/docs/concepts/rate-limiting)
that sets different limits per capability or per consumer tier.
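The selection logic for per-capability, per-tier limits can live in a small pure function. The tiers, limits, and override table below are hypothetical. How the result is handed to the rate-limit policy depends on the custom-function contract described in the linked docs, so treat this as the decision logic only.

```typescript
interface Limit {
  requestsAllowed: number;
  timeWindowMinutes: number;
}

// Hypothetical tiers; tune them against your own traffic data
const TIER_LIMITS: Record<string, Limit> = {
  free: { requestsAllowed: 20, timeWindowMinutes: 1 },
  pro: { requestsAllowed: 100, timeWindowMinutes: 1 },
};

// Hypothetical overrides for destructive or expensive capabilities
const CAPABILITY_OVERRIDES: Record<string, Limit> = {
  delete_account: { requestsAllowed: 5, timeWindowMinutes: 1 },
  generate_report: { requestsAllowed: 10, timeWindowMinutes: 1 },
};

function limitFor(tier: string, capability: string | null, consumer: string) {
  const base = TIER_LIMITS[tier] ?? TIER_LIMITS.free;
  const override = capability ? CAPABILITY_OVERRIDES[capability] : undefined;
  // All windows here are 1 minute, so counts compare directly;
  // keep whichever limit is stricter
  const limit =
    override && override.requestsAllowed < base.requestsAllowed ? override : base;
  // Key counters per consumer AND capability so one hot tool does not
  // exhaust the consumer's budget for every other tool
  return { key: `${consumer}:${capability ?? "*"}`, ...limit };
}
```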

## Worked example: an MCP analytics dashboard

Here's what a practical MCP analytics dashboard looks like — modeled on the
[MCP Gateway](/mcp-gateway) dashboard so the panels mirror real production
analytics, but applicable to any setup.

### Panel 1: MCP events over time (time series)

A stacked area chart showing events per minute, broken down by event type across
the three families: capability invoked, capability completed, capability failed,
capability listed, request received, request completed, request rejected,
downstream token validated, downstream token rejected, initialize negotiated.
This is your primary traffic overview. You should be able to spot immediately
which event types are active and whether any are spiking.

### Panel 2: Latency — gateway vs upstream (split line chart)

A single chart with three series: total p95, gateway p95, upstream p95. When
they move together, your gateway and the upstream are in lock-step; when they
diverge, you can tell at a glance which side regressed. Pair this with a
per-capability heatmap (p50 and p95 by capability over time) for the more granular
view.

### Panel 3: Top users and top clients (tables)

Side-by-side tables for **tenant identity** (who is paying for this traffic) and
**MCP client identity** (Claude Code, Cursor, MCP Inspector, custom). Each row
includes total events and error count, so integration issues surface
immediately. A single tenant dominating traffic and a single client (e.g.,
unauthenticated MCP Inspector probes) generating noise both surface here.

### Panel 4: Failure analysis (origin + reason code)

A two-level view. The top level breaks failures by origin: client, auth,
upstream, gateway, none. The second level surfaces the specific reason codes
behind them — `missing_token`, `invalid_token`, `expired_token`,
`revoked_token`, `upstream_error`. This is what tells you whose problem each
failure is. For example, a 90% concentration of `missing_token` reason codes on
the client origin typically means agents are integrated but haven't completed
OAuth yet.

### Panel 5: Method distribution and event families (donut charts)

Two donuts side by side. The first is the higher-level **event families**
breakdown (Requests, Capabilities, Auth). The second is the per-MCP-method
breakdown: `tools/call`, `tools/list`, `initialize`, `prompts/list`,
`resources/list`, `notifications/initialized`, `ping`,
`notifications/cancelled`, `logging/setLevel`. A healthy MCP server typically
shows a high ratio of `tools/call` to `tools/list`. If `tools/list` dominates,
agents may be re-discovering tools too frequently — a sign that caching or
session handling needs attention.
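The `tools/call` to `tools/list` ratio is straightforward to compute from the method counts behind this donut. The threshold of one call per list is a hypothetical default, not a recommended value.

```typescript
// Counts per MCP method over a time window, keyed by method name
type MethodCounts = Record<string, number>;

function discoveryChurn(counts: MethodCounts, minCallsPerList = 1) {
  const calls = counts["tools/call"] ?? 0;
  const lists = counts["tools/list"] ?? 0;
  // No list requests at all means no re-discovery churn to flag
  const ratio = lists === 0 ? Infinity : calls / lists;
  return { ratio, suspicious: ratio < minCallsPerList };
}
```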

### Panel 6: Top capabilities, virtual servers, and upstream servers (tables)

When you're federating, the per-server view is essential: which virtual MCP
servers are getting traffic, and which upstream MCP servers (Linear, GitHub,
Stripe, Notion, internal ones) are answering for them. Three tables solve this:
**Top Capabilities** (per-capability call count, error count, error rate, p95),
**Top Virtual Servers** (the namespaces your customers connect to), and **Top
Upstream Servers** (the third-party MCP servers you're federating). A capability
with a 50% error rate against a low-error upstream usually means a tool wiring
problem; a capability with a low error rate fronting a high-error-rate upstream
tells you the upstream is unstable.
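The two cases above can be encoded as a rule over the two error rates. The 20% and 5% thresholds are placeholders that illustrate the shape of the rule, not recommended values.

```typescript
type Diagnosis = "tool_wiring" | "upstream_unstable" | "inconclusive";

// Error rates are ratios in [0, 1]; thresholds are hypothetical
function diagnoseCapability(
  capabilityErrorRate: number,
  upstreamErrorRate: number,
  high = 0.2,
  low = 0.05,
): Diagnosis {
  // This capability fails while its upstream is otherwise healthy
  if (capabilityErrorRate >= high && upstreamErrorRate <= low) return "tool_wiring";
  // This capability is fine while the upstream fails elsewhere
  if (capabilityErrorRate <= low && upstreamErrorRate >= high) return "upstream_unstable";
  return "inconclusive";
}
```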

### Panel 7: Cost attribution (table)

A table showing estimated cost per capability per tenant over the selected time
period. Columns: capability name, total calls, average response size, estimated
cost. Sort by cost descending to identify the most expensive capability-tenant
combinations.

### Building this in practice

Every panel above can be built either by routing MCP traffic through the MCP
Gateway (no instrumentation needed) or from the structured logs your gateway
emits if you're rolling your own. If you're using Datadog, create facets on
`mcpToolName`, `agentId`, and `mcpMethod`. In Grafana with Loki, parse each log
line with LogQL's `json` parser, then apply label filters on the same fields.
The MCP Gateway emits all of these as structured
metadata by default; if you're instrumenting yourself, your custom inbound
policy emits them.

## Getting started: identity first, then signals

You have two starting points depending on whether you're shipping one MCP server
or federating many.

**If you're federating multiple MCP servers** (yours and third-party), route
them through [MCP Gateway](/mcp-gateway). The dashboard populates with the
signals above as soon as traffic flows. From there, configure the
[log integration](https://zuplo.com/docs/articles/logging) for Datadog, New
Relic, Honeycomb, or BigQuery so the typed events stream into your existing
observability stack.

**If you're shipping a single MCP server** and want full control of the
pipeline, start with the foundation:

1. **Add API key authentication** to your MCP server route so you know who is
   making each call.
2. **Add the MCP analytics inbound policy** to extract capability names and
   method types into structured logs.
3. **Configure a logging plugin** to forward enriched logs to your observability
   platform.
4. **Build your first dashboard** with tool call volume and agent breakdown.
5. **Add latency tracking and cost attribution** once you understand your
   traffic patterns.

The gateway is where these signals converge — either way. It sees every MCP
request before your tools execute, and every response before the agent receives
it. Instrument it once (or let the MCP Gateway instrument it for you), and every
capability you add automatically inherits the same observability.

---

**Ready to add analytics to your MCP setup?**

- [Spin up MCP Gateway](/mcp-gateway) if you're federating multiple MCP servers
  and want analytics out of the box — across three event families, with
  trace-ready structured logs that drop into Datadog, Honeycomb, or BigQuery.
- [Deploy an MCP server](https://zuplo.com/docs/handlers/mcp-server) if you're
  building a single MCP server and want full control of the analytics pipeline.

[Sign up for free](https://portal.zuplo.com/signup) and have your first
instrumented MCP server running in minutes.