---
title: "AI Governance for API Teams: Controlling Access, Cost, and Compliance"
description: "Build a governance framework for AI and LLM usage — access controls, cost management, compliance policies, and audit logging with practical API gateway patterns."
canonicalUrl: "https://zuplo.com/learning-center/ai-governance-for-api-teams"
pageType: "learning-center"
authors: "nate"
tags: "AI, API Governance, API Security"
image: "https://zuplo.com/og?text=AI%20Governance%20for%20API%20Teams%3A%20Controlling%20Access%2C%20Cost%2C%20and%20Compliance"
---
AI adoption across the enterprise is accelerating at a pace that governance
frameworks can barely keep up with. Engineering teams are integrating OpenAI,
Anthropic, Google Gemini, and open-source models into everything from customer
support chatbots to code generation pipelines. But while the pace of adoption is
impressive, the controls around that usage often range from informal to
nonexistent.

The result? Uncontrolled costs, unknown data exposure, and compliance gaps that
only surface during audits or incidents. The teams best positioned to close
these gaps are API teams -- because every AI and LLM interaction, whether it's a
call to GPT-4o or an internal fine-tuned model, is ultimately an API call.

This guide walks through building a practical AI governance framework centered
on the API gateway. You'll learn how to enforce access controls, manage costs,
maintain compliance, and create an audit trail -- with concrete patterns and
code examples you can implement today.

## Why API Teams Own AI Governance

Think about the path every AI request takes. A developer's application sends a
prompt. That prompt travels over HTTP to a model endpoint -- OpenAI's
`/v1/chat/completions`, Anthropic's `/v1/messages`, or your own internally
hosted model. The response comes back over the same channel.

This means the API layer is the single chokepoint for all AI traffic in your
organization. The API gateway sits at exactly the right position in the stack to
enforce governance policies uniformly, regardless of which team, application, or
model is involved.

Here's why this matters:

- **Centralized enforcement**: Instead of relying on every team to implement
  their own controls, the gateway applies policies consistently across all AI
  traffic.
- **Separation of concerns**: Application developers focus on building features.
  The platform team handles governance at the infrastructure level.
- **Visibility**: The gateway sees every request and response, making it the
  natural place to log, meter, and audit AI usage.
- **Speed of implementation**: Adding a new policy to a gateway takes minutes.
  Retrofitting controls into dozens of individual applications takes months.

The API gateway is not just the transport layer for AI -- it's the control
plane. And that makes the API team the de facto AI governance team, whether they
signed up for the job or not.

## Building a Governance Framework

A governance framework for AI APIs needs three pillars: clear roles and
policies, well-defined access tiers, and technical enforcement mechanisms that
don't rely on trust alone.

### Roles and Policies

Start by defining who can do what with AI services. This isn't just about
blocking unauthorized access -- it's about creating an approval workflow that
scales as AI adoption grows.

A practical starting point:

- **Platform team**: Owns the AI gateway configuration. Approves new model
  integrations. Defines rate limits, cost caps, and compliance policies.
- **Application teams**: Request access to specific models for specific use
  cases. Operate within the guardrails set by the platform team.
- **Security/compliance team**: Defines data classification rules. Reviews audit
  logs. Signs off on new external AI providers.
- **Finance**: Sets departmental budget caps for AI spend. Reviews usage
  reports.

For new AI service requests, establish a lightweight approval flow. A team wants
to use Claude for summarization? They submit a request specifying the model, the
use case, estimated volume, and the data classification of inputs. The platform
team provisions access with appropriate controls. This doesn't need to be
bureaucratic -- a Slack workflow or a simple form backed by API key provisioning
is enough to start.

### Access Tiers

Not every team needs access to every model. Access tiers let you match model
capabilities (and costs) to actual needs.

A common tiering structure:

| Tier        | Models Available            | Use Cases                           | Rate Limit    |
| ----------- | --------------------------- | ----------------------------------- | ------------- |
| Development | GPT-4o-mini, Claude Haiku   | Prototyping, testing                | 100 req/min   |
| Standard    | GPT-4o, Claude Sonnet       | Production features, internal tools | 500 req/min   |
| Premium     | GPT-4o, Claude Opus, o1-pro | Revenue-critical, complex reasoning | 2,000 req/min |
| Restricted  | Fine-tuned internal models  | Sensitive data processing           | Custom        |

Each tier maps to an API key or JWT claim that the gateway uses to enforce
routing and limits. Teams start in the Development tier and move up through the
approval process.
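As a concrete illustration, the decoded payload of a Standard-tier token might carry the tier and team as custom claims -- the claim names and values here are examples, and the exact shape depends on your identity provider:

```json
{
  "sub": "svc-support-bot",
  "iss": "https://auth.yourcompany.com/",
  "aud": "ai-gateway",
  "tier": "standard",
  "team": "customer-support",
  "exp": 1767225600
}
```

With Zuplo's JWT policy, custom claims like `tier` and `team` become available to downstream policies via `request.user.data`, which is exactly what the routing policy in the next section reads.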

### JWT-Claim Routing

When your organization uses JWT-based authentication, you can embed the access
tier directly in the token claims. The gateway then routes requests to the
appropriate model endpoint without any application-level logic.

Here's a Zuplo inbound policy that reads the tier from a JWT claim and routes
accordingly:

```typescript
import { ZuploContext, ZuploRequest } from "@zuplo/runtime";

const MODEL_ROUTES: Record<string, string> = {
  development: "https://api.openai.com/v1/chat/completions", // routes to mini via body rewrite
  standard: "https://api.openai.com/v1/chat/completions",
  premium: "https://api.anthropic.com/v1/messages",
  restricted: "https://internal-models.company.com/v1/completions",
};

const TIER_MODELS: Record<string, string[]> = {
  development: ["gpt-4o-mini", "claude-3-haiku-20240307"],
  standard: ["gpt-4o", "claude-sonnet-4-20250514"],
  premium: ["gpt-4o", "claude-opus-4-20250514", "o1-pro"],
  restricted: ["internal-summarizer-v2", "internal-classifier-v1"],
};

export default async function aiTierRouting(
  request: ZuploRequest,
  context: ZuploContext,
) {
  const tier = request.user?.data?.tier as string;

  if (!tier || !MODEL_ROUTES[tier]) {
    return new Response(
      JSON.stringify({ error: "Invalid or missing AI access tier" }),
      { status: 403, headers: { "Content-Type": "application/json" } },
    );
  }

  // Validate the requested model is allowed for this tier.
  // Clone before reading: consuming the body on the original request
  // would leave nothing to forward upstream.
  const body = await request.clone().json();
  const requestedModel = body.model;

  if (requestedModel && !TIER_MODELS[tier].includes(requestedModel)) {
    return new Response(
      JSON.stringify({
        error: `Model '${requestedModel}' is not available in the '${tier}' tier`,
        allowed_models: TIER_MODELS[tier],
      }),
      { status: 403, headers: { "Content-Type": "application/json" } },
    );
  }

  // Set the upstream URL based on tier
  context.custom.upstreamUrl = MODEL_ROUTES[tier];

  return request;
}
```

This pattern keeps routing logic out of application code entirely. Developers
send requests to a single AI gateway endpoint. The gateway reads their token,
checks their tier, validates the requested model, and routes accordingly. If a
developer tries to access a model above their tier, they get a clear error
telling them which models they can use.

## Cost Controls

AI API costs can escalate quickly. A single runaway process calling GPT-4o in a
loop can burn through thousands of dollars in hours. Effective cost controls
require multiple layers: per-team quotas, smart caching, model tiering, and
usage tracking.

### Per-Team Quotas

Rate limiting is the first line of defense, but for AI governance you need more
than simple requests-per-minute limits. You need quotas that map to business
units and budgets.

With Zuplo, you can configure rate limiting per API key, which maps directly to
teams or applications:

```json
{
  "policies": [
    {
      "name": "ai-rate-limit",
      "policyType": "rate-limit-inbound",
      "handler": {
        "export": "default",
        "module": "$import(@zuplo/runtime)",
        "options": {
          "rateLimitBy": "user",
          "requestsAllowed": 10000,
          "timeWindowMinutes": 1440,
          "identifier": {
            "func": "$import(./modules/rate-limit-id)",
            "export": "rateLimitId"
          }
        }
      }
    }
  ]
}
```

The `rateLimitBy: "user"` configuration ensures each API consumer gets their own
quota bucket. Set `requestsAllowed` to the daily limit appropriate for each
tier, and the gateway enforces it automatically.
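If you need limits that vary by tier rather than a single fixed number, Zuplo's rate limiting also supports a function mode (`rateLimitBy: "function"`), where a module you supply returns the bucket key and allowance per request. Here's a sketch of such an identifier module -- the tier budgets are illustrative, and the structural `RequestLike` type stands in for `ZuploRequest` so the example is self-contained (in a real Zuplo module you would import the type from `@zuplo/runtime`):

```typescript
// Minimal structural type standing in for ZuploRequest in this sketch
type RequestLike = {
  user?: { sub?: string; data?: { tier?: string } };
};

// Hypothetical daily request budgets per access tier
const TIER_DAILY_REQUESTS: Record<string, number> = {
  development: 5_000,
  standard: 10_000,
  premium: 50_000,
};

export function rateLimitId(request: RequestLike) {
  const tier = request.user?.data?.tier ?? "development";
  return {
    // One quota bucket per authenticated consumer
    key: request.user?.sub ?? "anonymous",
    requestsAllowed:
      TIER_DAILY_REQUESTS[tier] ?? TIER_DAILY_REQUESTS.development,
    timeWindowMinutes: 1440, // one day
  };
}
```

Because the allowance comes from the token's tier claim, promoting a team to a higher tier automatically raises their quota -- no gateway config change required.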

For monthly spend caps, you need to track cumulative token usage. More on that
in the token-based billing section below.

### Semantic Caching

If multiple users or applications send identical (or near-identical) prompts to
the same model, you're paying for the same computation repeatedly. Semantic
caching intercepts these duplicate requests and serves the cached response
instead.

The concept works like this:

1. A request comes in with a prompt.
2. The gateway computes a hash of the prompt (and relevant parameters like
   model, temperature, and system prompt).
3. If a cached response exists for that hash, return it immediately.
4. If not, forward the request to the model, cache the response, and return it.

This is especially effective for common operations like classification,
extraction from templates, and FAQ-style queries where the same questions recur
frequently. In practice, organizations see 15-40% cache hit rates on AI traffic,
translating directly to cost savings.

For prompts that aren't identical but semantically similar, you can use
embedding similarity to match against cached responses. This adds complexity but
can significantly increase hit rates for use cases like customer support where
the same question gets phrased many different ways.

### Model Tiering for Cost Optimization

Not every request needs the most capable (and expensive) model. A request
classifier at the gateway level can route simple requests to cheaper models
automatically.

Consider this pattern:

- **Simple lookups and classifications**: Route to GPT-4o-mini or Claude Haiku.
  These models handle straightforward tasks at a fraction of the cost.
- **Standard generation and summarization**: Route to GPT-4o or Claude Sonnet.
  Good balance of quality and cost.
- **Complex reasoning and analysis**: Route to Claude Opus, o1-pro, or
  specialized models. Reserve these for tasks that genuinely need them.

You can implement this as a gateway policy that inspects request metadata -- a
custom header like `X-AI-Priority` or a field in the request body -- and routes
accordingly:

```typescript
import { ZuploContext, ZuploRequest } from "@zuplo/runtime";

const PRIORITY_MODEL_MAP: Record<string, string> = {
  low: "gpt-4o-mini",
  standard: "gpt-4o",
  high: "claude-opus-4-20250514",
};

export default async function modelTiering(
  request: ZuploRequest,
  context: ZuploContext,
) {
  const priority = request.headers.get("x-ai-priority") ?? "standard";
  const targetModel = PRIORITY_MODEL_MAP[priority];

  if (!targetModel) {
    return new Response(
      JSON.stringify({
        error: `Invalid priority '${priority}'. Use: low, standard, high`,
      }),
      { status: 400, headers: { "Content-Type": "application/json" } },
    );
  }

  const body = await request.json();
  body.model = targetModel;

  return new Request(request.url, {
    method: request.method,
    headers: request.headers,
    body: JSON.stringify(body),
  });
}
```

Application teams set the priority based on the use case. The gateway handles
the rest. This approach can reduce AI spend by 30-60% for organizations with a
mix of simple and complex AI workloads.

### Token-Based Billing and Metering

To enforce monthly spend caps and provide accurate usage reporting, you need to
track token consumption per consumer. Zuplo's metering capabilities let you log
token usage alongside standard request metrics.

Here's an outbound policy that extracts token usage from the AI provider's
response and records it:

```typescript
import { ZuploContext, ZuploRequest } from "@zuplo/runtime";

interface TokenUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Cost per 1K tokens (example rates)
const MODEL_COSTS: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 0.0025, output: 0.01 },
  "gpt-4o-mini": { input: 0.00015, output: 0.0006 },
  "claude-opus-4-20250514": { input: 0.015, output: 0.075 },
  "claude-sonnet-4-20250514": { input: 0.003, output: 0.015 },
};

export default async function trackTokenUsage(
  response: Response,
  request: ZuploRequest,
  context: ZuploContext,
) {
  try {
    const body = await response.json();
    const usage: TokenUsage = body.usage;
    const model: string = body.model;

    if (usage && model) {
      const costs = MODEL_COSTS[model] ?? { input: 0, output: 0 };
      const estimatedCost =
        (usage.prompt_tokens / 1000) * costs.input +
        (usage.completion_tokens / 1000) * costs.output;

      // Log to Zuplo analytics for metering and billing
      context.log.info("AI token usage", {
        consumer: request.user?.sub,
        team: request.user?.data?.team,
        model,
        promptTokens: usage.prompt_tokens,
        completionTokens: usage.completion_tokens,
        totalTokens: usage.total_tokens,
        estimatedCost: estimatedCost.toFixed(6),
      });
    }

    // Return the response unchanged
    return new Response(JSON.stringify(body), {
      status: response.status,
      headers: response.headers,
    });
  } catch {
    // If we can't parse the response, pass it through unchanged
    return response;
  }
}
```

This data feeds into dashboards and alerting. When a team approaches their
monthly budget, you can trigger warnings. When they hit the cap, the rate
limiter kicks in. Finance gets a clear report of AI spend by team, model, and
application.

### Spend Limit Enforcement

Combining token tracking with spend limits creates a hard cap on AI costs.
Here's an inbound policy that checks cumulative spend before allowing a request:

```typescript
import { ZuploContext, ZuploRequest } from "@zuplo/runtime";

interface SpendRecord {
  currentSpend: number;
  limit: number;
  periodStart: string;
}

export default async function enforceSpendLimit(
  request: ZuploRequest,
  context: ZuploContext,
) {
  const team = request.user?.data?.team as string;

  if (!team) {
    return new Response(
      JSON.stringify({ error: "Team identification required" }),
      { status: 403, headers: { "Content-Type": "application/json" } },
    );
  }

  // Retrieve current spend from your tracking store
  const spendRecord = await getTeamSpend(team, context);

  if (spendRecord.currentSpend >= spendRecord.limit) {
    return new Response(
      JSON.stringify({
        error: "Monthly AI spend limit reached",
        current_spend: `$${spendRecord.currentSpend.toFixed(2)}`,
        limit: `$${spendRecord.limit.toFixed(2)}`,
        resets: spendRecord.periodStart,
        action: "Contact your platform team to request a limit increase",
      }),
      { status: 429, headers: { "Content-Type": "application/json" } },
    );
  }

  // Allow the request to proceed
  return request;
}

async function getTeamSpend(
  team: string,
  context: ZuploContext,
): Promise<SpendRecord> {
  // Implementation depends on your storage backend
  // Could be a KV store, database, or external billing API
  const response = await fetch(
    `https://billing-api.internal/teams/${team}/ai-spend`,
    {
      headers: { Authorization: `Bearer ${context.custom.billingApiKey}` },
    },
  );
  return response.json() as Promise<SpendRecord>;
}
```

This gives teams a clear, predictable boundary. No surprises on the monthly AI
bill.

## Compliance and Audit

Cost controls protect your budget. Compliance controls protect your business.
For organizations in regulated industries -- or any company handling customer
data -- AI governance requires robust auditing, data protection, and residency
controls.

### Audit Logging

Every AI request should be logged with enough context to answer these questions
during an audit:

- **Who** made the request? (User identity, team, application)
- **What** model was called, with what parameters?
- **When** did the request occur?
- **How many** tokens were consumed?
- **What** was the response status?

Here's a Zuplo policy that creates comprehensive audit logs:

```typescript
import { ZuploContext, ZuploRequest } from "@zuplo/runtime";

export default async function auditLog(
  response: Response,
  request: ZuploRequest,
  context: ZuploContext,
) {
  const auditEntry = {
    timestamp: new Date().toISOString(),
    requestId: context.requestId,
    // Identity
    userId: request.user?.sub,
    team: request.user?.data?.team,
    apiKeyId: request.user?.data?.apiKeyId,
    // Request details
    model: context.custom.requestedModel,
    endpoint: request.url,
    method: request.method,
    sourceIp: request.headers.get("x-forwarded-for"),
    userAgent: request.headers.get("user-agent"),
    // Response details
    statusCode: response.status,
    tokenUsage: context.custom.tokenUsage,
    estimatedCost: context.custom.estimatedCost,
    // Compliance metadata
    dataClassification: request.headers.get("x-data-classification"),
    region: context.custom.routedRegion,
  };

  // Send to your audit log destination
  context.log.info("AI_AUDIT", auditEntry);

  return response;
}
```

Ship these logs to your SIEM (Splunk, Datadog, etc.) or a dedicated audit store.
The key is making them immutable and queryable. When the compliance team asks
"which teams used GPT-4o to process customer data last quarter?", you should be
able to answer in minutes, not weeks.

### PII Safeguards

One of the biggest risks with external AI services is inadvertently sending
personally identifiable information (PII) to a third-party provider. An inbound
policy can scan request payloads before they leave your network:

```typescript
import { ZuploContext, ZuploRequest } from "@zuplo/runtime";

// Common PII patterns
const PII_PATTERNS: Array<{ name: string; pattern: RegExp }> = [
  {
    name: "SSN",
    pattern: /\b\d{3}-\d{2}-\d{4}\b/g,
  },
  {
    name: "Credit Card",
    pattern: /\b(?:\d{4}[- ]?){3}\d{4}\b/g,
  },
  {
    name: "Email Address",
    pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
  },
  {
    name: "Phone Number",
    pattern: /\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b/g,
  },
];

export default async function piiScanPolicy(
  request: ZuploRequest,
  context: ZuploContext,
) {
  // Clone before reading so the original body can still be forwarded
  const body = await request.clone().text();
  const detectedPii: string[] = [];

  for (const { name, pattern } of PII_PATTERNS) {
    if (pattern.test(body)) {
      detectedPii.push(name);
    }
    // Reset regex lastIndex after test
    pattern.lastIndex = 0;
  }

  if (detectedPii.length > 0) {
    context.log.warn("PII detected in AI request", {
      userId: request.user?.sub,
      piiTypes: detectedPii,
      requestId: context.requestId,
    });

    const action = request.headers.get("x-pii-action") ?? "block";

    if (action === "block") {
      return new Response(
        JSON.stringify({
          error: "Request blocked: potential PII detected",
          detected_types: detectedPii,
          action:
            "Remove PII from your prompt or use the x-pii-action: warn header to proceed",
        }),
        { status: 422, headers: { "Content-Type": "application/json" } },
      );
    }
    // If action is 'warn', log but allow the request through
  }

  return request;
}
```

For production deployments, you'll want more sophisticated PII detection --
potentially using a dedicated NLP model or a service like Microsoft Presidio.
The regex-based approach above catches the most common patterns and serves as a
first layer of defense.

You can also implement PII redaction instead of blocking, replacing detected PII
with placeholders before the request reaches the model. This lets the request
proceed while protecting sensitive data.
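A minimal redaction helper along those lines -- the placeholder format is illustrative, and the pattern set reuses two of the patterns from the scanning policy above:

```typescript
// Sketch of redaction as an alternative to blocking: detected PII is
// replaced with typed placeholders before the prompt leaves the network.
const REDACTION_PATTERNS: Array<{ placeholder: string; pattern: RegExp }> = [
  { placeholder: "[SSN]", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  {
    placeholder: "[EMAIL]",
    pattern: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
  },
];

function redactPii(text: string): { redacted: string; replacements: number } {
  let redacted = text;
  let replacements = 0;
  for (const { placeholder, pattern } of REDACTION_PATTERNS) {
    redacted = redacted.replace(pattern, () => {
      replacements++; // count substitutions for the audit log
      return placeholder;
    });
  }
  return { redacted, replacements };
}
```

The replacement count is worth logging alongside the request: a sudden spike in redactions from one application is often the first signal that a team started sending data they shouldn't.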

### Data Residency

For organizations operating across regions, data residency requirements dictate
where AI processing can happen. European customer data might need to stay within
the EU. Healthcare data might need to remain in specific jurisdictions.

The gateway can enforce this by routing requests to region-specific model
endpoints based on the user's location or data classification:

```typescript
import { ZuploContext, ZuploRequest } from "@zuplo/runtime";

const REGION_ENDPOINTS: Record<string, string> = {
  eu: "https://eu.openai.azure.com/openai/deployments/gpt-4o/chat/completions",
  us: "https://us.openai.azure.com/openai/deployments/gpt-4o/chat/completions",
  apac: "https://apac.openai.azure.com/openai/deployments/gpt-4o/chat/completions",
};

export default async function dataResidencyRouting(
  request: ZuploRequest,
  context: ZuploContext,
) {
  // Determine region from user claims or request metadata
  const region =
    (request.user?.data?.region as string) ??
    request.headers.get("x-data-region") ??
    "us";

  const endpoint = REGION_ENDPOINTS[region];

  if (!endpoint) {
    return new Response(
      JSON.stringify({ error: `Unsupported region: ${region}` }),
      { status: 400, headers: { "Content-Type": "application/json" } },
    );
  }

  context.custom.routedRegion = region;
  context.custom.upstreamUrl = endpoint;

  return request;
}
```

This approach works well with Azure OpenAI Service deployments, which let you
host models in specific Azure regions. For other providers, you may need to
maintain separate accounts or use provider-specific regional endpoints.

### Retention Policies

AI request and response logs can contain sensitive information -- the prompts
themselves, the generated content, and metadata about usage patterns. Define
clear retention policies:

- **Audit metadata** (who, when, which model, token count): Retain for 12-24
  months for compliance. This data is small and doesn't contain sensitive
  content.
- **Full request/response payloads**: Retain for 30-90 days for debugging and
  quality monitoring. Auto-delete after the retention period.
- **PII-flagged requests**: Either don't log the payload at all, or encrypt it
  with a key that gets rotated on a schedule.

Configure your logging pipeline to separate these tiers. The audit metadata goes
to your long-term compliance store. Full payloads go to a time-limited store
with automatic expiry. This balances operational needs with data minimization
principles.
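One way to implement that separation is to split each log record at write time: audit metadata goes to the long-term store, and the payload (when allowed) goes to the expiring store. A sketch with illustrative field names:

```typescript
// Split one AI log record into a long-retention audit entry and a
// short-retention payload entry. Field names are illustrative.
interface AiLogRecord {
  requestId: string;
  userId: string;
  model: string;
  totalTokens: number;
  timestamp: string;
  prompt?: string;
  completion?: string;
  piiFlagged?: boolean;
}

function splitForRetention(record: AiLogRecord) {
  // Destructuring strips the sensitive fields from the metadata copy
  const { prompt, completion, piiFlagged, ...metadata } = record;
  return {
    // Long-term compliance store: small, no sensitive content
    audit: metadata,
    // Short-term store with auto-expiry; drop the payload entirely
    // when the request was flagged for PII
    payload: piiFlagged
      ? null
      : { requestId: record.requestId, prompt, completion },
  };
}
```

Doing the split at the gateway, before anything is shipped, means the long-term store can never accumulate prompt content by accident -- the sensitive fields simply never reach it.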

## Practical Implementation with Zuplo

Zuplo's AI Gateway brings these governance patterns together in a single
platform. Here's how the pieces fit.

### JWT Validation and Claim-Based Routing

Zuplo's built-in JWT authentication policy validates tokens and extracts claims
automatically. Combine it with the tier-routing policy from earlier:

```json
{
  "policies": [
    {
      "name": "jwt-auth",
      "policyType": "open-id-jwt-auth-inbound",
      "handler": {
        "export": "default",
        "module": "$import(@zuplo/runtime)",
        "options": {
          "issuer": "https://auth.yourcompany.com/",
          "audience": "ai-gateway",
          "jwksUrl": "https://auth.yourcompany.com/.well-known/jwks.json"
        }
      }
    },
    {
      "name": "ai-tier-routing",
      "policyType": "custom-code-inbound",
      "handler": {
        "export": "default",
        "module": "$import(./modules/ai-tier-routing)"
      }
    }
  ]
}
```

The JWT policy runs first, validating the token and populating `request.user`
with the token claims. The tier-routing policy then reads the `tier` claim and
routes the request to the appropriate model endpoint.

### Rate Limiting Per API Key

For teams that use API keys instead of JWTs, Zuplo's API key authentication
gives you per-consumer rate limiting out of the box. Each API key can be
assigned metadata (team, tier, spend limit) that your policies can reference.

The rate limiting policy applies per-key quotas automatically. You can set
different limits for different keys through the Zuplo Developer Portal, where
consumers self-serve their API keys and you control the access parameters.

### Custom Logging for Audit Trail

Chain the audit logging policy as an outbound handler on your AI routes. Every
request that passes through gets logged with full context:

```json
{
  "routes": [
    {
      "path": "/v1/ai/completions",
      "methods": ["POST"],
      "handler": {
        "export": "default",
        "module": "$import(@zuplo/runtime)",
        "options": {
          "url": "https://api.openai.com/v1/chat/completions"
        }
      },
      "policies": {
        "inbound": ["jwt-auth", "ai-tier-routing", "pii-scan", "spend-limit"],
        "outbound": ["token-tracking", "audit-log"]
      }
    }
  ]
}
```

Notice the policy chain: inbound policies handle authentication, routing, PII
scanning, and spend limits. Outbound policies capture token usage and write the
audit log. This layered approach means each policy does one thing well, and you
can mix and match them across different AI routes.

### Bringing It All Together

A complete AI governance setup in Zuplo looks like this:

1. **Authentication**: JWT or API key validation on every request.
2. **Authorization**: Tier-based access control using token claims or key
   metadata.
3. **PII protection**: Inbound scan for sensitive data before it leaves your
   network.
4. **Cost controls**: Rate limiting, spend caps, and model tiering to keep costs
   predictable.
5. **Audit trail**: Comprehensive logging of every request with identity, model,
   tokens, and cost.
6. **Data residency**: Region-based routing for compliance with local
   regulations.

Each layer is a separate policy, configured declaratively and applied at the
gateway level. No changes to application code. No reliance on developers
remembering to implement controls. The governance is built into the
infrastructure.

## Governance Checklist

Before going to production with AI APIs, make sure you've addressed each of
these items:

**Access Controls**

- Every AI API consumer is authenticated (JWT or API key)
- Access tiers are defined and mapped to specific models
- An approval workflow exists for new AI service access requests
- Unused API keys and access grants are reviewed and revoked quarterly

**Cost Management**

- Per-team or per-application rate limits are configured
- Monthly spend caps are set and enforced at the gateway
- Semantic caching is enabled for high-volume, repetitive workloads
- Model tiering routes low-priority requests to cost-effective models
- Token usage is tracked and reported per consumer

**Compliance and Data Protection**

- PII scanning is enabled on inbound requests to external AI providers
- Data residency requirements are mapped to regional model endpoints
- Audit logs capture identity, model, tokens, cost, and timestamp for every
  request
- Audit logs are shipped to an immutable store with appropriate retention
  policies
- Full request/response payload logging has a defined retention period with
  auto-expiry

**Operational Readiness**

- Alerting is configured for spend anomalies and rate limit breaches
- A runbook exists for responding to AI-related incidents (data exposure, cost
  spikes)
- The governance configuration is version-controlled and reviewed through the
  same process as application code
- Dashboards show real-time AI usage by team, model, and cost

**Organizational**

- Roles and responsibilities for AI governance are documented
- The platform team has the authority to enforce controls at the gateway level
- Application teams understand the available tiers and how to request changes
- Finance receives regular reports on AI spend by department

This checklist is a starting point. Your organization may have additional
requirements based on your industry, regulatory environment, and risk tolerance.
The important thing is that governance is explicit, enforced at the
infrastructure level, and not left to individual teams to implement on their
own.

## Get Started with Zuplo's AI Gateway

AI governance doesn't have to be a bottleneck. With the right architecture -- an
API gateway as the enforcement point, clear policies, and layered controls --
you can give teams the AI capabilities they need while maintaining the
visibility and control your organization requires.

Zuplo's AI Gateway gives you the building blocks: JWT and API key
authentication, programmable policies in TypeScript, per-consumer rate limiting,
and comprehensive logging. You can start with basic access controls and add
layers as your AI usage matures.

[Sign up for Zuplo](https://portal.zuplo.com/signup) and deploy your first AI
governance policy in minutes. Your finance team and compliance team will both
thank you.