Usage Limits & Thresholds

The Zuplo AI Gateway provides hierarchical usage limits and budget controls to manage LLM spending across your organization. Limits can be set at the organization, team, and application levels.

Budget Hierarchy

Budget limits cascade down through your organizational structure:

Root Team - Organization-wide limits (for example, $1,000/day)
Sub-Teams - Team-specific limits that cannot exceed the parent team's budget (for example, $500/day for the Engineering team)
Applications - Per-app limits for granular control (for example, $10/day for a hackathon project)

A sub-team's budget can never exceed the available budget from its parent team. Similarly, an application's budget cannot exceed its owning team's budget.
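The cascade rule can be sketched as a small tree check. This is a minimal illustration, not the gateway's implementation: `BudgetNode` and `find_violations` are hypothetical names, and the check compares each child against its parent's configured daily budget only (it does not model budget already allocated to sibling teams or apps).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BudgetNode:
    """Hypothetical model of a team or app with a daily budget in USD."""
    name: str
    daily_budget: float
    children: List["BudgetNode"] = field(default_factory=list)


def find_violations(node: BudgetNode) -> List[str]:
    """Return one message per child whose budget exceeds its parent's."""
    violations = []
    for child in node.children:
        if child.daily_budget > node.daily_budget:
            violations.append(
                f"{child.name} (${child.daily_budget:,.2f}) exceeds "
                f"{node.name} (${node.daily_budget:,.2f})"
            )
        # Recurse so sub-teams and their apps are checked as well.
        violations.extend(find_violations(child))
    return violations


# Example values taken from the hierarchy above, plus one invalid app.
root = BudgetNode("Root Team", 1000.0, [
    BudgetNode("Engineering", 500.0, [
        BudgetNode("hackathon-app", 10.0),
        BudgetNode("oversized-app", 750.0),
    ]),
])

print(find_violations(root))
# ['oversized-app ($750.00) exceeds Engineering ($500.00)']
```

The $750/day app fits under the root team's $1,000/day limit but is still rejected, because the comparison is always against the immediate owner.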

Configuring Limits

Daily Budgets

Set a maximum daily spend for a team or application. When the daily budget is reached, requests are either blocked or allowed through with a warning, depending on the enforcement mode.

To configure daily budgets:

1. Open your AI Gateway project in the Zuplo Portal
2. Select the Teams or Apps tab
3. Click on the team or app to edit
4. Select the Usage & Limits tab and configure the Daily Budget field
5. Click Save Changes

Monthly Budgets

Set a maximum monthly spend for applications. Monthly budgets reset on the first day of each calendar month.
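The reset date can be worked out with plain calendar arithmetic. The helper below is a hypothetical sketch; the time zone in which the gateway considers the month to begin is not stated here, so the function works on a bare date.

```python
from datetime import date


def next_monthly_reset(today: date) -> date:
    """Return the first day of the calendar month after `today`."""
    if today.month == 12:
        # December rolls over into January of the next year.
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)


print(next_monthly_reset(date(2026, 5, 10)))   # 2026-06-01
print(next_monthly_reset(date(2026, 12, 31)))  # 2027-01-01
```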

Rate Limits

In addition to budget-based limits, you can configure request rate limits to control the volume of requests flowing through the gateway.
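To illustrate what a request rate limit does, here is a minimal fixed-window counter. The algorithm the gateway actually uses is not described on this page, so treat `FixedWindowLimiter` and its parameters as assumptions chosen for illustration. The clock is passed in explicitly to keep the example deterministic.

```python
from typing import Optional


class FixedWindowLimiter:
    """Allow at most `max_requests` in each `window_seconds` window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.window_start: Optional[float] = None
        self.count = 0

    def allow(self, now: float) -> bool:
        # Open a new window if none exists or the current one has expired.
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count >= self.max_requests:
            return False
        self.count += 1
        return True


limiter = FixedWindowLimiter(max_requests=2, window_seconds=60)
results = [limiter.allow(t) for t in (0, 1, 2, 61)]
print(results)  # [True, True, False, True]
```

The third request is rejected because the window is full; the fourth succeeds because a new window has started.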

Enforcement Modes

When a limit is reached, the AI Gateway can operate in two modes:

Enforce - Requests are blocked and an error response is returned to the caller
Warn - Requests are allowed through but a warning notification is generated
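The difference between the two modes can be sketched as a single decision function. The names `Mode` and `check_budget` and the message strings are hypothetical; the actual error response and notification format are not specified on this page.

```python
from enum import Enum
from typing import Optional, Tuple


class Mode(Enum):
    ENFORCE = "enforce"
    WARN = "warn"


def check_budget(spent: float, limit: float, mode: Mode) -> Tuple[bool, Optional[str]]:
    """Return (allowed, message) for a request given current spend."""
    if spent < limit:
        # Limit not yet reached: the request passes in either mode.
        return True, None
    if mode is Mode.ENFORCE:
        return False, "limit reached: request blocked"
    return True, "limit reached: warning generated"


print(check_budget(9.50, 10.00, Mode.ENFORCE))   # (True, None)
print(check_budget(10.00, 10.00, Mode.ENFORCE))  # (False, 'limit reached: request blocked')
print(check_budget(10.00, 10.00, Mode.WARN))     # (True, 'limit reached: warning generated')
```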

Monitoring Usage

Track current usage and spending through the AI Gateway dashboard:

1. Open the Analytics tab of your AI Gateway project
2. Click on an app and select Dashboard
3. View real-time metrics including:
- Request count
- Token usage (input and output)
- Current spending against budget
- Time to first byte
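To make the relationship between token usage and spending against budget concrete, the sketch below estimates spend from token counts and expresses it as a share of a daily budget. The per-1,000-token prices are made-up placeholders, not real provider rates, and both function names are hypothetical.

```python
def estimate_spend(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Estimate spend in USD from token counts and per-1,000-token prices."""
    return (
        input_tokens / 1000 * input_price_per_1k
        + output_tokens / 1000 * output_price_per_1k
    )


def budget_used_percent(spend: float, budget: float) -> float:
    """Return spend as a percentage of the budget."""
    return spend / budget * 100


# Placeholder prices, chosen only to keep the arithmetic easy to follow.
spend = estimate_spend(200_000, 50_000, input_price_per_1k=0.0025, output_price_per_1k=0.01)
print(f"${spend:.2f} spent, {budget_used_percent(spend, 10.0):.0f}% of a $10/day budget")
```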

Semantic Caching

Enable semantic caching on applications to reduce costs by identifying and returning cached responses for similar prompts. This can significantly reduce token usage and spending, especially for applications with repeated or similar queries.

To enable semantic caching:

1. Open the Apps tab and click on the app to edit
2. Enable the Semantic Caching toggle under Advanced Features
3. Save your changes
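As a rough illustration of the idea behind semantic caching, the sketch below returns a cached response when a new prompt's embedding is close enough to a stored one. It is not the gateway's implementation: the `SemanticCache` class, the 0.9 similarity threshold, and the three-dimensional toy vectors are all assumptions. A real system would obtain embeddings from an embedding model.

```python
import math
from typing import List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Toy cache keyed by prompt embeddings instead of exact prompt text."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[List[float], str]] = []

    def put(self, embedding: List[float], response: str) -> None:
        self.entries.append((embedding, response))

    def get(self, embedding: List[float]) -> Optional[str]:
        # Find the most similar stored prompt, then apply the threshold.
        best_score, best_response = 0.0, None
        for cached_embedding, response in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0, 0.2], "Paris is the capital of France.")

hit = cache.get([0.9, 0.1, 0.2])   # close to the stored prompt
miss = cache.get([0.0, 1.0, 0.0])  # unrelated prompt
print(hit)
print(miss)
```

A hit avoids a model call for that request, which is where the token and cost savings come from. A lower threshold produces more hits but raises the risk of returning a response that does not match the new prompt.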

Related Resources

Getting Started - Set up your first AI Gateway project with budget controls
Managing Teams - Configure team-level budgets
Managing Apps - Configure app-level limits


Last modified on May 10, 2026
