Zuplo AI Gateway
Zuplo's AI Gateway acts as an intelligent proxy layer that sits between your engineering team's applications and LLM providers like OpenAI, Google Gemini, and others. Instead of your applications communicating directly with these providers, all requests flow through the Zuplo AI Gateway, which applies policies and controls, captures monitoring data, and streams responses back to your applications.
Key Benefits
Provider Independence: Switch between LLM providers (OpenAI, Google Gemini, etc.) dynamically without modifying application code. Configure your provider choice through the gateway rather than hardcoding it into your applications.
Cost Control: Set spending limits at organization, team, and application levels with hierarchical budgets that cascade down through your structure. Configure daily and monthly thresholds with enforcement or warning notifications.
Security & Compliance: Apply guardrails to detect and block prompt injection attempts and prevent PII leakage in both requests and responses through integrated AI firewall policies.
Self-Service Access: Developers can create applications and access LLMs without needing direct access to provider API keys. Administrators configure providers once, and teams consume them securely.
Performance Optimization: Enable semantic caching to identify and return cached responses for similar prompts, reducing costs and improving response times.
Full Observability: Real-time dashboards show request counts, token usage, time-to-first-byte metrics, and spending patterns across your organization.
How It Works
Your applications send requests to the Zuplo AI Gateway URL using your Zuplo API key. The gateway authenticates the request, applies configured policies (cost controls, security guardrails), routes to the selected LLM provider, and streams the response back to your application. Throughout this process, the gateway captures metrics and enforces limits without exposing underlying provider credentials.
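To make the flow concrete, here is a minimal TypeScript sketch of a request through the gateway. The gateway URL, endpoint path, and OpenAI-compatible request shape are illustrative assumptions for this example, not confirmed Zuplo specifics:

```ts
// Minimal sketch: calling the gateway with fetch (built into Node 18+ and browsers).
// The URL, path, and body shape below are assumptions for illustration.
const GATEWAY_URL = "https://my-gateway.example.com/v1/chat/completions"; // hypothetical
const ZUPLO_API_KEY = process.env.ZUPLO_API_KEY!; // Zuplo-managed key; provider keys stay hidden

async function ask(prompt: string): Promise<string> {
  const res = await fetch(GATEWAY_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${ZUPLO_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o", // a model the administrator enabled for this app
      messages: [{ role: "user", content: prompt }],
    }),
  });
  // A non-2xx status could mean, for example, a budget limit was enforced.
  if (!res.ok) throw new Error(`Gateway error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

ask("Summarize our Q3 roadmap in one sentence.").then(console.log);
```

Note that the application never holds an OpenAI or Gemini credential; authentication uses only the Zuplo API key.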
Core Features
Multi-Provider Support
Configure multiple LLM providers within a single gateway project. Supported providers include OpenAI (GPT-4, GPT-4.5, and other models) and Google Gemini (all model variants). Select which models are available to your teams when configuring each provider.
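Because every application talks to the same gateway endpoint regardless of provider, switching providers can reduce to changing a model name in configuration. A hedged sketch, where the MODEL environment variable and request shape are illustrative assumptions rather than Zuplo conventions:

```ts
// Sketch: the provider is selected by configuration, not code.
// MODEL is an illustrative environment variable; swapping its value
// (e.g., "gpt-4o" to "gemini-1.5-pro") changes the provider with no code edits.
const MODEL = process.env.MODEL ?? "gpt-4o";

const requestBody = JSON.stringify({
  model: MODEL,
  messages: [{ role: "user", content: "Hello from the gateway" }],
});
// Send requestBody to the same gateway URL as before; nothing else changes.
```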
Team Hierarchy & Budgets
Organize users into teams with hierarchical structures. Set budget limits at each level that cascade down (see the sketch after this list):
- Root Team: Organization-wide limits (e.g., $1,000/day)
- Sub-Teams: Team-specific limits that cannot exceed parent limits (e.g., $500/day for the Credit Team)
- Applications: Per-app limits for granular control
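To make the cascade rule concrete, here is an illustrative TypeScript sketch of the constraint that a child's limit may never exceed its parent's. The types and validation logic are assumptions for illustration, not Zuplo APIs:

```ts
// Hypothetical budget tree; field names are illustrative only.
interface BudgetNode {
  name: string;
  dailyLimitUsd: number;
  children?: BudgetNode[];
}

// Enforce the cascade: a child's limit must fit within its parent's.
function validate(node: BudgetNode): void {
  for (const child of node.children ?? []) {
    if (child.dailyLimitUsd > node.dailyLimitUsd) {
      throw new Error(
        `${child.name} ($${child.dailyLimitUsd}/day) exceeds parent ` +
          `${node.name} ($${node.dailyLimitUsd}/day)`,
      );
    }
    validate(child); // check the next level down
  }
}

validate({
  name: "Root Team",
  dailyLimitUsd: 1000,
  children: [
    {
      name: "Credit Team",
      dailyLimitUsd: 500,
      children: [{ name: "fraud-scoring-app", dailyLimitUsd: 100 }],
    },
  ],
}); // passes: every level fits within its parent
```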
Application Configuration
Each application gets its own (see the sketch after this list):
- Unique Gateway URL: Single endpoint regardless of underlying provider
- API Key: Zuplo-managed key that never exposes provider credentials
- Model Selection: Choose specific models from configured providers
- Budget Thresholds: Daily and monthly limits with enforcement or warnings
- Semantic Caching: Optional caching of similar prompts to reduce costs
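Put together, an application's configuration might look like the following hypothetical shape. All field names here are assumptions for illustration, not Zuplo's actual schema:

```ts
// Hypothetical per-application configuration, for illustration only.
interface AppConfig {
  gatewayUrl: string;      // single endpoint, regardless of underlying provider
  apiKey: string;          // Zuplo-managed; provider credentials never exposed
  allowedModels: string[]; // models enabled from the configured providers
  budget: {
    dailyUsd: number;
    monthlyUsd: number;
    mode: "enforce" | "warn"; // block requests vs. notify only
  };
  semanticCache: boolean;  // return cached responses for similar prompts
}

const checkoutAssistant: AppConfig = {
  gatewayUrl: "https://my-gateway.example.com", // hypothetical
  apiKey: "zpka_...",                           // issued by the gateway, truncated here
  allowedModels: ["gpt-4o", "gemini-1.5-pro"],
  budget: { dailyUsd: 50, monthlyUsd: 1000, mode: "enforce" },
  semanticCache: true,
};
```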
Use Cases
- Multi-tenant AI Applications: Enforce spending limits per customer or team
- Agent Development: Build AI agents that can switch providers without code changes
- Cost Management: Control and monitor LLM spending across your organization
- Security Compliance: Ensure PII and prompt injection protection across all LLM interactions
- Performance: Reduce costs and latency with semantic caching for common queries