Zuplo AI Gateway
Zuplo's AI Gateway acts as an intelligent proxy layer that sits between your engineering team's applications and LLM providers like OpenAI, Google Gemini, and others. Instead of your applications communicating directly with these providers, all requests flow through the Zuplo AI Gateway, which applies policies and controls, captures monitoring data, and streams responses back to your applications.
Key Benefits
Provider Independence: Switch between LLM providers (OpenAI, Google Gemini, etc.) dynamically without modifying application code. Configure your provider choice through the gateway rather than hardcoding it into your applications.
Cost Control: Set spending limits at organization, team, and application levels with hierarchical budgets that cascade down through your structure. Configure daily and monthly thresholds with enforcement or warning notifications.
Security & Compliance: Apply guardrails to detect and block prompt injection attempts and prevent PII leakage in both requests and responses through integrated AI firewall policies.
Self-Service Access: Developers can create applications and access LLMs without needing direct access to provider API keys. Administrators configure providers once, and teams consume them securely.
Performance Optimization: Enable semantic caching to identify and return cached responses for similar prompts, reducing costs and improving response times.
Full Observability: Real-time dashboards show request counts, token usage, time-to-first-byte metrics, and spending patterns across your organization.
How It Works
Your applications send requests to the Zuplo AI Gateway URL using your Zuplo API key. The gateway authenticates the request, applies configured policies (cost controls, security guardrails), routes to the selected LLM provider, and streams the response back to your application. Throughout this process, the gateway captures metrics and enforces limits without exposing underlying provider credentials.
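To make the flow concrete, here is a minimal TypeScript sketch of a request through the gateway. The gateway URL, endpoint path, and OpenAI-compatible request shape are illustrative assumptions for this example, not confirmed Zuplo specifics:

```ts
// Minimal sketch: calling the gateway with fetch (built into Node 18+ and browsers).
// The URL, path, and body shape below are assumptions for illustration.
const GATEWAY_URL = "https://my-gateway.example.com/v1/chat/completions"; // hypothetical
const ZUPLO_API_KEY = process.env.ZUPLO_API_KEY!; // Zuplo-managed key; provider keys stay hidden

async function ask(prompt: string): Promise<string> {
  const res = await fetch(GATEWAY_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${ZUPLO_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o", // a model the administrator enabled for this app
      messages: [{ role: "user", content: prompt }],
    }),
  });
  // A non-2xx status could mean, for example, a budget limit was enforced.
  if (!res.ok) throw new Error(`Gateway error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

ask("Summarize our Q3 roadmap in one sentence.").then(console.log);
```

Note that the application never holds an OpenAI or Gemini credential; authentication uses only the Zuplo API key.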
Core Features
Multi-Provider Support
Configure multiple LLM providers within a single gateway project. Supported providers include OpenAI (GPT-4, GPT-4.5, and other models) and Google Gemini (all model variants). Select which models are available to your teams when configuring each provider.
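Because every application talks to the same gateway endpoint regardless of provider, switching providers can reduce to changing a model name in configuration. A hedged sketch, where the MODEL environment variable and request shape are illustrative assumptions rather than Zuplo conventions:

```ts
// Sketch: the provider is selected by configuration, not code.
// MODEL is an illustrative environment variable; swapping its value
// (e.g., "gpt-4o" to "gemini-1.5-pro") changes the provider with no code edits.
const MODEL = process.env.MODEL ?? "gpt-4o";

const requestBody = JSON.stringify({
  model: MODEL,
  messages: [{ role: "user", content: "Hello from the gateway" }],
});
// Send requestBody to the same gateway URL as before; nothing else changes.
```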
Team Hierarchy & Budgets
Organize users into teams with hierarchical structures. Set budget limits at each level that cascade down (see the sketch after this list):
- Root Team: Organization-wide limits (e.g., $1,000/day)
- Sub-Teams: Team-specific limits that cannot exceed parent limits (e.g., $500/day for the Credit Team)
- Applications: Per-app limits for granular control
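To make the cascade rule concrete, here is an illustrative TypeScript sketch of the constraint that a child's limit may never exceed its parent's. The types and validation logic are assumptions for illustration, not Zuplo APIs:

```ts
// Hypothetical budget tree; field names are illustrative only.
interface BudgetNode {
  name: string;
  dailyLimitUsd: number;
  children?: BudgetNode[];
}

// Enforce the cascade: a child's limit must fit within its parent's.
function validate(node: BudgetNode): void {
  for (const child of node.children ?? []) {
    if (child.dailyLimitUsd > node.dailyLimitUsd) {
      throw new Error(
        `${child.name} ($${child.dailyLimitUsd}/day) exceeds parent ` +
          `${node.name} ($${node.dailyLimitUsd}/day)`,
      );
    }
    validate(child); // check the next level down
  }
}

validate({
  name: "Root Team",
  dailyLimitUsd: 1000,
  children: [
    {
      name: "Credit Team",
      dailyLimitUsd: 500,
      children: [{ name: "fraud-scoring-app", dailyLimitUsd: 100 }],
    },
  ],
}); // passes: every level fits within its parent
```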
Application Configuration
Each application gets its own (see the sketch after this list):
- Unique Gateway URL: Single endpoint regardless of underlying provider
- API Key: Zuplo-managed key that never exposes provider credentials
- Model Selection: Choose specific models from configured providers
- Budget Thresholds: Daily and monthly limits with enforcement or warnings
- Semantic Caching: Optional caching of similar prompts to reduce costs
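Put together, an application's configuration might look like the following hypothetical shape. All field names here are assumptions for illustration, not Zuplo's actual schema:

```ts
// Hypothetical per-application configuration, for illustration only.
interface AppConfig {
  gatewayUrl: string;      // single endpoint, regardless of underlying provider
  apiKey: string;          // Zuplo-managed; provider credentials never exposed
  allowedModels: string[]; // models enabled from the configured providers
  budget: {
    dailyUsd: number;
    monthlyUsd: number;
    mode: "enforce" | "warn"; // block requests vs. notify only
  };
  semanticCache: boolean;  // return cached responses for similar prompts
}

const checkoutAssistant: AppConfig = {
  gatewayUrl: "https://my-gateway.example.com", // hypothetical
  apiKey: "zpka_...",                           // issued by the gateway, truncated here
  allowedModels: ["gpt-4o", "gemini-1.5-pro"],
  budget: { dailyUsd: 50, monthlyUsd: 1000, mode: "enforce" },
  semanticCache: true,
};
```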
Use Cases
- Multi-tenant AI Applications: Enforce spending limits per customer or team
- Agent Development: Build AI agents that can switch providers without code changes
- Cost Management: Control and monitor LLM spending across your organization
- Security Compliance: Ensure PII and prompt injection protection across all LLM interactions
- Performance: Reduce costs and latency with semantic caching for common queries