Is LiteLLM really free?

The MIT-licensed open-source proxy is free to use and modify. However, running it in production requires infrastructure you provision and maintain — compute for the proxy server, a PostgreSQL database, and a Redis instance — plus ongoing engineering time for scaling, patching, and monitoring. Enterprise features like SSO, audit logs, and advanced RBAC require a commercial license from BerriAI. Self-hosted proxy deployments commonly run $500–$2,000 per month in infrastructure costs before accounting for engineering hours.

Can I migrate from LiteLLM without changing my application code?

Yes. Zuplo's AI Gateway exposes an OpenAI-compatible universal API. If your application already uses the OpenAI SDK, LangChain, AI SDK, or any OpenAI-compatible client pointed at LiteLLM, you swap the base URL to your Zuplo gateway and replace the virtual key with a Zuplo API key. The request and response format stays the same.

Does Zuplo support as many LLM providers as LiteLLM?

LiteLLM supports 100+ LLM providers, including self-hosted models via vLLM and Ollama — provider breadth is its core strength. Zuplo focuses on the enterprise short list — OpenAI, Anthropic, Google, Mistral, and xAI — with deeper governance, budget enforcement, and policy coverage for each. If your team uses primarily these providers, Zuplo covers your needs. If you need exotic providers or self-hosted open-source models, LiteLLM has broader coverage.

How does Zuplo handle cost controls differently?

LiteLLM tracks spend per virtual key and per team in PostgreSQL, with budget limits that block requests when exceeded. Zuplo adds hierarchical enforcement — budgets cascade from organization to team to sub-team to application, with daily and monthly thresholds. A sub-team's budget can never exceed its parent's ceiling. Hard 429 responses fire before the bill arrives. Per-team attribution shows exactly which workload is driving cost, all without maintaining a database.

What about semantic caching?

Both platforms support semantic caching via vector similarity. The difference is operational — LiteLLM requires you to provision and maintain a Redis or Qdrant vector store plus an embedding model alongside your existing proxy infrastructure. Zuplo's semantic cache is a managed feature with no additional infrastructure to operate. You configure the similarity tolerance, TTL, and namespace in the policy settings.

Does Zuplo support self-hosted deployment?

Yes. Zuplo offers managed multi-tenant deployment on 300+ edge locations by default, managed dedicated single-tenant deployment on AWS, Azure, GCP, or Akamai, and fully self-hosted deployment on Kubernetes. Enterprise customers can pin regions for data residency or run dedicated deployments in their own cloud account.

Can I use Zuplo for my REST APIs and MCP servers too?

Yes. The AI Gateway runs on the same Zuplo platform that handles REST API management and MCP server governance. One TypeScript policy engine, one authentication model, one audit log, one bill. LiteLLM proxies LLM traffic and now offers an MCP gateway, but it does not do REST API management or generate a developer portal — so you would still run a separate API gateway and portal alongside it.


Zuplo vs LiteLLM

SOC 2 Type II
Up to 99.999% SLA
300+ edge locations

Managed AI Gateway vs Open-Source LLM Proxy



What's wrong with LiteLLM

LiteLLM's key limitations for modern engineering teams

The forces driving enterprises off LiteLLM in 2026: the operational tax of self-hosting, Python performance limits at scale, governance behind a paywall, and no unified control plane for REST, LLM, and MCP traffic.

Self-Hosting Is the Product

LiteLLM is open-source software you deploy, scale, and operate yourself. BerriAI offers managed cloud deployment as part of their Enterprise tier, but the open-source version — where most teams start — requires you to provision a proxy server, a PostgreSQL database for spend logs and API keys, and a Redis instance for caching and rate-limit counters. Your platform team becomes the gateway operator.

Python GIL Bottleneck at Scale

LiteLLM's proxy is written in Python and subject to the Global Interpreter Lock. Teams report performance degradation, memory leaks, and increased latency under sustained high-throughput loads. Scaling out means running and coordinating multiple proxy replicas behind your own load balancer.

Enterprise Governance Behind a Paywall

SSO (Okta, Azure AD), JWT authentication, audit logs, and advanced RBAC are LiteLLM Enterprise features, not included in the MIT-licensed open-source release. Teams that need governance for production LLM traffic must either pay for Enterprise or build these controls themselves.

No Unified API and AI Control Plane

LiteLLM has added an MCP gateway, but it does not manage your REST APIs or generate a developer portal — it remains an LLM and MCP proxy, not a full API management platform. Teams end up running the self-hosted LiteLLM proxy alongside a separate API gateway and a separate developer portal, with no shared TypeScript policy engine across REST, LLM, and MCP traffic.

Why Zuplo

Built for teams replatforming off LiteLLM

Managed, modern API management with predictable economics across procurement cycles: no operator overhead, no self-hosted infrastructure to maintain, and no consumption-pricing surprises.

Deployment and Operations

Fully managed edge deployment vs. self-hosted Python proxy (managed cloud available on LiteLLM Enterprise).

Cost and Budget Controls

Managed hierarchical budgets with hard enforcement vs. self-hosted budget tracking you maintain.

Semantic Caching

Managed semantic caching with zero infrastructure vs. self-hosted semantic caching requiring a vector store and embedding model.

A solutions architect can walk you through your current LiteLLM setup, surface the biggest operational tax, and map a migration path — no slide deck required.


Enterprise ready

Production-ready for regulated and high-volume workloads

Compliance & Audit

SOC 2 Type II audited annually
Third-party penetration test reports available under NDA
GDPR-aligned data processing
Audit logs across the control plane
API governance with policy enforcement

Identity & Access

SAML SSO and SCIM provisioning
Role-based access control across organizations, projects, and environments
Service-account credentials with scoped permissions
API key metadata for downstream authorization

Deployment Flexibility

Managed edge across 300+ locations — global by default
Managed dedicated single-tenant on AWS, Azure, GCP, Akamai, or any major cloud
Self-hosted on Kubernetes with full control plane
Bring-your-own-cloud for data residency requirements

Support & Success

Up to 30-minute response SLA on Enterprise
24/7/365 emergency hotline for critical incidents
Named technical account manager
Architecture and migration professional services

Built for the AI era

A managed AI gateway, not just an LLM proxy

LiteLLM is an LLM routing proxy that translates provider-specific API formats into a single OpenAI-compatible interface, and it now includes an MCP gateway for governing MCP tool access by key, team, and org. What it does not do is REST API management or developer portal generation — so teams still run the self-hosted LiteLLM proxy alongside a separate API gateway and portal, with no single TypeScript policy engine, authentication model, or observability pipeline spanning REST, LLM, and MCP traffic.

Unified control plane for REST, LLM, and MCP

The AI Gateway runs on the same TypeScript policy engine as your REST APIs and MCP servers. One set of authentication policies, one rate-limiting configuration, one audit log stream. When your AI product graduates from prototype to production, the gateway already knows how to run it.
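To make the "one policy engine" idea concrete, here is a minimal sketch of a composable inbound-policy pipeline in TypeScript. It is not the `@zuplo/runtime` API: the `GatewayRequest` shape, the `requireApiKey` and `modelAllowList` policies, and the in-memory key set are hypothetical stand-ins. It shows one authentication policy shared by REST, LLM, and MCP routes, an LLM-only policy layered on top, and the first rejection short-circuiting the rest of the pipeline.

```typescript
// Hypothetical request shape shared by REST, LLM, and MCP routes.
// This illustrates the pattern only; it is not the @zuplo/runtime API.
interface GatewayRequest {
  routeKind: "rest" | "llm" | "mcp";
  headers: Record<string, string | undefined>;
  body?: { model?: string };
}

type PolicyResult =
  | { ok: true; request: GatewayRequest }
  | { ok: false; status: number; message: string };

type Policy = (req: GatewayRequest) => PolicyResult;

// Hypothetical stand-in for a managed API key store.
const validKeys = new Set<string>(["key-team-a"]);

// One authentication policy, reused for every kind of traffic.
const requireApiKey: Policy = (req) => {
  const auth = req.headers["authorization"] ?? "";
  const key = auth.startsWith("Bearer ") ? auth.slice("Bearer ".length) : "";
  return validKeys.has(key)
    ? { ok: true, request: req }
    : { ok: false, status: 401, message: "Invalid or missing API key" };
};

// An LLM-only policy: restrict which models a route may call.
const allowedModels = new Set<string>(["gpt-4o-mini"]);
const modelAllowList: Policy = (req) => {
  if (req.routeKind !== "llm") return { ok: true, request: req };
  const model = req.body?.model ?? "";
  return allowedModels.has(model)
    ? { ok: true, request: req }
    : { ok: false, status: 403, message: `Model not allowed: ${model}` };
};

// Run policies in order; the first rejection short-circuits the pipeline.
function runPipeline(policies: Policy[], req: GatewayRequest): PolicyResult {
  let current = req;
  for (const policy of policies) {
    const result = policy(current);
    if (!result.ok) return result;
    current = result.request;
  }
  return { ok: true, request: current };
}

const pipeline: Policy[] = [requireApiKey, modelAllowList];

// Usage: an MCP request passes auth and skips the LLM-only model check.
const mcpResult = runPipeline(pipeline, {
  routeKind: "mcp",
  headers: { authorization: "Bearer key-team-a" },
});
```

Because each policy has the same signature, adding a guardrail or a custom routing rule means appending another function to the array rather than changing the others.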

Hierarchical budget enforcement

Set dollar budgets at the organization, team, sub-team, and application level. Budgets cascade — a sub-team cannot exceed its parent's ceiling. When a budget is hit, requests return a 429 before the bill arrives, not a soft alert after. Per-team attribution shows exactly which workload is driving cost.
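The cascade rule can be sketched as a walk up a budget tree. In this minimal TypeScript sketch, which assumes a single budget period (daily and monthly windows are omitted) and amounts in integer cents, a request is charged only if it fits within the remaining budget of the application and of every ancestor; otherwise it gets a 429-style decision naming the level that ran out. `BudgetNode` and `chargeRequest` are hypothetical names, not Zuplo's implementation.

```typescript
// Hypothetical budget node; amounts are integer cents to avoid float drift.
// A single budget period is modeled; daily and monthly windows are omitted.
class BudgetNode {
  readonly name: string;
  readonly limitCents: number;
  readonly parent: BudgetNode | undefined;
  spentCents = 0;

  constructor(name: string, limitCents: number, parent?: BudgetNode) {
    // Cascading rule: a child's ceiling can never exceed its parent's.
    if (parent !== undefined && limitCents > parent.limitCents) {
      throw new Error(`${name} budget exceeds the ceiling of ${parent.name}`);
    }
    this.name = name;
    this.limitCents = limitCents;
    this.parent = parent;
  }

  // The node itself followed by every ancestor, up to the organization.
  lineage(): BudgetNode[] {
    const chain: BudgetNode[] = [];
    for (let node: BudgetNode | undefined = this; node !== undefined; node = node.parent) {
      chain.push(node);
    }
    return chain;
  }
}

type Decision = { status: 200 } | { status: 429; exhausted: string };

function chargeRequest(app: BudgetNode, costCents: number): Decision {
  const chain = app.lineage();
  // Check every level before charging any, so a rejection costs nothing.
  for (const node of chain) {
    if (node.spentCents + costCents > node.limitCents) {
      return { status: 429, exhausted: node.name };
    }
  }
  for (const node of chain) {
    node.spentCents += costCents;
  }
  return { status: 200 };
}

// Example: org $100 > team $40 > two apps ($10 and $40). Sub-teams nest the same way.
const org = new BudgetNode("org", 10000);
const team = new BudgetNode("team-search", 4000, org);
const chatbot = new BudgetNode("app-chatbot", 1000, team);
const batch = new BudgetNode("app-batch", 4000, team);

const first = chargeRequest(chatbot, 600); // fits at every level
const second = chargeRequest(chatbot, 600); // would push app-chatbot past $10
const third = chargeRequest(batch, 3500); // fits app-batch, but not the shared team ceiling
```

The third request shows why the cascade matters: `app-batch` still has room, but its parent team has already absorbed the chatbot's spend, so the team ceiling blocks it.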

Managed semantic caching with zero infrastructure

Both Zuplo and LiteLLM support semantic caching via vector similarity. The difference is operational — Zuplo's semantic cache is fully managed with configurable similarity tolerance, TTL, and namespace isolation. LiteLLM requires you to provision and maintain a separate vector store and embedding model alongside the proxy.
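As a sketch of how similarity-based lookup differs from exact-match caching, the TypeScript below keeps per-namespace entries with a TTL and returns a cached response when the best cosine similarity clears a threshold. It assumes the caller already has embedding vectors (a real deployment would obtain them from an embedding model) and models the similarity setting as a minimum cosine similarity between 0 and 1; how that maps onto Zuplo's tolerance setting is an assumption, and `SemanticCache` is a hypothetical class.

```typescript
// Hypothetical in-memory semantic cache; not Zuplo's implementation.
interface CacheEntry {
  embedding: number[];
  response: string;
  expiresAt: number; // epoch milliseconds
}

// Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  private readonly namespaces = new Map<string, CacheEntry[]>();
  private readonly minSimilarity: number;
  private readonly ttlMs: number;

  constructor(minSimilarity: number, ttlMs: number) {
    this.minSimilarity = minSimilarity; // assumed 0..1 threshold
    this.ttlMs = ttlMs;
  }

  set(namespace: string, embedding: number[], response: string, now: number = Date.now()): void {
    const entries: CacheEntry[] = this.namespaces.get(namespace) ?? [];
    entries.push({ embedding, response, expiresAt: now + this.ttlMs });
    this.namespaces.set(namespace, entries);
  }

  // Returns the closest unexpired entry in this namespace if it is similar enough.
  get(namespace: string, embedding: number[], now: number = Date.now()): string | undefined {
    const live = (this.namespaces.get(namespace) ?? []).filter((e) => e.expiresAt > now);
    this.namespaces.set(namespace, live); // evict expired entries
    let best: CacheEntry | undefined;
    let bestScore = -Infinity;
    for (const entry of live) {
      const score = cosineSimilarity(entry.embedding, embedding);
      if (score > bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best !== undefined && bestScore >= this.minSimilarity ? best.response : undefined;
  }
}

// Toy 3-dimensional vectors stand in for real embeddings.
const cache = new SemanticCache(0.9, 60000);
cache.set("team-a", [1, 0, 0], "Paris is the capital of France.", 0);

const hit = cache.get("team-a", [0.95, 0.05, 0], 1000); // similar prompt, same namespace
const miss = cache.get("team-a", [0, 1, 0], 1000); // unrelated prompt
const isolated = cache.get("team-b", [1, 0, 0], 1000); // another namespace sees nothing
const expired = cache.get("team-a", [1, 0, 0], 61000); // past the 60-second TTL
```

Real embeddings have hundreds or thousands of dimensions, and a production cache would typically use an approximate nearest-neighbor index instead of this linear scan; that index is the vector store the comparison says LiteLLM users must run themselves.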

Observability without instrumentation code

Galileo Tracing and Comet Opik Tracing policies capture hierarchical traces of every AI Gateway request — prompts, parameters, token usage, latency, cost — and ship them to your observability platform without adding a single line of instrumentation to your application code.
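The trace hierarchy these policies capture (trace, workflow span, LLM span) can be sketched as plain data. In this hypothetical TypeScript model, a gateway-side `recordLlmCall` wrapper times a provider call and records tokens, parameters, latency, and cost, so the application code stays untouched. The span fields, the stubbed `fakeProvider`, and the per-token prices are illustrative assumptions; Galileo and Opik each define their own schemas.

```typescript
// Hypothetical span shapes; Galileo and Opik each define their own schemas.
interface LlmSpan {
  kind: "llm";
  model: string;
  prompt: string;
  parameters: Record<string, number>;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  costUsd: number;
}

interface WorkflowSpan {
  kind: "workflow";
  name: string;
  children: LlmSpan[];
}

interface Trace {
  traceId: string;
  children: WorkflowSpan[];
}

// What a provider call returns in this sketch.
interface ProviderResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

// Assumed prices in USD per 1,000 tokens, for illustration only.
const PRICE_PER_1K = { input: 0.001, output: 0.002 };

// Gateway-side wrapper: times the call and records an LLM span,
// so the application itself needs no instrumentation code.
function recordLlmCall(
  workflow: WorkflowSpan,
  model: string,
  prompt: string,
  parameters: Record<string, number>,
  call: () => ProviderResult,
): ProviderResult {
  const start = Date.now();
  const result = call();
  const costUsd =
    (result.inputTokens / 1000) * PRICE_PER_1K.input +
    (result.outputTokens / 1000) * PRICE_PER_1K.output;
  workflow.children.push({
    kind: "llm",
    model,
    prompt,
    parameters,
    inputTokens: result.inputTokens,
    outputTokens: result.outputTokens,
    latencyMs: Date.now() - start,
    costUsd,
  });
  return result;
}

// Roll up call count, tokens, and cost across the whole trace.
function summarizeTrace(trace: Trace): { llmCalls: number; totalTokens: number; totalCostUsd: number } {
  let llmCalls = 0;
  let totalTokens = 0;
  let totalCostUsd = 0;
  for (const wf of trace.children) {
    for (const span of wf.children) {
      llmCalls += 1;
      totalTokens += span.inputTokens + span.outputTokens;
      totalCostUsd += span.costUsd;
    }
  }
  return { llmCalls, totalTokens, totalCostUsd };
}

// Usage with a stubbed provider standing in for a real model call.
const workflow: WorkflowSpan = { kind: "workflow", name: "answer-question", children: [] };
const trace: Trace = { traceId: "trace-1", children: [workflow] };
const fakeProvider = (): ProviderResult => ({ text: "42", inputTokens: 1000, outputTokens: 500 });
const answer = recordLlmCall(workflow, "gpt-4o-mini", "What is 6 x 7?", { temperature: 0 }, fakeProvider);
const summary = summarizeTrace(trace);
```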

See it in action

See Zuplo running on your stack

A 30-minute working session with a Zuplo solutions engineer. Bring an OpenAPI spec or your LiteLLM proxy configuration and walk away with a working preview.


Side by side

Feature-by-feature comparison

Deployment and Operations

Zuplo: Fully managed, with multi-tenant deployment on 300+ global edge locations by default, managed dedicated single-tenant deployment on AWS, Azure, GCP, or Akamai, and self-hosted deployment on Kubernetes. No proxy server, database, or cache layer to operate. Zuplo handles scaling, uptime, and patching.

LiteLLM: Self-hosted by default. You deploy the Python proxy server via Docker or pip, provision and maintain a PostgreSQL database and a Redis instance, and handle scaling, failover, and upgrades yourself. BerriAI offers managed cloud deployment as part of its Enterprise tier.

OpenAI-Compatible Universal API

Zuplo: An OpenAI-compatible universal API endpoint. Change the base URL in your existing OpenAI SDK client and requests route through Zuplo with all policies applied. Works with the OpenAI Node.js SDK, Python SDK, LangChain, AI SDK, Claude Code, Cursor, Codex, and Goose.

LiteLLM: An OpenAI-compatible proxy that translates requests for 100+ LLM providers into a unified format. The same base-URL swap works for any OpenAI-compatible client.

LLM Provider Coverage

Zuplo: Focused on the enterprise short list (OpenAI, Anthropic, Google, Mistral, and xAI), with deep governance and policy coverage for each. Provider routing, model selection, and fallbacks are configured as TypeScript policies.

LiteLLM: The broadest open-source provider coverage: 100+ LLM providers including OpenAI, Anthropic, Google Vertex AI, AWS Bedrock, Azure OpenAI, Cohere, and Mistral, plus self-hosted models via vLLM, Ollama, HuggingFace TGI, and many more. Provider support is a core strength.

Cost and Budget Controls

Zuplo: Hierarchical dollar budgets at the organization, team, sub-team, and application level. Daily and monthly limits with hard enforcement: requests return 429 when the budget is hit. Budgets cascade, so a sub-team can never exceed its parent's ceiling. Per-team cost attribution in the dashboard.

LiteLLM: Per-team and per-key budget tracking with automatic spend logging across all providers. Budgets block requests when exceeded. Cost tracking is a first-class feature, but you are responsible for the infrastructure (PostgreSQL) that stores the spend data.

Semantic Caching

Zuplo: The semantic cache uses LLM embeddings to match requests by vector similarity, not exact text. Configurable similarity tolerance (0–1 scale), TTL, namespace isolation, and cache-status headers. Reduces cost and latency on semantically similar prompts without any additional infrastructure.

LiteLLM: Supports exact-match and semantic caching modes. Semantic caching requires provisioning additional infrastructure (a Redis or Qdrant vector store plus an embedding model) alongside the existing proxy stack, and teams must configure and maintain the vector search layer themselves.

Guardrails and Prompt-Injection Protection

Zuplo: A prompt-injection detection policy blocks malicious instructions before they reach the model. A secret-masking policy redacts API keys, tokens, and private keys from responses. A Data Loss Prevention policy handles PII and broader sensitive-data patterns. The Akamai AI Firewall partnership adds another detection layer. All policies are composable and configurable via the TypeScript policy pipeline.

LiteLLM: A built-in guardrails system with in-memory prompt-injection detection, Presidio-based PII masking, and integrations with third-party providers such as Aporia, Lakera, Akto, PromptGuard, and Pangea. Guardrails run as configurable hooks in the proxy pipeline, on your self-hosted stack.

Observability and Tracing

Zuplo: First-class Galileo Tracing and Comet Opik Tracing policies capture hierarchical traces (trace, workflow span, LLM span) of every request, including prompts, model parameters, token usage, latency, and cost, and stream them to your observability platform. No instrumentation code required. Custom collectors are supported via the same pipeline.

LiteLLM: Built-in callbacks for Langfuse, Helicone, Lunary, OpenTelemetry, and others, plus spend logging to PostgreSQL. The admin UI provides cost and usage dashboards. Observability integrations are a strength, though you operate the infrastructure they run on.

Authentication and Access Control

Zuplo: Managed API keys with consumer identity, per-key analytics, GitHub secret-scanning leak detection, and instant revocation. First-class OIDC and Auth0 integration with JWT claim-based authorization at the gateway edge. SAML SSO is available as an Enterprise add-on.

LiteLLM: Virtual keys with per-key budget and rate-limit assignment via the admin UI. SSO (Okta, Azure AD), JWT authentication, and advanced RBAC are Enterprise-only features, not included in the open-source release.

Compliance Posture

Zuplo: SOC 2 Type II audited annually, GDPR-aligned data processing, annual third-party penetration tests, and audit logs across the control plane. Compliance is included at the platform level, not gated by tier.

LiteLLM: The open-source release carries no vendor-provided compliance certification; SOC 2, GDPR, and audit-log capabilities are available only in the Enterprise tier. Self-hosted deployments inherit your own infrastructure's compliance posture.

Programmable Policies

Zuplo: Every policy is TypeScript code running on the Zuplo runtime, with pre- and post-request hooks at every stage of the request lifecycle. Full npm ecosystem access, type safety, and real CI tests. Custom auth, guardrails, and routing are written as code rather than limited to a configuration file.

LiteLLM: Python-based customization via callback hooks and guardrail functions, with full access to the Python ecosystem for extending proxy behavior. Custom routing, logging, and guardrails are possible by modifying the proxy configuration or source code.

GitOps and CI/CD

Zuplo: Git is the source of truth. Every push deploys, and every PR gets a live preview environment. Branches, environments, and rollbacks apply to gateway configuration, using the same deployment model as the rest of the Zuplo platform.

LiteLLM: Configuration lives in YAML files that can be version-controlled. Deployment is your responsibility: Docker builds, Kubernetes manifests, or manual restarts. No built-in preview environments or GitOps deployment pipeline.

Open-Source Footprint

Zuplo: @zuplo/mcp, open-source MCP client and server primitives, MIT licensed. The Zuplo runtime and policies are source-available with TypeScript escape hatches. The managed platform is the primary product.

LiteLLM: The core proxy server is MIT-licensed, with 52,000+ GitHub stars and 240M+ Docker pulls, making it one of the most widely adopted open-source LLM infrastructure projects. Enterprise features are under a separate commercial license.

Migration path

Moving from LiteLLM to Zuplo AI Gateway

If your applications already call LiteLLM's OpenAI-compatible proxy, the migration is a base-URL swap. Point your OpenAI SDK, LangChain, or AI SDK client at your Zuplo gateway URL, replace the LiteLLM virtual key with a Zuplo API key, and every request flows through managed policies — budgets, caching, guardrails, tracing — without changing your application code.
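Because both proxies speak the OpenAI wire format, the swap amounts to changing two values. The sketch below builds the same chat completion request against each gateway to show that only the URL and the bearer token differ. The Zuplo URL and both keys are placeholders, `buildChatRequest` is a hypothetical helper, and `http://localhost:4000` assumes LiteLLM's default proxy port. With the OpenAI SDK you would set the same two values as the client's `baseURL` and `apiKey` options.

```typescript
interface GatewayConfig {
  baseURL: string;
  apiKey: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatCompletionBody {
  model: string;
  messages: ChatMessage[];
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Before: LiteLLM proxy on its default port. After: placeholder Zuplo gateway URL.
const litellm: GatewayConfig = { baseURL: "http://localhost:4000", apiKey: "sk-litellm-virtual-key" };
const zuplo: GatewayConfig = { baseURL: "https://your-gateway.example.com", apiKey: "zuplo-app-api-key" };

// Builds an OpenAI-style chat completion request: POST {baseURL}/chat/completions.
function buildChatRequest(config: GatewayConfig, payload: ChatCompletionBody): PreparedRequest {
  const base = config.baseURL.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${config.apiKey}`,
    },
    body: JSON.stringify(payload),
  };
}

// The application's payload is identical for both gateways.
const payload: ChatCompletionBody = {
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Summarize our Q3 incident report." }],
};
const before = buildChatRequest(litellm, payload);
const after = buildChatRequest(zuplo, payload);
// Either request can be sent with fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).
```

Keeping the base URL and key in configuration rather than code means the cut-over, and a rollback if needed, is an environment change rather than a deploy of new application logic.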

Create your AI Gateway project (2 weeks; outcome: plan locked)

Sign up at portal.zuplo.com, create an AI Gateway project, and configure your LLM providers (OpenAI, Anthropic, Google, Mistral, xAI). Set up teams and applications with budget limits that match your current LiteLLM virtual-key structure.

Swap the base URL (2 weeks; outcome: foundation live)

In your application code, change the OpenAI SDK baseURL from your LiteLLM proxy address to your Zuplo gateway URL. Replace the LiteLLM virtual key with the Zuplo app API key. The request format stays identical.

Configure policies (4 weeks; outcome: running side by side)

Enable semantic caching to reduce costs on repeated prompts. Add prompt-injection detection and secret-masking policies. Connect Galileo or Comet Opik tracing for observability. Set hierarchical budgets at the org, team, and app level.

Decommission the self-hosted proxy (2 weeks; outcome: cut-over done)

Once traffic is flowing through Zuplo and dashboards confirm correct routing, token usage, and budget enforcement, shut down your LiteLLM proxy server, PostgreSQL database, and Redis instance. Remove the associated infrastructure and on-call rotation.

Total time: typical migration in days, not weeks; the OpenAI-compatible API means zero application code changes.
Routes & specs: direct OpenAPI import.
LiteLLM callbacks and guardrails: map to TypeScript policies.
Risk model: side-by-side.


What our customers say

Trusted by engineering teams at scale

90%

Hardware footprint reduction at scale

From the Blockdaemon case study.

"The move to Zuplo from our existing API Management vendor was easy, taking just over 2 months to switch mission critical systems, and we're saving over 70% on costs."

Ryan Waites

Senior Director, Blockdaemon


"Zuplo gives us the flexibility to scale efficiently, ensures security and compliance, and reduces operational complexity so we can focus on building new capabilities."

Daryl Benzel

Staff Software Engineer, Yext


1B+

End users served via Zuplo APIs

From the AccuWeather case study.

Hours

To launch an MCP server on regulated APIs

From the Finsolutia case study.

"We didn't touch a line of code, it's just plug and play. The results were very surprising, in just a couple of hours we had a great result and a fully working MCP Server."

Miguel Madeira

CTO & Co-Founder, Finsolutia


Trusted for regulated and high-volume workloads

SOC 2 Type II, third-party penetration testing, GDPR-aligned, 24/7/365 emergency hotline

300+ Global edge locations

Billions of API requests served per month

Up to 99.999% Enterprise uptime SLA

<20s Global deploy time

Frequently Asked Questions

Common questions about Zuplo vs LiteLLM.

Ready to talk to an expert?

Book a call with a solutions architect for a tailored walkthrough — SOC 2 controls, dedicated deployment, AI Gateway, and enterprise support. Or start free and explore the platform yourself.

