
Fallback Models

Each AI Gateway app calls a primary model: the provider, completions model, and optional embeddings model you select under AI Models on the app's Settings tab. Fallbacks let an app keep serving requests when that primary model fails, times out, or runs over its usage limits, instead of returning an error to the caller.

The AI Gateway offers two independent fallback mechanisms, each triggered by a different condition:

| Mechanism | Triggers when… | Without a fallback set… |
| --- | --- | --- |
| Fallback & Timeout | The primary returns a 4xx/5xx or exceeds the timeout | The error is returned to the caller |
| Quota Fallback | One of the app's usage limits is exceeded | The request is blocked with a 429 |

Both are configured entirely in the Zuplo Portal. Either can route to any provider, so the fallback doesn't have to share the primary's provider.

Error and timeout fallback

The Fallback & Timeout section fails an app over to a second model when the primary model returns a 4xx or 5xx response, or when the request takes longer than the configured timeout. This protects against provider outages, rate limiting on the primary provider, and slow responses.

Configure an error and timeout fallback

  1. Open the Apps tab of your AI Gateway project and select the app to edit.

  2. Select the Settings tab and find the Fallback & Timeout section.

  3. Choose a Fallback Provider. This can be the same provider as the primary or a different one, including a custom provider.

  4. Select the Fallback Completions model. If the app uses embeddings, also select a Fallback Embeddings model.

  5. Set the Request timeout (seconds) value to bound how long the primary model call can run before the gateway fails over. The default is 60 seconds, but you can set any value that suits your app.

  6. Click Save Changes.

[Image: The Fallback & Timeout section of the app Settings tab, showing the Fallback Provider, Fallback Completions, Fallback Embeddings, and Request timeout fields]

The request timeout applies only when a fallback model is set. If no fallback is configured, the primary model call runs unbounded.
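The failover rule described in this section can be sketched as a small decision function. This is an illustrative model only, not Zuplo's implementation: the type names, field names, and the `shouldFailOver` function are hypothetical, and a real gateway would abort the primary call when the timeout elapses rather than measure the duration afterward.

```typescript
// Hypothetical shapes for illustration; these are not Zuplo types.
interface ErrorFallbackSettings {
  fallbackModel?: string; // undefined means no fallback is configured
  timeoutSeconds: number; // "Request timeout (seconds)"; the default is 60
}

interface PrimaryCallResult {
  status: number;         // HTTP status the primary model returned
  elapsedSeconds: number; // how long the primary model call took
}

// Returns true when the request should be sent to the fallback model.
function shouldFailOver(
  result: PrimaryCallResult,
  settings: ErrorFallbackSettings,
): boolean {
  // Without a fallback there is nothing to fail over to: the error goes
  // back to the caller and the request timeout is not applied.
  if (settings.fallbackModel === undefined) return false;

  const errored = result.status >= 400 && result.status <= 599;
  const timedOut = result.elapsedSeconds > settings.timeoutSeconds;
  return errored || timedOut;
}
```

With a fallback set and a 60-second timeout, a 503 from the primary or a call that runs for 61 seconds both return `true`, while a 200 returned in 2 seconds returns `false`.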

Quota fallback

The Quota Fallback section routes requests to an alternate, usually cheaper, model when one of the app's usage limits is exceeded, rather than blocking the request with a 429. This keeps an app available after it crosses a budget, token, or request threshold, while shifting the overflow traffic to a lower-cost model.

If you leave the quota fallback empty, the app blocks requests with a 429 once it goes over quota.

Configure a quota fallback

  1. Open the Apps tab and select the app to edit.

  2. Select the Settings tab. Set the limits you want to enforce under Usage Limits & Thresholds, then find the Quota Fallback section below them.

  3. Choose a Quota Fallback Provider, then select the Quota Fallback Completions model. If the app uses embeddings, also select a Quota Fallback Embeddings model.

  4. Click Save Changes.

[Image: The Quota Fallback section of the app Settings tab, below Usage Limits & Thresholds, showing the Quota Fallback Provider, Quota Fallback Completions, and Quota Fallback Embeddings fields]

Point the quota fallback at a smaller, cheaper model so the app stays available while overflow traffic stays inexpensive. The fallback's own usage still counts toward the app's limits.
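The quota behavior above can be sketched as follows. Everything here is a hypothetical model for illustration, not a Zuplo API: the `Usage` fields stand in for the budget, token, and request limits, and the sketch assumes a limit counts as exceeded only when usage is strictly greater than it.

```typescript
// Hypothetical shapes for illustration; these are not Zuplo types.
interface Usage {
  budgetUsd: number;
  tokens: number;
  requests: number;
}

// Any limit left undefined is treated as not enforced.
type Limits = Partial<Usage>;

type QuotaDecision =
  | { action: "primary" }
  | { action: "quota-fallback"; model: string }
  | { action: "block"; status: 429 };

// Assumption: "exceeded" means usage is strictly greater than the limit.
function isOverQuota(usage: Usage, limits: Limits): boolean {
  return (Object.keys(limits) as (keyof Usage)[]).some((key) => {
    const limit = limits[key];
    return limit !== undefined && usage[key] > limit;
  });
}

function decideForQuota(
  usage: Usage,
  limits: Limits,
  quotaFallbackModel?: string,
): QuotaDecision {
  if (!isOverQuota(usage, limits)) return { action: "primary" };
  // Over quota: route to the fallback if one is set, otherwise block.
  if (quotaFallbackModel !== undefined) {
    return { action: "quota-fallback", model: quotaFallbackModel };
  }
  return { action: "block", status: 429 };
}
```

Because the fallback's own usage still counts toward the app's limits, the `usage` passed in would keep growing while requests are served by the quota fallback model.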

How the two fallbacks combine

The mechanisms are evaluated independently and can both be active on the same app:

  • A request that is over quota routes to the quota fallback model.
  • A request that is within quota but hits an error or timeout on the primary routes to the error and timeout fallback model.

Set whichever fallbacks match the failure modes you want to protect against. Neither is required.
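As a rough sketch of how the two mechanisms combine, the function below maps a request's state to an outcome. The names are hypothetical and this is not Zuplo's implementation. It covers only the cases described on this page and does not model what happens if a fallback model itself fails.

```typescript
// Hypothetical shapes for illustration; these are not Zuplo types.
interface AppFallbacks {
  errorFallbackModel?: string; // Fallback & Timeout section
  quotaFallbackModel?: string; // Quota Fallback section
}

type Route =
  | { kind: "model"; model: string }
  | { kind: "blocked"; status: 429 }
  | { kind: "primary-error" };

function routeRequest(
  overQuota: boolean,
  primaryFailed: boolean, // primary returned 4xx/5xx or timed out
  primaryModel: string,
  fallbacks: AppFallbacks,
): Route {
  // Over quota: use the quota fallback, or block with a 429 if none is set.
  if (overQuota) {
    return fallbacks.quotaFallbackModel !== undefined
      ? { kind: "model", model: fallbacks.quotaFallbackModel }
      : { kind: "blocked", status: 429 };
  }
  // Within quota and the primary succeeded.
  if (!primaryFailed) return { kind: "model", model: primaryModel };
  // Within quota but the primary failed: use the error and timeout
  // fallback, or return the primary's error to the caller if none is set.
  return fallbacks.errorFallbackModel !== undefined
    ? { kind: "model", model: fallbacks.errorFallbackModel }
    : { kind: "primary-error" };
}
```

Since neither fallback is required, leaving both fields undefined reduces this to the default behavior: a 429 when over quota, and the primary's error otherwise.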

Related resources

  • Managing Apps - Create, edit, and delete AI Gateway apps.
  • Usage Limits & Thresholds - Configure the budget, token, and request limits that trigger a quota fallback.
  • Custom Providers - Add your own provider to use as a primary or fallback model.
Last modified on June 5, 2026