Fallback Models

Each AI Gateway app calls a primary model: the provider, completions model, and optional embeddings model you select under AI Models on the app's Settings tab. Fallbacks let an app keep serving requests when that primary model fails, times out, or runs over its usage limits, instead of returning an error to the caller.

The AI Gateway offers two independent fallback mechanisms, each triggered by a different condition:

Mechanism	Triggers when…	Without a fallback set…
Fallback & Timeout	The primary returns a `4xx`/`5xx` or exceeds the timeout	The error is returned to the caller
Quota Fallback	One of the app's usage limits is exceeded	The request is blocked with a `429`

Both mechanisms are configured entirely in the Zuplo Portal. Either one can route to any provider, so a fallback doesn't have to share the primary's provider.

Error and timeout fallback

The Fallback & Timeout section fails an app over to a second model when the primary model returns a 4xx or 5xx response, or when the request takes longer than the configured timeout. This protects against provider outages, rate limiting on the primary provider, and slow responses.
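The trigger conditions described above can be sketched as a small predicate. This is only an illustration of the documented rule (a 4xx or 5xx response, or a call that outlasts the timeout), not the gateway's actual code; the function name and parameters are hypothetical.

```python
from typing import Optional


def should_fail_over(
    status_code: Optional[int],
    elapsed_seconds: float,
    timeout_seconds: float = 60.0,
) -> bool:
    """Return True when the primary's outcome matches a documented trigger.

    status_code is None when no response arrived before the call was cut off.
    All names here are hypothetical and exist only for this sketch.
    """
    # Trigger 1: the request ran longer than the configured timeout.
    timed_out = elapsed_seconds > timeout_seconds
    # Trigger 2: the primary answered with a 4xx or 5xx status.
    error_status = status_code is not None and 400 <= status_code <= 599
    return timed_out or error_status
```

For example, a 429 from a rate-limited primary provider and a 503 from a provider outage both count as triggers, while a 200 that arrives within the timeout does not.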

Configure an error and timeout fallback

1. Open the Apps tab of your AI Gateway project and select the app to edit.
2. Select the Settings tab and find the Fallback & Timeout section.
3. Choose a Fallback Provider. This can be the same provider as the primary or a different one, including a custom provider.
4. Select the Fallback Completions model. If the app uses embeddings, also select a Fallback Embeddings model.
5. Set the Request timeout (seconds) value to bound how long the primary model call can run before the gateway fails over. The default is 60 seconds, and you can set any value that suits your app.
6. Click Save Changes.

Screenshot: The Fallback & Timeout section of the app Settings tab, showing the Fallback Provider, Fallback Completions, Fallback Embeddings, and Request timeout fields.

The request timeout applies only when a fallback model is set. If no fallback is configured, the primary model call runs unbounded.
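A minimal sketch of that rule, using Python's `asyncio`, may make the difference concrete. It assumes the model calls are plain async callables. `call_with_failover` and its parameters are hypothetical names for illustration, and the sketch covers only the timeout path, not failover on 4xx/5xx responses.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical type: an async function that calls a model and returns its reply.
ModelCall = Callable[[], Awaitable[str]]


async def call_with_failover(
    primary: ModelCall,
    fallback: Optional[ModelCall],
    timeout_seconds: float = 60.0,
) -> str:
    if fallback is None:
        # No fallback configured: the timeout is not applied,
        # so the primary call is awaited without any bound.
        return await primary()
    try:
        # Fallback configured: the primary call is bounded by the timeout.
        return await asyncio.wait_for(primary(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # The primary ran too long, so the request fails over.
        return await fallback()
```

With a fallback set, a primary call that outlasts `timeout_seconds` is cancelled and the fallback answers instead. Without one, the same slow call is simply awaited until it finishes.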

Quota fallback

The Quota Fallback section routes requests to an alternate, usually cheaper, model when one of the app's usage limits is exceeded, rather than blocking the request with a 429. This keeps an app available after it crosses a budget, token, or request threshold, while shifting the overflow traffic to a lower-cost model.

If you leave the quota fallback empty, the app blocks requests with a 429 once it goes over quota.

Configure a quota fallback

1. Open the Apps tab and select the app to edit.
2. Select the Settings tab. Set the limits you want to enforce under Usage Limits & Thresholds, then find the Quota Fallback section below them.
3. Choose a Quota Fallback Provider, then select the Quota Fallback Completions model. If the app uses embeddings, also select a Quota Fallback Embeddings model.
4. Click Save Changes.

Screenshot: The Quota Fallback section of the app Settings tab, below Usage Limits & Thresholds, showing the Quota Fallback Provider, Quota Fallback Completions, and Quota Fallback Embeddings fields.

Point the quota fallback at a smaller, cheaper model so the app stays available while overflow traffic stays inexpensive. Note that the fallback's own usage still counts toward the app's limits.
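The quota behavior can be sketched with a single request limit. This is an illustration under stated assumptions, not the gateway's implementation: the `AppUsage` class and `handle_request` function are hypothetical, only a request-count limit is modeled (budget and token limits would presumably follow the same pattern), and the sketch assumes that reaching the limit counts as being over quota and that blocked requests are not counted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppUsage:
    """Hypothetical per-app counter for one usage limit."""
    request_limit: int
    requests_used: int = 0


def handle_request(
    app: AppUsage,
    primary_model: str,
    quota_fallback_model: Optional[str],
) -> Optional[str]:
    """Return the model that serves the request, or None for a 429 block."""
    # Assumption: the app is over quota once usage has reached the limit.
    if app.requests_used >= app.request_limit:
        if quota_fallback_model is None:
            # No quota fallback set: the request is blocked with a 429.
            return None
        model = quota_fallback_model
    else:
        model = primary_model
    # Usage is recorded whichever model served the request,
    # so fallback traffic still counts toward the app's limits.
    app.requests_used += 1
    return model
```

With a limit of 2 and a quota fallback set, the first two requests go to the primary and later ones go to the fallback, while the counter keeps climbing. Without a fallback, the third request returns `None`, standing in for the 429.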

How the two fallbacks combine

The mechanisms are evaluated independently and can both be active on the same app:

A request that is over quota routes to the quota fallback model.
A request that is within quota but hits an error or timeout on the primary routes to the error and timeout fallback model.

Set whichever fallbacks match the failure modes you want to protect against. Neither is required.
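The two rules above, together with the no-fallback outcomes from the table at the top of the page, can be summarized in one decision function. This is a sketch of the documented behavior only. The function name, the action labels, and the parameters are hypothetical, and the case where a quota fallback model itself errors is not described on this page, so it is not modeled.

```python
from typing import Optional, Tuple


def select_route(
    over_quota: bool,
    primary_failed: bool,
    primary: str,
    error_fallback: Optional[str] = None,
    quota_fallback: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return an (action, model) pair for one request.

    The action labels are made up for this sketch:
    "serve", "block_429", or "return_error".
    """
    if over_quota:
        # Over quota: use the quota fallback, or block with a 429 if none is set.
        if quota_fallback is None:
            return ("block_429", None)
        return ("serve", quota_fallback)
    if primary_failed:
        # Within quota but the primary errored or timed out:
        # use the error and timeout fallback, or return the error to the caller.
        if error_fallback is None:
            return ("return_error", None)
        return ("serve", error_fallback)
    # Within quota and the primary succeeded.
    return ("serve", primary)
```

Because neither fallback is required, each branch has a defined outcome when its fallback is missing, which matches the "Without a fallback set" column of the table.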

Related resources

Managing Apps - Create, edit, and delete AI Gateway apps.
Usage Limits & Thresholds - Configure the budget, token, and request limits that trigger a quota fallback.
Custom Providers - Add your own provider to use as a primary or fallback model.

Last modified on June 5, 2026
