Fallback Models
Each AI Gateway app calls a primary model: the provider, completions model, and optional embeddings model you select under AI Models on the app's Settings tab. Fallbacks let an app keep serving requests when that primary model fails, times out, or runs over its usage limits, instead of returning an error to the caller.
The AI Gateway offers two independent fallback mechanisms, each triggered by a different condition:
| Mechanism | Triggers when… | Without a fallback set… |
|---|---|---|
| Fallback & Timeout | The primary returns a 4xx/5xx or exceeds the timeout | The error is returned to the caller |
| Quota Fallback | One of the app's usage limits is exceeded | The request is blocked with a 429 |
Both are configured entirely in the Zuplo Portal. Either can route to any provider, so a fallback doesn't have to share the primary's provider.
Error and timeout fallback
The Fallback & Timeout section fails an app over to a second model when the
primary model returns a 4xx or 5xx response, or when the request takes
longer than the configured timeout. This protects against provider outages, rate
limiting on the primary provider, and slow responses.
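The trigger conditions above can be sketched as a small decision function. This is a simplified model for reasoning about the behavior, not the gateway's implementation: `ModelResult`, `needs_failover`, and `choose_response` are hypothetical names, and latency is passed in as a number rather than measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelResult:
    """Outcome of one simulated model call (hypothetical type, not a Zuplo API)."""
    status: int             # HTTP status returned by the provider
    latency_seconds: float  # how long the call took


def needs_failover(result: ModelResult, timeout_seconds: float) -> bool:
    """True when the primary's result matches one of the triggers described above."""
    is_error = 400 <= result.status <= 599               # any 4xx or 5xx response
    timed_out = result.latency_seconds > timeout_seconds
    return is_error or timed_out


def choose_response(primary: ModelResult,
                    fallback: Optional[ModelResult],
                    timeout_seconds: float = 60.0) -> ModelResult:
    """Return the result the caller would see in this simplified model."""
    if fallback is None:
        # No fallback set: the primary's outcome, errors included, goes to the caller.
        return primary
    if needs_failover(primary, timeout_seconds):
        return fallback
    return primary
```

For example, `choose_response(ModelResult(429, 0.2), ModelResult(200, 1.5))` returns the fallback result, which is how a rate-limit response from the primary provider would be absorbed.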
Configure an error and timeout fallback
- Open the Apps tab of your AI Gateway project and select the app to edit.
- Select the Settings tab and find the Fallback & Timeout section.
- Choose a Fallback Provider. This can be the same provider as the primary or a different one, including a custom provider.
- Select the Fallback Completions model. If the app uses embeddings, also select a Fallback Embeddings model.
- Set the Request timeout (seconds) value to bound how long the primary model call can run before the gateway fails over. The default is 60, but you can set any value that suits your app.
- Click Save Changes.

The request timeout applies only when a fallback model is set. If no fallback is configured, the primary model call runs unbounded.
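One way to picture the settings and the timeout rule is the sketch below. The `FallbackSettings` class and its field names are hypothetical and only mirror the Portal fields for illustration. The sketch also assumes that "a fallback model is set" means both a provider and a completions model have been chosen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FallbackSettings:
    """Hypothetical mirror of the Fallback & Timeout fields in the Portal."""
    provider: Optional[str] = None
    completions_model: Optional[str] = None
    embeddings_model: Optional[str] = None  # only relevant if the app uses embeddings
    request_timeout_seconds: float = 60.0   # documented default

    @property
    def is_configured(self) -> bool:
        # Assumption: a fallback counts as "set" once provider and completions model exist.
        return self.provider is not None and self.completions_model is not None


def effective_timeout(settings: FallbackSettings) -> Optional[float]:
    """Return the bound on the primary call, or None when the call is unbounded."""
    if not settings.is_configured:
        return None  # no fallback, so the timeout value has no effect
    return settings.request_timeout_seconds
```

With no fallback selected, `effective_timeout(FallbackSettings())` returns `None`, matching the note that the primary call runs unbounded in that case.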
Quota fallback
The Quota Fallback section routes requests to an alternate, usually cheaper,
model when one of the app's usage limits is exceeded,
rather than blocking the request with a 429. This keeps an app available after
it crosses a budget, token, or request threshold, while shifting the overflow
traffic to a lower-cost model.
If you leave the quota fallback empty, the app blocks requests with a 429 once
it goes over quota.
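The quota behavior can be sketched as a routing function. The `Usage` type, the counter names, and the `"blocked:429"` marker are hypothetical. The sketch assumes a limit is "exceeded" only when usage is strictly greater than it, which the text above does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    """Hypothetical per-app counters for the three limit types named above."""
    spend: float
    tokens: int
    requests: int


def over_quota(used: Usage, limits: Usage) -> bool:
    """True when any single limit is exceeded (assumed: strictly greater than)."""
    return (used.spend > limits.spend
            or used.tokens > limits.tokens
            or used.requests > limits.requests)


def route_for_quota(used: Usage, limits: Usage,
                    quota_fallback_model: Optional[str]) -> str:
    """Pick where a request goes based only on quota state."""
    if not over_quota(used, limits):
        return "primary"
    if quota_fallback_model is None:
        return "blocked:429"  # no quota fallback: the request is rejected
    return quota_fallback_model
```

Crossing any one of the three limits is enough to reroute: a request that is under its budget and token limits but over its request limit still goes to the quota fallback model.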
Configure a quota fallback
- Open the Apps tab and select the app to edit.
- Select the Settings tab. Set the limits you want to enforce under Usage Limits & Thresholds, then find the Quota Fallback section below them.
- Choose a Quota Fallback Provider, then select the Quota Fallback Completions model. If the app uses embeddings, also select a Quota Fallback Embeddings model.
- Click Save Changes.

Point the quota fallback at a smaller, cheaper model so the app stays available while overflow traffic stays inexpensive. The fallback's own usage still counts toward the app's limits.
How the two fallbacks combine
The mechanisms are evaluated independently and can both be active on the same app:
- A request that is over quota routes to the quota fallback model.
- A request that is within quota but hits an error or timeout on the primary routes to the error and timeout fallback model.
Set whichever fallbacks match the failure modes you want to protect against. Neither is required.
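The two rules above, together with the no-fallback cases from the table at the top of the page, can be combined into one decision sketch. The function name and return labels are hypothetical. The sketch does not cover what happens if the quota fallback model itself fails, because the text above does not describe that case.

```python
from typing import Optional


def route_request(over_quota: bool,
                  primary_failed: bool,
                  quota_fallback: Optional[str],
                  error_fallback: Optional[str]) -> str:
    """Combine both mechanisms for a single request.

    primary_failed stands for a 4xx/5xx from the primary, or a timeout
    when an error and timeout fallback is set.
    """
    if over_quota:
        # Over-quota requests are handled by the quota mechanism.
        return quota_fallback if quota_fallback else "blocked:429"
    if primary_failed:
        # Within quota: errors and timeouts go to the error and timeout fallback.
        return error_fallback if error_fallback else "primary-error"
    return "primary"
```

For example, `route_request(False, True, "small-model", None)` returns `"primary-error"`: a quota fallback alone does not protect against provider errors on requests that are within quota.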
Related resources
- Managing Apps - Create, edit, and delete AI Gateway apps.
- Usage Limits & Thresholds - Configure the budget, token, and request limits that trigger a quota fallback.
- Custom Providers - Add your own provider to use as a primary or fallback model.