# Fallback Models

Each AI Gateway app calls a primary model: the provider, completions model, and
optional embeddings model you select under **AI Models** on the app's
**Settings** tab. Fallbacks let an app keep serving requests when that primary
model fails, times out, or runs over its usage limits, instead of returning an
error to the caller.

The AI Gateway offers two independent fallback mechanisms, each triggered by a
different condition:

| Mechanism              | Triggers when…                                           | Without a fallback set…             |
| ---------------------- | -------------------------------------------------------- | ----------------------------------- |
| **Fallback & Timeout** | The primary returns a `4xx`/`5xx` or exceeds the timeout | The error is returned to the caller |
| **Quota Fallback**     | One of the app's usage limits is exceeded                | The request is blocked with a `429` |

Both are configured entirely in the Zuplo Portal, and either can route to _any_
provider. The fallback doesn't have to share the primary's provider.

## Error and timeout fallback

The **Fallback & Timeout** section fails an app over to a second model when the
primary model returns a `4xx` or `5xx` response, or when the request takes
longer than the configured timeout. This protects against provider outages, rate
limiting on the primary provider, and slow responses.

### Configure an error and timeout fallback

<Stepper>

1. Open the [Apps](https://portal.zuplo.com/+/account/project/ai/apps) tab of
   your AI Gateway project and select the app to edit.

1. Select the **Settings** tab and find the **Fallback & Timeout** section.

1. Choose a **Fallback Provider**. This can be the same provider as the primary
   or a different one, including a [custom provider](./custom-providers.mdx).

1. Select the **Fallback Completions** model. If the app uses embeddings, also
   select a **Fallback Embeddings** model.

1. Set the **Request timeout (seconds)** value to bound how long the primary
   model call can run before the gateway fails over. The default is `60`
   seconds, and you can change it to suit your app.

1. Click **Save Changes**.

</Stepper>

<Framed>

![The Fallback & Timeout section of the app Settings tab, showing the Fallback Provider, Fallback Completions, Fallback Embeddings, and Request timeout fields](./fallback-and-timeout.png)

</Framed>

:::note

The request timeout applies _only_ when a fallback model is set. If no fallback
is configured, the primary model call runs unbounded.

:::
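The interaction between the timeout and the fallback setting can be modeled in a few lines of TypeScript. This is a minimal sketch for reasoning about the behavior, not the gateway's implementation: the `FallbackTimeoutSettings` type, its field names, and the `effectiveTimeoutMs` function are hypothetical and don't correspond to any Zuplo API.

```typescript
// Hypothetical model of the Fallback & Timeout settings.
// These names are illustrative only; they are not part of any Zuplo API.
interface FallbackTimeoutSettings {
  fallbackModel?: string; // unset when no fallback is configured
  requestTimeoutSeconds?: number; // unset means "use the default"
}

// The default shown in the Request timeout (seconds) field.
const DEFAULT_TIMEOUT_SECONDS = 60;

// Returns the bound applied to the primary model call in milliseconds,
// or undefined when the call is unbounded.
function effectiveTimeoutMs(
  settings: FallbackTimeoutSettings,
): number | undefined {
  // Without a fallback model, the timeout doesn't apply at all.
  if (!settings.fallbackModel) {
    return undefined;
  }
  const seconds = settings.requestTimeoutSeconds ?? DEFAULT_TIMEOUT_SECONDS;
  return seconds * 1000;
}
```

In this model, a timeout value saved without a fallback model has no effect: `effectiveTimeoutMs` returns `undefined` until `fallbackModel` is set.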

## Quota fallback

The **Quota Fallback** section routes requests to an alternate, usually cheaper,
model when one of the app's [usage limits](./usage-limits.mdx) is exceeded,
rather than blocking the request with a `429`. This keeps an app available after
it crosses a budget, token, or request threshold, while shifting the overflow
traffic to a lower-cost model.

If you leave the quota fallback empty, the app blocks requests with a `429` once
it goes over quota.

### Configure a quota fallback

<Stepper>

1. Open the [Apps](https://portal.zuplo.com/+/account/project/ai/apps) tab and
   select the app to edit.

1. Select the **Settings** tab. Set the limits you want to enforce under **Usage
   Limits & Thresholds**, then find the **Quota Fallback** section below them.

1. Choose a **Quota Fallback Provider**, then select the **Quota Fallback
   Completions** model. If the app uses embeddings, also select a **Quota
   Fallback Embeddings** model.

1. Click **Save Changes**.

</Stepper>

<Framed>

![The Quota Fallback section of the app Settings tab, below Usage Limits & Thresholds, showing the Quota Fallback Provider, Quota Fallback Completions, and Quota Fallback Embeddings fields](./quota-fallback.png)

</Framed>

:::tip

Point the quota fallback at a smaller, cheaper model so the app stays available
while overflow traffic stays inexpensive. The fallback's own usage still counts
toward the app's limits.

:::

## How the two fallbacks combine

The mechanisms are evaluated independently and can both be active on the same
app:

- A request that is **over quota** routes to the quota fallback model.
- A request that is **within quota** but hits an **error or timeout** on the
  primary routes to the error and timeout fallback model.

Set whichever fallbacks match the failure modes you want to protect against.
Neither is required.
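The rules above can be summarized as a small decision function. This is an illustrative sketch only: the types and the `resolveRoute` function are hypothetical names, not the gateway's code, and the sketch doesn't model what happens if a fallback model itself fails, which this page doesn't specify.

```typescript
// Illustrative model of the routing rules described on this page.
// All names are hypothetical; this is not the gateway's implementation.
interface FallbackConfig {
  errorFallback?: string; // model set under Fallback & Timeout, if any
  quotaFallback?: string; // model set under Quota Fallback, if any
}

// "failed" means the primary returned a 4xx/5xx or, when an error
// fallback is set, exceeded the request timeout.
type PrimaryOutcome = "ok" | "failed";

type Route =
  | { kind: "model"; model: string } // request served by this model
  | { kind: "blocked"; status: 429 } // over quota, no quota fallback
  | { kind: "error-returned" }; // primary error passed to the caller

function resolveRoute(
  primary: string,
  config: FallbackConfig,
  overQuota: boolean,
  primaryOutcome: PrimaryOutcome,
): Route {
  // Over quota: use the quota fallback if one is set, otherwise block.
  if (overQuota) {
    return config.quotaFallback
      ? { kind: "model", model: config.quotaFallback }
      : { kind: "blocked", status: 429 };
  }
  // Within quota and the primary succeeded.
  if (primaryOutcome === "ok") {
    return { kind: "model", model: primary };
  }
  // Within quota but the primary failed: use the error and timeout
  // fallback if one is set, otherwise return the error to the caller.
  return config.errorFallback
    ? { kind: "model", model: config.errorFallback }
    : { kind: "error-returned" };
}
```

For example, with only a quota fallback configured, `resolveRoute("primary", { quotaFallback: "cheap" }, false, "failed")` yields `error-returned`: the quota fallback doesn't cover primary errors, which is why an app that needs both protections sets both.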

## Related resources

- [Managing Apps](./managing-apps.mdx) - Create, edit, and delete AI Gateway
  apps.
- [Usage Limits & Thresholds](./usage-limits.mdx) - Configure the budget, token,
  and request limits that trigger a quota fallback.
- [Custom Providers](./custom-providers.mdx) - Add your own provider to use as a
  primary or fallback model.
