# Model Fallback

Model Fallback ensures high availability by automatically retrying requests on alternative models when the primary model is temporarily unavailable due to **timeouts, provider outages, rate limits..etc.**

{% hint style="info" %}
💡 **Note:** Fallback is not triggered for **provider validation errors** (e.g., invalid, unsupported, or malformed request parameters). These errors are returned immediately without retrying other models.
{% endhint %}

## How It Works

Each model can have a **fallback model** configured.

When a request fails on the primary model:

1. The router tries the fallback model automatically
2. If that model also fails, the router continues down the **fallback chain**
3. The process stops as soon as a model succeeds

Fallback models can themselves have fallbacks, forming a chained retry sequence.

#### Example Fallback Chain

```json
devstral-2512 → devstral-small-2512 → devstral-small-2507
    (fails)          (fails)            (succeeds ✅)              
```

If all models in the fallback chain fail, you receive the **error details of the last attempted model**, ensuring full debugging visibility

{% hint style="success" %}
**💰 You only pay for the successful request and for the model that ultimately served the response.**
{% endhint %}

## Via Web Console

1. Go to your [**Project Settings**](https://cortecs.ai/userArea/userProfile) **→ Inference Section**.
2. Toggle **Model Fallback ON** ✅ to automatically retry failed requests using fallback models.

<figure><img src="/files/fKUpXjiNTU6EN4jqvJGJ" alt=""><figcaption></figcaption></figure>

3. Toggle **OFF** ❌ to disable fallback behavior. Requests will fail immediately if the primary model is unavailable.

<figure><img src="/files/fJnuy4q0BpznC5wXyMpl" alt=""><figcaption></figcaption></figure>

## Via API

You can control this behavior directly in your requests using the `enable_model_fallback` parameter:

```json
{
  "model": "devstral-2512",
  "messages": [...],
  "enable_model_fallback": true
}
```

* true ✅ (default) → Fallback models are allowed
* false ❌ → Only the originally requested model is used

### Fallback Model Selection Logic

Fallback models are selected within the **same model family and capability tier** to maintain compatibility and predictable behavior.

* **Version Downgrade:** Newer versions fall back to older versions

```json
gpt-5.1 → gpt-5 → gpt-4.1 → gpt-4o → gpt-4o-mini
```

* **Size Downgrade:** Larger models fall back to smaller variants

<pre class="language-json"><code class="lang-json"><strong>gpt-oss-120b → gpt-oss-20b
</strong></code></pre>

* **Embedding Models:** Embedding models fall back only within the same embedding family to preserve vector space compatibility.

## Reliability & Governance

* **High Availability:** Automatically handles timeouts, outages, and rate limits without requiring client-side retries.
* **Cost Transparency:** You are only billed for the successful request and the model that ultimately served it.
* **Governance Control:** Can be disabled for strict model consistency, auditability, and predictable routing behavior.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.cortecs.ai/features/model-fallback.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.