# 🔁 Model Fallback

Model Fallback ensures high availability by automatically retrying requests on alternative models when the primary model is temporarily unavailable due to **timeouts, provider outages, rate limits, and similar transient errors**.

{% hint style="info" %}
💡 **Note:** Fallback is not triggered for **provider validation errors** (e.g., invalid, unsupported, or malformed request parameters). These errors are returned immediately without retrying other models.
{% endhint %}

## How It Works

Each model can have a **fallback model** configured.

When a request fails on the primary model:

1. The router tries the fallback model automatically
2. If that model also fails, the router continues down the **fallback chain**
3. The process stops as soon as a model succeeds

Fallback models can themselves have fallbacks, forming a chained retry sequence.
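The chained retry behavior above can be pictured as a simple loop: a minimal Python sketch, where `try_model`, `TransientProviderError`, and the chain itself are hypothetical stand-ins for the router's internal machinery, not a real API.

```python
class TransientProviderError(Exception):
    """Stand-in for a retryable failure (timeout, outage, rate limit)."""

def call_with_fallback(request, chain, try_model):
    """Try each model in the ordered fallback chain until one succeeds.

    `chain` starts with the primary model; `try_model` is a hypothetical
    single-provider call that raises TransientProviderError on failure.
    """
    last_error = None
    for model in chain:
        try:
            return try_model(request, model)  # success stops the chain
        except TransientProviderError as err:
            last_error = err  # remember the most recent failure
    # Every model failed: surface the last attempted model's error
    raise last_error
```

Note that provider validation errors would be raised immediately rather than caught, matching the behavior described in the note above.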

#### Example Fallback Chain

```
devstral-2512 → devstral-small-2512 → devstral-small-2507
    (fails)          (fails)            (succeeds ✅)
```

If all models in the fallback chain fail, you receive the **error details of the last attempted model**, ensuring full debugging visibility.

{% hint style="success" %}
**💰 You only pay for the successful request and for the model that ultimately served the response.**
{% endhint %}

## Via Web Console

1. Go to your [**Project Settings**](https://cortecs.ai/userArea/userProfile) **→ Inference Section**.
2. Toggle **Model Fallback ON** ✅ to automatically retry failed requests using fallback models.

<figure><img src="https://2211217319-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYGsEKyV2Zq4Q8fEJQT40%2Fuploads%2FRpBnWGOUhsjQOyKCPTq3%2Fimage.png?alt=media&#x26;token=13f3be68-9c18-46ac-bb1b-e04fd409d651" alt=""><figcaption></figcaption></figure>

3. Toggle **OFF** ❌ to disable fallback behavior. Requests will fail immediately if the primary model is unavailable.

<figure><img src="https://2211217319-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYGsEKyV2Zq4Q8fEJQT40%2Fuploads%2FuRySibVCN8XKqB8Vy4kY%2Fimage.png?alt=media&#x26;token=a8f51732-805c-4634-a0c5-75367bc0244b" alt=""><figcaption></figcaption></figure>

## Via API

You can control this behavior directly in your requests using the `enable_model_fallback` parameter:

```json
{
  "model": "devstral-2512",
  "messages": [...],
  "enable_model_fallback": true
}
```

* `true` ✅ (default) → Fallback models are allowed
* `false` ❌ → Only the originally requested model is used
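A minimal sketch of building such a request in Python. The helper name `build_chat_request` is illustrative; only the payload fields themselves come from the documentation above.

```python
def build_chat_request(model, messages, allow_fallback=True):
    """Build an OpenAI-style chat payload with the fallback flag.

    `enable_model_fallback` defaults to true server-side; pass
    allow_fallback=False to pin the request to the requested model.
    """
    return {
        "model": model,
        "messages": messages,
        "enable_model_fallback": allow_fallback,
    }

payload = build_chat_request(
    "devstral-2512",
    [{"role": "user", "content": "Hello"}],
    allow_fallback=False,  # fail fast instead of retrying fallbacks
)
```

The payload is then sent to the chat completions endpoint as usual (for example with `requests` or an OpenAI-compatible SDK pointed at your base URL).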

## Fallback Metadata (Optional)

If a fallback occurs and you set:

```json
"include_metadata": true
```

The response will include routing transparency details:

```json
{
  "metadata": {
    "routing_result": {
      "model_used": "devstral-small-2507",
      "fallback_from": "devstral-2512",
      "fallback_chain": [
        "devstral-2512",
        "devstral-small-2512",
        "devstral-medium-2507",
        "devstral-small-2507"
      ]
    }
  }
}
```

| Field            | Description                                                     |
| ---------------- | --------------------------------------------------------------- |
| `fallback_from`  | The model originally requested                                  |
| `fallback_chain` | Ordered list of every model attempted, from first to successful |
| `model_used`     | The model that actually served the request                      |
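These fields can be read back in client code, for example to log when a fallback happened. A small sketch, assuming the response has already been parsed into a dict shaped like the example above:

```python
def describe_fallback(response):
    """Summarize routing metadata from a parsed response, if present."""
    routing = response.get("metadata", {}).get("routing_result", {})
    used = routing.get("model_used")
    requested = routing.get("fallback_from", used)
    if requested == used:
        return f"{used} served the request directly"
    attempts = len(routing.get("fallback_chain", []))
    return f"{requested} fell back to {used} after trying {attempts} models"
```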

### Fallback Model Selection Logic

Fallback models are selected within the **same model family and capability tier** to maintain compatibility and predictable behavior.

* **Version Downgrade:** Newer versions fall back to older versions

```
gpt-5.1 → gpt-5 → gpt-4.1 → gpt-4o → gpt-4o-mini
```

* **Size Downgrade:** Larger models fall back to smaller variants

```
gpt-oss-120b → gpt-oss-20b
```

* **Embedding Models:** Embedding models fall back only within the same embedding family to preserve vector space compatibility.
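One way to picture this selection logic is a per-family lookup table that each model follows down to its next fallback. The table below is purely illustrative (the real routing table lives server-side); the model names are the examples from above.

```python
# Hypothetical fallback table; the actual mapping is maintained server-side.
FALLBACK_OF = {
    "gpt-5.1": "gpt-5",
    "gpt-5": "gpt-4.1",
    "gpt-4.1": "gpt-4o",
    "gpt-4o": "gpt-4o-mini",   # version downgrades within a family
    "gpt-oss-120b": "gpt-oss-20b",  # size downgrade within a family
}

def fallback_chain(model):
    """Expand a model into its full ordered fallback chain."""
    chain = [model]
    while chain[-1] in FALLBACK_OF:
        chain.append(FALLBACK_OF[chain[-1]])
    return chain
```

Because the table never crosses family boundaries, an embedding model's chain would only ever contain models from the same embedding family.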

## Reliability & Governance

* **High Availability:** Automatically handles timeouts, outages, and rate limits without requiring client-side retries.
* **Cost Transparency:** You are only billed for the successful request and the model that ultimately served it.
* **Governance Control:** Can be disabled for strict model consistency, auditability, and predictable routing behavior.
