πŸ” Model Fallback

Model Fallback ensures high availability by automatically retrying requests on alternative models when the primary model is temporarily unavailable due to timeouts, provider outages, rate limits..etc.

circle-info

πŸ’‘ Note: Fallback is not triggered for provider validation errors (e.g., invalid, unsupported, or malformed request parameters). These errors are returned immediately without retrying other models.

How It Works

Each model can have a fallback model configured.

When a request fails on the primary model:

  1. The router tries the fallback model automatically

  2. If that model also fails, the router continues down the fallback chain

  3. The process stops as soon as a model succeeds

Fallback models can themselves have fallbacks, forming a chained retry sequence.

Example Fallback Chain

devstral-2512 β†’ devstral-small-2512 β†’ devstral-small-2507
    (fails)          (fails)            (succeeds βœ…)              

If all models in the fallback chain fail, you receive the error details of the last attempted model, ensuring full debugging visibility

circle-check

Via Web Console

  1. Go to your Project Settingsarrow-up-right β†’ Inference Section.

  2. Toggle Model Fallback ON βœ… to automatically retry failed requests using fallback models.

  1. Toggle OFF ❌ to disable fallback behavior. Requests will fail immediately if the primary model is unavailable.

Via API

You can control this behavior directly in your requests using the enable_model_fallback parameter:

  • true βœ… (default) β†’ Fallback models are allowed

  • false ❌ β†’ Only the originally requested model is used

Fallback Metadata (Optional)

If fallback occurs and you set:

The response will include routing transparency details:

Field
Description

fallback_from

The model originally requested

fallback_chain

Ordered list of every model attempted, from first to successful

model_used

The model that actually served the request

Fallback Model Selection Logic

Fallback models are selected within the same model family and capability tier to maintain compatibility and predictable behavior.

  • Version Downgrade: Newer versions fall back to older versions

  • Size Downgrade: Larger models fall back to smaller variants

  • Embedding Models: Embedding models fall back only within the same embedding family to preserve vector space compatibility.

Reliability & Governance

  • High Availability: Automatically handles timeouts, outages, and rate limits without requiring client-side retries.

  • Cost Transparency: You are only billed for the successful request and the model that ultimately served it.

  • Governance Control: Can be disabled for strict model consistency, auditability, and predictable routing behavior.

Last updated