π Model Fallback
Model Fallback ensures high availability by automatically retrying requests on alternative models when the primary model is temporarily unavailable due to timeouts, provider outages, rate limits..etc.
π‘ Note: Fallback is not triggered for provider validation errors (e.g., invalid, unsupported, or malformed request parameters). These errors are returned immediately without retrying other models.
How It Works
Each model can have a fallback model configured.
When a request fails on the primary model:
The router tries the fallback model automatically
If that model also fails, the router continues down the fallback chain
The process stops as soon as a model succeeds
Fallback models can themselves have fallbacks, forming a chained retry sequence.
Example Fallback Chain
devstral-2512 β devstral-small-2512 β devstral-small-2507
(fails) (fails) (succeeds β
) If all models in the fallback chain fail, you receive the error details of the last attempted model, ensuring full debugging visibility
π° You only pay for the successful request and for the model that ultimately served the response.
Via Web Console
Go to your Project Settings β Inference Section.
Toggle Model Fallback ON β to automatically retry failed requests using fallback models.

Toggle OFF β to disable fallback behavior. Requests will fail immediately if the primary model is unavailable.

Via API
You can control this behavior directly in your requests using the enable_model_fallback parameter:
true β (default) β Fallback models are allowed
false β β Only the originally requested model is used
Fallback Metadata (Optional)
If fallback occurs and you set:
The response will include routing transparency details:
fallback_from
The model originally requested
fallback_chain
Ordered list of every model attempted, from first to successful
model_used
The model that actually served the request
Fallback Model Selection Logic
Fallback models are selected within the same model family and capability tier to maintain compatibility and predictable behavior.
Version Downgrade: Newer versions fall back to older versions
Size Downgrade: Larger models fall back to smaller variants
Embedding Models: Embedding models fall back only within the same embedding family to preserve vector space compatibility.
Reliability & Governance
High Availability: Automatically handles timeouts, outages, and rate limits without requiring client-side retries.
Cost Transparency: You are only billed for the successful request and the model that ultimately served it.
Governance Control: Can be disabled for strict model consistency, auditability, and predictable routing behavior.
Last updated