About Serverless Routing
Overview
The Serverless LLM Router is designed to give you the best experience across multiple providers, without requiring you to manage the details yourself. We automatically route each request to the most suitable provider currently available, based on the request itself and your preferences.
How it works
Routing happens in two main steps:
Filtering
We first eliminate providers that don’t meet basic requirements:
Availability: Unavailable or unresponsive providers are excluded.
Supported Parameters: Some providers don’t support certain parameters. Since the parameters you specify are considered essential to your request, we only include providers that support all of them.
Context Length and Output Tokens: Providers that cannot support the requested context length or number of output tokens are excluded.
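The filtering step above can be sketched as follows. This is illustrative code only: the provider fields (`available`, `supported_params`, `max_context`, `max_output_tokens`) and request fields are hypothetical names, not part of the router's actual API.

```python
def filter_providers(providers, request):
    """Keep only providers that can serve the request.

    All field names here are invented for illustration; the real
    router's data model is not documented in this section.
    """
    eligible = []
    for p in providers:
        if not p["available"]:
            continue  # exclude unavailable or unresponsive providers
        if not set(request["params"]) <= p["supported_params"]:
            continue  # must support every parameter you specified
        if request["context_tokens"] > p["max_context"]:
            continue  # cannot fit the requested context length
        if request["max_tokens"] > p["max_output_tokens"]:
            continue  # cannot produce the requested output tokens
        eligible.append(p)
    return eligible
```

Note that the parameter check is a subset test: a provider survives only if it supports all of the parameters you specified, matching the "essential to your request" rule above.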
Sorting
We use predictive ML to estimate throughput from your request characteristics. Eligible providers are then ranked in real time against those estimates, according to your chosen priority:
Speed,
Cost, or
Balanced strategy.
If the selected provider fails unexpectedly, we retry with the next best available option.
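The sorting and retry behavior can be sketched as below. The scoring functions and weights are assumptions for illustration; the router's actual predictive model and ranking formula are not specified in this doc.

```python
def route(eligible, estimate_throughput, estimate_cost, priority="balanced"):
    """Rank eligible providers by the chosen priority.

    `estimate_throughput` and `estimate_cost` stand in for the router's
    ML-based predictions; the scoring below is purely illustrative.
    """
    def score(p):
        tput = estimate_throughput(p)   # predicted tokens per second
        cost = estimate_cost(p)         # predicted cost per request
        if priority == "speed":
            return -tput                # fastest first
        if priority == "cost":
            return cost                 # cheapest first
        return cost / max(tput, 1e-9)   # balanced: cost per unit of throughput
    return sorted(eligible, key=score)

def call_with_fallback(ranked, send):
    """Try providers in ranked order; fall back on unexpected failure."""
    last_err = None
    for p in ranked:
        try:
            return send(p)
        except Exception as err:
            last_err = err  # retry with the next best provider
    raise RuntimeError("all providers failed") from last_err
```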
Parameter Handling
While most providers support a similar set of parameters, some differences exist. To provide consistent results across providers, we standardize parameter handling as follows:
Default Parameters
If you don’t specify certain parameters, we apply the following defaults to ensure uniform behavior:
temperature: 0.7
top_p: 1.0
max_tokens: 512
frequency_penalty: 0.0
presence_penalty: 0.0
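The defaults above amount to a simple merge in which any value you specify takes precedence. A minimal sketch (illustrative code, not the router's implementation):

```python
# Defaults applied when a parameter is not specified in the request.
DEFAULTS = {
    "temperature": 0.7,
    "top_p": 1.0,
    "max_tokens": 512,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}

def apply_defaults(user_params):
    """User-specified values win; defaults fill in the rest."""
    return {**DEFAULTS, **user_params}
```

For example, a request that sets only `temperature` still gets `max_tokens: 512` and the other defaults filled in.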
Other Parameters
All other parameters are optional. If you specify them, we only include providers that support those parameters. To maximize compatibility, we recommend including only the parameters that are essential for your use case—otherwise, you may inadvertently exclude viable providers.
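To illustrate that trade-off, here is a toy sketch (provider names and parameter sets are invented) showing how each extra parameter can shrink the eligible pool:

```python
# Hypothetical catalog: each provider advertises the parameters it supports.
PROVIDERS = [
    {"name": "provider_a", "supported_params": {"temperature", "top_p", "max_tokens"}},
    {"name": "provider_b", "supported_params": {"temperature", "top_p", "max_tokens", "logit_bias"}},
]

def eligible_for(param_names):
    """Return the names of providers that support every requested parameter."""
    return [p["name"] for p in PROVIDERS
            if set(param_names) <= p["supported_params"]]
```

Requesting only `temperature` keeps both providers eligible, while adding `logit_bias` narrows the pool to the single provider that supports it.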