
About Serverless Routing

Overview

The Serverless LLM Router is designed to give you the best experience across multiple providers without requiring you to manage the details yourself. Based on your request and preferences, we automatically route it to the most suitable provider currently available.

How it works

Routing happens in two main steps:

  1. Filtering: We first eliminate providers that don’t meet basic requirements:

    • Availability: Unavailable or unresponsive providers are excluded.

    • Supported Parameters: Some providers don’t support certain parameters. Since the parameters you specify are considered essential to your request, we only include providers that support all of them.

    • Context Length and Output Tokens: Providers that cannot support the requested context length or number of output tokens are excluded.

  2. Sorting: We use predictive ML to estimate throughput based on your request’s characteristics. Eligible providers are ranked in real time according to those estimates and your chosen priority:

    • Speed,

    • Cost, or

    • Balanced strategy.

    If the selected provider fails unexpectedly, we retry with the next best available option.
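The two steps above can be sketched in a few lines of Python. This is an illustrative model, not the actual router: the provider attributes, the throughput estimates, and the balanced scoring rule are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical per-provider attributes the router might consider."""
    name: str
    available: bool
    supported_params: set      # parameter names this provider accepts
    max_context: int           # largest context length it can serve
    max_output_tokens: int     # largest completion it can produce
    est_throughput: float      # predicted tokens/s (from the ML model)
    price_per_token: float     # cost metric used for ranking


def route(providers, requested_params, context_len, output_tokens,
          priority="balanced"):
    # Step 1: filtering -- drop providers that fail any hard requirement.
    eligible = [
        p for p in providers
        if p.available
        and requested_params <= p.supported_params
        and p.max_context >= context_len
        and p.max_output_tokens >= output_tokens
    ]
    # Step 2: sorting -- rank the survivors by the chosen priority.
    if priority == "speed":
        key = lambda p: -p.est_throughput
    elif priority == "cost":
        key = lambda p: p.price_per_token
    else:  # "balanced": one plausible trade-off of price against speed
        key = lambda p: p.price_per_token / p.est_throughput
    return sorted(eligible, key=key)
```

The ranked list doubles as the fallback order: if the first provider fails unexpectedly, the retry simply moves to the next entry.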

Parameter Handling

While most providers support a similar set of parameters, some differences exist. To provide consistent results across providers, we standardize parameter handling as follows:

Default Parameters

If you don’t specify certain parameters, we apply the following defaults to ensure uniform behavior:

  • temperature: 0.7

  • top_p: 1.0

  • max_tokens: 512

  • frequency_penalty: 0.0

  • presence_penalty: 0.0

Other Parameters

All other parameters are optional. If you specify them, we only include providers that support those parameters. To maximize compatibility, we recommend including only the parameters that are essential for your use case—otherwise, you may inadvertently exclude viable providers.
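To see why extra parameters shrink the provider pool, consider this toy eligibility check. The provider names and their supported-parameter sets are hypothetical; only the subset rule reflects the filtering described above.

```python
# Hypothetical supported-parameter sets for three providers.
provider_params = {
    "provider_a": {"temperature", "top_p", "max_tokens"},
    "provider_b": {"temperature", "top_p", "max_tokens", "logit_bias"},
    "provider_c": {"temperature", "max_tokens"},
}


def eligible(requested: set) -> list:
    # A provider stays eligible only if it supports every requested parameter.
    return sorted(name for name, supported in provider_params.items()
                  if requested <= supported)
```

Requesting only `temperature` and `max_tokens` keeps all three providers eligible, while adding the less widely supported `logit_bias` leaves only `provider_b`.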
