
Serverless Routing

Overview

The Serverless LLM Router is designed to deliver fast, reliable, and cost-efficient inference without requiring you to manage infrastructure or providers. Based on your request and preferences, Sky Inference automatically selects the best available provider for you.

How it works

Routing happens in two main steps:

  1. Filtering: We instantly exclude providers that do not meet your request’s minimum requirements:

    • Availability: Unavailable or unresponsive providers are excluded.

    • Supported Parameters: We only route to providers that fully support all parameters in your request.

    • Context Length & Token Limit: Providers that cannot handle the requested context length or number of output tokens are removed from consideration.

  2. Sorting: We use predictive machine learning models to estimate each provider's throughput and performance for your request. Eligible providers are then ranked according to your selected routing preference:

    • Speed: Selects the fastest provider

    • Cost: Selects the most cost-efficient option

    • Balanced: Optimizes for a mix of speed and cost
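
The two-step flow above can be sketched as a toy filter-then-rank function. This is illustrative only: the real router uses live availability data and predictive ML models, and the provider fields and scoring formulas here are invented for the example.

```python
# Illustrative sketch of the filter-then-sort routing flow.
# Provider fields and scores are invented for this example; the real
# router uses live telemetry and ML-based performance estimates.

def route(providers, request, preference="balanced"):
    # Step 1: Filtering — drop providers that can't serve the request.
    eligible = [
        p for p in providers
        if p["available"]
        and request["params"] <= p["supported_params"]       # full parameter support
        and request["context_len"] <= p["max_context"]       # context fits
        and request["max_tokens"] <= p["max_output_tokens"]  # output fits
    ]
    # Step 2: Sorting — rank eligible providers by the selected preference.
    key = {
        "speed": lambda p: -p["est_throughput"],
        "cost": lambda p: p["price_per_token"],
        "balanced": lambda p: p["price_per_token"] / p["est_throughput"],
    }[preference]
    return sorted(eligible, key=key)  # best first; the rest serve as fallbacks

providers = [
    {"name": "A", "available": True, "supported_params": {"temperature", "top_p"},
     "max_context": 32_000, "max_output_tokens": 4_096,
     "est_throughput": 120.0, "price_per_token": 0.002},
    {"name": "B", "available": True, "supported_params": {"temperature"},
     "max_context": 8_000, "max_output_tokens": 2_048,
     "est_throughput": 200.0, "price_per_token": 0.001},
]
request = {"params": {"temperature"}, "context_len": 4_000, "max_tokens": 512}

ranking = route(providers, request, preference="cost")
print([p["name"] for p in ranking])  # → ['B', 'A']
```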

You can set your routing preference in two ways:

  • In the API request body: "preference": "speed" | "cost" | "balanced"

  • In the model web interface, use the Select Preference button.

If you don’t specify a preference, the system defaults to balanced.
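
For example, a minimal chat-completions request body with an explicit preference might look like this (the model name is a placeholder; see the API Overview for exact request fields):

```python
import json

# Hypothetical request body; "example-model" is a placeholder name.
payload = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "preference": "speed",  # "speed" | "cost" | "balanced" (default: "balanced")
}
print(json.dumps(payload, indent=2))
```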

🔄 Automatic Fallback: If the top-ranked provider fails during processing, the router instantly retries with the next-best option to maximize uptime and minimize failed requests.

Parameter Handling

While most providers support common parameters, some differences exist. Sky Inference normalizes these differences to ensure consistent behavior.

Default Parameters

If you don’t specify certain parameters, the following defaults apply:

  • temperature: 0.7

  • top_p: 1.0

  • max_tokens: 512

  • frequency_penalty: 0.0

  • presence_penalty: 0.0

Other Parameters

Including optional parameters in your request restricts selection to providers that fully support all specified parameters. This can reduce the pool of eligible providers and may limit routing options.

💡 Tip: To keep your requests flexible and ensure faster, more reliable routing, include only the parameters that are essential for your use case.
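
The documented defaults can be summed up in a small helper that fills in only what is left unspecified. This is a client-side sketch for reference; the server applies these defaults for you, so you don't need to send them.

```python
# Server-side defaults as documented above. Sending fewer parameters
# keeps more providers eligible, so omitting these is usually best.
DEFAULTS = {
    "temperature": 0.7,
    "top_p": 1.0,
    "max_tokens": 512,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}

def with_defaults(params):
    """Return params with the documented defaults filled in where absent."""
    return {**DEFAULTS, **params}

print(with_defaults({"temperature": 0.2}))
```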

Custom Provider Selection

Restricting Providers

If you want to route requests only to specific providers:

"allowed_providers": ["providerA", "providerB"]

  • The router will try these providers sequentially.

  • No fallback will be applied outside your list.

✅ Use Case: Useful for data residency, pricing control, or special contractual agreements.
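
A request restricted to specific providers might look like this (the model and provider names are placeholders):

```python
import json

# Hypothetical request body; model and provider names are placeholders.
payload = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Summarize this document."}],
    # Tried in order; no fallback outside this list.
    "allowed_providers": ["providerA", "providerB"],
}
print(json.dumps(payload, indent=2))
```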

Key Benefits of Serverless Routing

  • Automatic Provider Selection: Always routes to the best available provider, with no manual work.

  • Failover & High Uptime: Seamless fallback to alternative providers ensures reliability.

  • Smart Routing Strategies: Optimize for speed, cost, or balance on a per-request basis.

  • Custom Control: Limit routing to specific providers when needed.

  • No Infrastructure Management: Fully serverless, no provisioning required.
