# Advanced Usage

## Overview

The **LLM Router** is designed to deliver **fast, reliable, and cost-efficient inference** without requiring you to manage infrastructure or providers. Based on your request and preferences, **cortecs** automatically selects the best available provider for you.

## How it works

Routing happens in two main steps:

1. **Filtering:**\
   We instantly exclude providers that do not meet your request’s minimum requirements:
   * **Compliance Requirements:** Providers that don’t meet your compliance requirements are excluded.
   * **Availability:** Unavailable or unresponsive providers are excluded.
   * **Context Length & Token Limit:** Providers that cannot handle the requested context length or number of output tokens are removed from consideration.
2. **Ranking:**\
   We use **predictive machine learning models** to estimate each provider's throughput and performance based on your request. Eligible providers are then ranked using your selected routing preference:

   | Preference             | Behavior                              |
   | ---------------------- | ------------------------------------- |
   | **Speed**              | Selects the fastest provider           |
   | **Cost**               | Selects the most cost-efficient option |
   | **Balanced (default)** | Optimizes for a mix of speed and cost  |

You can set your routing preference in two ways:

* In the **API request body**:  `"preference": "speed" | "cost" | "balanced"`
* In the **model web interface**, use the **Select Preference** button.

If you don’t specify a preference, the system defaults to **balanced**.

> 🔄 **Automatic Fallback:** If the top-ranked provider fails during processing, the router instantly retries with the next best option to **maximize uptime** and **minimize failed requests**.
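For illustration, this is how the preference travels in the API request body. The model name and message below are placeholders, not actual cortecs identifiers:

```python
import json

# Illustrative chat-completions-style payload; the model name is a placeholder.
payload = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "preference": "speed",  # "speed" | "cost" | "balanced"; omitted => "balanced"
}

body = json.dumps(payload)
```

Omitting the `preference` key is equivalent to sending `"balanced"`.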

## Usage Cost

Response data typically includes a `usage` object that tracks `prompt_tokens`, `completion_tokens`, and `total_tokens` for every request. Cortecs extends this metadata with cost information, so you can monitor the exact monetary spend of each inference call.

The `usage` object contains the following additional fields:

* `cost`: The overall cost of the call
* `cost_details`:
  * `prompt_cost`: Cost of the prompt tokens (input)
  * `completion_cost`: Cost of the completion tokens (output)
  * `prompt_audio_cost`: Cost of the audio prompt tokens (input)

{% hint style="info" %}
**Note:** The costs returned are expressed in credits, where €1 = 1,000,000 credits.
{% endhint %}
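As a sketch of reading this metadata, assume a response whose `usage` numbers are purely illustrative; the conversion follows the credit rule above:

```python
# Hypothetical `usage` object from a response; all numbers are illustrative.
usage = {
    "prompt_tokens": 120,
    "completion_tokens": 80,
    "total_tokens": 200,
    "cost": 450,  # total credits for this call
    "cost_details": {
        "prompt_cost": 150,
        "completion_cost": 300,
        "prompt_audio_cost": 0,
    },
}

CREDITS_PER_EURO = 1_000_000  # €1 = 1,000,000 credits

def credits_to_euro(credits: float) -> float:
    """Convert a credit amount from the usage object into euros."""
    return credits / CREDITS_PER_EURO

total_eur = credits_to_euro(usage["cost"])
print(f"Total cost: {total_eur:.6f} EUR")  # prints "Total cost: 0.000450 EUR"
```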

## Parameter Handling

**Cortecs** allows you to define preferred parameters across multiple providers. If a provider encounters an error while processing your parameters, the error response from that provider is returned to you unchanged. You can also pass router-specific parameters to control routing behavior.

**Router Specific Parameters**

The following parameters control the behavior of the router:

* `preference`: Choose between `speed`, `cost` and `balanced`. Default is `balanced`.
* `allowed_providers`: A list of allowed providers. The router will only use the specified providers and will not fall back to others. By default, all available providers are allowed.
* `eu_native`: A boolean indicating whether to restrict routing to EU-based and regulated providers. Default is `false`. Even when set to `false`, all routing remains GDPR compliant.
* `allow_quantization`: A boolean indicating whether to allow quantized endpoints. Default is `true`. Quantization is a model compression technique that typically preserves full accuracy, so enabling this option usually does not impact model performance.
* `timeout`: An integer representing the request timeout period in seconds. It defaults to `600`, providing a 10-minute window before the request is terminated.

> When using these parameters with an **OpenAI-compatible wrapper**, include them in the `extra_body` dictionary. See example [here](https://docs.cortecs.ai/quickstart#id-4.-send-your-first-request).
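A minimal sketch of such a payload, with every value illustrative (including the provider name). With the OpenAI Python client, the dictionary is passed as `extra_body`; the call itself is shown as a comment so the snippet stays self-contained:

```python
# Router-specific parameters; all values below are illustrative.
extra_body = {
    "preference": "cost",                 # "speed" | "cost" | "balanced"
    "allowed_providers": ["provider-a"],  # hypothetical provider name
    "eu_native": True,                    # restrict to EU-based, regulated providers
    "allow_quantization": True,           # the default; quantized endpoints allowed
    "timeout": 120,                       # seconds before the request is terminated
}

# With an OpenAI-compatible client (not executed here):
#   client.chat.completions.create(
#       model="example-model",
#       messages=[{"role": "user", "content": "Hi"}],
#       extra_body=extra_body,
#   )
```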

#### **Default Parameters**

If you don’t specify certain parameters, the following defaults apply:

* `temperature`: 0.7
* `frequency_penalty`: 0.0
* `presence_penalty`: 0.0

#### **Other Parameters**

Including optional parameters in your request may reduce the likelihood of successful execution, as not all providers support every parameter. When unsupported parameters are included, some providers may reject the request or fail to process it correctly.

> 💡 **Tip:** To improve reliability and maximize compatibility across providers, include **only the parameters that are essential** for your use case.
