# ⚡ Quantization

**Quantization** allows requests to be routed to **quantized endpoints**, which may deliver **faster inference** and **lower costs**, with a slight reduction in accuracy.

Because quantization affects inference behavior, this setting gives you explicit control over the model version used. You may choose to disable this feature to retain the original model for precision-critical tasks or regulatory compliance.

> 💡 **Note:** Quantized endpoints follow the same **security** and **data retention** policies as non-quantized endpoints.

## Via Web Console

1. Go to your [**Project Settings**](https://cortecs.ai/userArea/userProfile) **→ Inference Section**.
2. Toggle **Allow Quantization** **ON** ✅ to allow **routing** to quantized endpoints.

<figure><img src="https://2211217319-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYGsEKyV2Zq4Q8fEJQT40%2Fuploads%2Fczq8Mm4u79OcyQfFSMrT%2Fimage.png?alt=media&#x26;token=c901574b-6276-42e0-9099-9379062aa807" alt=""><figcaption></figcaption></figure>

3. Toggle **OFF** ❌ to restrict routing to **non-quantized endpoints only**.

<figure><img src="https://2211217319-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYGsEKyV2Zq4Q8fEJQT40%2Fuploads%2FtNNVnYZV2H5UyurrpKfN%2Fimage.png?alt=media&#x26;token=7d9d5306-cf27-477f-93db-85054f8bb8eb" alt=""><figcaption></figcaption></figure>

## Via API

You can also control quantization directly in your requests using the `allow_quantization` parameter:

```json
{
  "model": "devstral-2512",
  "messages": [...],
  "allow_quantization": true
}
```

* `true` ✅ *(default)* → Quantized endpoints are allowed
* `false` ❌ → Only non-quantized endpoints are used
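
As a sketch, the per-request flag can be set by adding one field to the request body. The helper below is illustrative (not part of the Cortecs SDK), and the message content is a placeholder:

```python
def build_chat_request(model, messages, allow_quantization=True):
    """Build a chat-completion request body with the quantization flag.

    allow_quantization defaults to True, matching the documented default.
    """
    return {
        "model": model,
        "messages": messages,
        "allow_quantization": allow_quantization,
    }

# Per-request opt-out: force non-quantized endpoints for a precision-critical call.
body = build_chat_request(
    "devstral-2512",
    [{"role": "user", "content": "Summarize the contract clauses."}],  # placeholder
    allow_quantization=False,
)
```

The resulting `body` can then be POSTed to the chat completions endpoint with any HTTP client; quantized routing is disabled for that single request without changing your project-wide setting.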

For **project-wide enforcement**, you can also configure this setting via the [**Project Config API**](https://docs.cortecs.ai/advanced-usage/project-configuration/update-project-config), ensuring all team requests follow the same policy. 🏢

## When to Use Quantization

* Reduce latency for real-time or high-throughput workloads
* Lower inference costs at scale
* Accept small accuracy trade-offs in exchange for performance gains

## Compliance Focus

* Quantized and non-quantized endpoints are **GDPR compliant**
* Data is **never used for model training**
* Quantization alters inference behavior, which may impact EU AI Act conformity assessments. Disable this setting if strict adherence to the original model behavior is required.
