# ⚡ Quantization

**Quantization** allows requests to be routed to **quantized endpoints**, which may deliver **faster inference** and **lower costs**, with a slight reduction in accuracy.

Because quantization affects inference behavior, this setting gives you explicit control over the model version used. You may choose to disable this feature to retain the original model for precision-critical tasks or regulatory compliance.

> 💡 **Note:** Quantized endpoints follow the same **security** and **data retention** policies as non-quantized endpoints.

## Via Web Console

1. Go to your [**Project Settings**](https://cortecs.ai/userArea/userProfile) **→ Inference Section**.
2. Toggle **Allow Quantization** **ON** ✅ to allow **routing** to quantized endpoints.

<figure><img src="https://2211217319-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYGsEKyV2Zq4Q8fEJQT40%2Fuploads%2Fczq8Mm4u79OcyQfFSMrT%2Fimage.png?alt=media&#x26;token=c901574b-6276-42e0-9099-9379062aa807" alt=""><figcaption></figcaption></figure>

3. Toggle **OFF** ❌ to restrict routing to **non-quantized endpoints only**.

<figure><img src="https://2211217319-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYGsEKyV2Zq4Q8fEJQT40%2Fuploads%2FtNNVnYZV2H5UyurrpKfN%2Fimage.png?alt=media&#x26;token=7d9d5306-cf27-477f-93db-85054f8bb8eb" alt=""><figcaption></figcaption></figure>

## Via API

You can also control quantization directly in your requests using the `allow_quantization` parameter:

```json
{
  "model": "devstral-2512",
  "messages": [...],
  "allow_quantization": true
}
```

* `true` ✅ *(default)* → Quantized endpoints are allowed
* `false` ❌ → Only non-quantized endpoints are used
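
As a sketch, the per-request flag can be set by adding one field to the request body. The helper below is illustrative (not part of the Cortecs SDK), and the message content is a placeholder:

```python
def build_chat_request(model, messages, allow_quantization=True):
    """Build a chat-completion request body with the quantization flag.

    allow_quantization defaults to True, matching the documented default.
    """
    return {
        "model": model,
        "messages": messages,
        "allow_quantization": allow_quantization,
    }

# Per-request opt-out: force non-quantized endpoints for a precision-critical call.
body = build_chat_request(
    "devstral-2512",
    [{"role": "user", "content": "Summarize the contract clauses."}],  # placeholder
    allow_quantization=False,
)
```

The resulting `body` can then be POSTed to the chat completions endpoint with any HTTP client; quantized routing is disabled for that single request without changing your project-wide setting.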

For **project-wide enforcement**, you can also configure this setting via the [**Project Config API**](https://docs.cortecs.ai/advanced-usage/project-configuration/update-project-config), ensuring all team requests follow the same policy. 🏢

## When to Use Quantization

* Reduce latency for real-time or high-throughput workloads
* Lower inference costs at scale
* Accept small accuracy trade-offs in exchange for performance gains

## Compliance Focus

* Quantized and non-quantized endpoints are **GDPR compliant**
* Data is **never used for model training**
* Quantization alters inference behavior, which may impact EU AI Act conformity assessments. Disable this setting if strict adherence to the original model behavior is required.
