Chat Completions

Create a chat completion

post

This endpoint creates a chat completion using the specified model.

Authorizations
Body

A request object for generating chat completions and controlling router behavior. Most parameters are optional and should only be set when needed, though you may include additional parameters as required. Note that not all providers support the same set of parameters: adding unsupported or unnecessary parameters can cause requests to fail or limit the providers able to process them.
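As a minimal sketch, the request below sets only model and messages and leaves every other parameter at its default. It assumes Python with the requests library; the CORTECS_API_KEY environment variable is a placeholder for your own token.

import os
import requests

# Minimal sketch: only `model` and `messages` are set; all other
# parameters fall back to their documented defaults.
response = requests.post(
    "https://api.cortecs.ai/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['CORTECS_API_KEY']}",  # placeholder env var
        "Content-Type": "application/json",
    },
    json={
        "model": "mistral-small-2503",
        "messages": [{"role": "user", "content": "Tell me a joke."}],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])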

model · string · Optional

The model to use for the completion.

Example: mistral-small-2503
preference · string · enum · Optional

The provider preference for handling the request.

Default: balanced
allowed_providers · string[] | nullable · Optional

The providers that are allowed to be used for the completion.

Example: ["mistral","scaleway"]
eu_native · boolean | nullable · Optional

Whether to consider only providers based and regulated within the EU. Even when set to false, all our endpoints are GDPR compliant.

Default: false · Example: false
allow_quantization · boolean | nullable · Optional

Whether to allow quantized endpoints.

Default: true · Example: true
temperature · number | nullable · Optional

Controls randomness in the output. Higher values make the output more random.

Default: 0.7
max_tokens · integer | nullable · Optional

The maximum number of tokens to generate in the completion. It can also be referred to as max_completion_tokens. The limit depends on the model’s context size — it can’t exceed the context size minus your prompt length.

Example: 512
top_p · number | nullable · Optional

Controls diversity via nucleus sampling: only the tokens comprising the top_p cumulative probability mass are considered for sampling. For example, 0.1 means only tokens comprising the top 10% probability mass are considered. This is an alternative to temperature sampling; we recommend altering either top_p or temperature, but not both.

Example: 1
frequency_penalty · number | nullable · Optional

Reduces the probability of generating a token based on its frequency in the text so far. The more times a token has appeared in the text so far, the lower the probability of it appearing in the completion.

Default: 0
presence_penalty · number | nullable · Optional

Reduces the probability of generating a token if it has already appeared in the text so far, encouraging the model to introduce new content.

Default: 0
response_format · object | nullable · Optional

Specifies the format of the response.

Example: {"type":"json_schema","json_schema":{"...":null}}
stop · string[] | nullable · Optional

Sequences where the API will stop generating further tokens.

Example: ["\n\n"]
stream · boolean | nullable · Optional

Whether to stream the response. The last chunk will contain the usage information.

Default: false
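A sketch of consuming a streamed response, assuming OpenAI-style server-sent events (data: {...} lines terminated by data: [DONE]); the exact chunk layout is an assumption, not a guarantee of this API.

import json
import os
import requests

# Streaming sketch: print content deltas as they arrive and skip the
# final usage-only chunk, assuming OpenAI-style SSE framing.
with requests.post(
    "https://api.cortecs.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['CORTECS_API_KEY']}"},  # placeholder env var
    json={
        "model": "mistral-small-2503",
        "messages": [{"role": "user", "content": "Tell me a joke."}],
        "stream": True,
    },
    stream=True,
    timeout=60,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a trailing chunk that only carries usage
        delta = choices[0].get("delta", {})
        print(delta.get("content") or "", end="", flush=True)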
logprobs · one of | nullable · Optional

Whether to return log probabilities of the output tokens.

Example: false
integer · Optional
or
boolean · Optional
seed · integer | nullable · Optional

Random seed for reproducible results.

Example: 42
tools · object[] | nullable · Optional

List of tools available to the model.

tool_choice · string | nullable · Optional

Controls which tool the model should use. Only set if tools is not empty.
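A sketch of declaring a tool and letting the model decide whether to call it, assuming OpenAI-style function tools; the get_weather tool and its parameters are hypothetical.

# Tool-calling sketch, assuming OpenAI-style function tools.
request_body = {
    "model": "mistral-small-2503",
    "messages": [{"role": "user", "content": "What is the weather in Vienna?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Only set tool_choice when tools is non-empty, e.g. "auto" to let
    # the model decide whether to call a tool.
    "tool_choice": "auto",
}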

n · integer | nullable · Optional

Number of completions to generate.

Example: 1
prediction · object | nullable · Optional

Specify expected results, optimizing response times by leveraging known or predictable content. This approach is especially effective for updating text documents or code files with minimal changes, reducing latency while maintaining high-quality results.

Example: {"type":"content","content":""}
parallel_tool_calls · boolean · Optional

Whether to allow parallel tool calls.

safe_prompt · boolean · Optional

Whether to inject a safety prompt before all conversations.

Responses
200

A chat completion.

application/json
POST /v1/chat/completions HTTP/1.1
Host: api.cortecs.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 541

{
  "model": "mistral-small-2503",
  "messages": [
    {
      "role": "user",
      "content": "Tell me a joke."
    }
  ],
  "preference": "balanced",
  "allowed_providers": [
    "mistral",
    "scaleway"
  ],
  "eu_native": false,
  "allow_quantization": true,
  "temperature": 0.7,
  "max_tokens": 512,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "...": null
    }
  },
  "stop": [
    "\n\n"
  ],
  "stream": false,
  "logprobs": false,
  "seed": 42,
  "tools": null,
  "tool_choice": null,
  "n": 1,
  "prediction": {
    "type": "content",
    "content": ""
  },
  "parallel_tool_calls": true,
  "safe_prompt": true
}
{
  "object": "chat.completion",
  "id": "cmpl_1234567890",
  "created": 1715155200,
  "provider": "mistral",
  "model": "mistral-small-2503",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a joke for you: Why did the chicken cross the road? To get to the other side!",
        "tool_calls": [
          {
            "id": "text",
            "type": "text",
            "function": {}
          }
        ],
        "reasoning_content": "text"
      },
      "finish_reason": "text",
      "logprobs": {}
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  },
  "prompt_logprobs": null
}
