Chat Completions

Create a chat completion

post

This endpoint creates a chat completion using the specified model.

Authorizations
Body

A request object for generating chat completions and controlling router behavior. Most parameters are optional and should only be set when needed, though you may include additional parameters as required. Note that not all providers support the same set of parameters: adding unsupported or unnecessary parameters can cause requests to fail or limit the providers able to process them.
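As a minimal sketch, the request below sets only model and messages and leaves every other parameter at its default. It assumes Python with the requests library; the CORTECS_API_KEY environment variable is a placeholder for your own token.

import os
import requests

# Minimal sketch: only `model` and `messages` are set; all other
# parameters fall back to their documented defaults.
response = requests.post(
    "https://api.cortecs.ai/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['CORTECS_API_KEY']}",  # placeholder env var
        "Content-Type": "application/json",
    },
    json={
        "model": "mistral-small-2503",
        "messages": [{"role": "user", "content": "Tell me a joke."}],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])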

model · string · Optional

The model to use for the completion.

Example: mistral-small-2503
preference · string · enum · Optional

The provider preference for handling the request.

Default: balanced
allowed_providers · string[] | nullable · Optional

The providers that are allowed to be used for the completion.

Example: ["mistral","scaleway"]
eu_native · boolean | nullable · Optional

Whether to consider only providers based and regulated within the EU. Even when set to false, all our endpoints are GDPR compliant.

Default: false · Example: false
allow_quantization · boolean | nullable · Optional

Whether to allow quantized endpoints.

Default: true · Example: true
temperature · number | nullable · Optional

Controls randomness in the output. Higher values make the output more random.

Default: 0.7
max_tokens · integer | nullable · Optional

The maximum number of tokens to generate in the completion. It can also be referred to as max_completion_tokens. The limit depends on the model’s context size — it can’t exceed the context size minus your prompt length.

Example: 512
top_p · number | nullable · Optional

Controls diversity via nucleus sampling: only the tokens comprising the top_p cumulative probability mass are considered for sampling. For example, 0.1 means only tokens comprising the top 10% probability mass are considered. This is an alternative to temperature sampling; we recommend altering either top_p or temperature, but not both.

Example: 1
frequency_penalty · number | nullable · Optional

Reduces the probability of generating a token based on its frequency in the text so far. The more times a token has appeared in the text so far, the lower the probability of it appearing in the completion.

Default: 0
presence_penalty · number | nullable · Optional

Reduces the probability of generating a token if it has already appeared in the text so far, encouraging the model to introduce new content.

Default: 0
response_format · object | nullable · Optional

Specifies the format of the response.

Example: {"type":"json_schema","json_schema":{"...":null}}
stop · string[] | nullable · Optional

Sequences where the API will stop generating further tokens.

Example: ["\n\n"]
stream · boolean | nullable · Optional

Whether to stream the response. The last chunk will contain the usage information.

Default: false
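A sketch of consuming a streamed response, assuming OpenAI-style server-sent events (data: {...} lines terminated by data: [DONE]); the exact chunk layout is an assumption, not a guarantee of this API.

import json
import os
import requests

# Streaming sketch: print content deltas as they arrive and skip the
# final usage-only chunk, assuming OpenAI-style SSE framing.
with requests.post(
    "https://api.cortecs.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['CORTECS_API_KEY']}"},  # placeholder env var
    json={
        "model": "mistral-small-2503",
        "messages": [{"role": "user", "content": "Tell me a joke."}],
        "stream": True,
    },
    stream=True,
    timeout=60,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a trailing chunk that only carries usage
        delta = choices[0].get("delta", {})
        print(delta.get("content") or "", end="", flush=True)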
logprobs · one of | nullable · Optional

Whether to return log probabilities of the output tokens.

Example: false
integer · Optional
or
boolean · Optional
seed · integer | nullable · Optional

Random seed for reproducible results.

Example: 42
tools · object[] | nullable · Optional

List of tools available to the model.

tool_choice · string | nullable · Optional

Controls which tool the model should use. Only set if tools is not empty.
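A sketch of declaring a tool and letting the model decide whether to call it, assuming OpenAI-style function tools; the get_weather tool and its parameters are hypothetical.

# Tool-calling sketch, assuming OpenAI-style function tools.
request_body = {
    "model": "mistral-small-2503",
    "messages": [{"role": "user", "content": "What is the weather in Vienna?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    # Only set tool_choice when tools is non-empty, e.g. "auto" to let
    # the model decide whether to call a tool.
    "tool_choice": "auto",
}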

n · integer | nullable · Optional

Number of completions to generate.

Example: 1
prediction · object | nullable · Optional

Specify expected results, optimizing response times by leveraging known or predictable content. This approach is especially effective for updating text documents or code files with minimal changes, reducing latency while maintaining high-quality results.

Example: {"type":"content","content":""}
parallel_tool_calls · boolean · Optional

Whether to allow parallel tool calls.

safe_prompt · boolean · Optional

Whether to inject a safety prompt before all conversations.

Responses
200

A chat completion.

application/json
POST /v1/chat/completions HTTP/1.1
Host: api.cortecs.ai
Authorization: Bearer YOUR_SECRET_TOKEN
Content-Type: application/json
Accept: */*
Content-Length: 541

{
  "model": "mistral-small-2503",
  "messages": [
    {
      "role": "user",
      "content": "Tell me a joke."
    }
  ],
  "preference": "balanced",
  "allowed_providers": [
    "mistral",
    "scaleway"
  ],
  "eu_native": false,
  "allow_quantization": true,
  "temperature": 0.7,
  "max_tokens": 512,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "...": null
    }
  },
  "stop": [
    "\n\n"
  ],
  "stream": false,
  "logprobs": false,
  "seed": 42,
  "tools": null,
  "tool_choice": null,
  "n": 1,
  "prediction": {
    "type": "content",
    "content": ""
  },
  "parallel_tool_calls": true,
  "safe_prompt": true
}
{
  "object": "chat.completion",
  "id": "cmpl_1234567890",
  "created": 1715155200,
  "provider": "mistral",
  "model": "mistral-small-2503",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a joke for you: Why did the chicken cross the road? To get to the other side!",
        "tool_calls": [
          {
            "id": "text",
            "type": "text",
            "function": {}
          }
        ],
        "reasoning_content": "text"
      },
      "finish_reason": "text",
      "logprobs": {}
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  },
  "prompt_logprobs": null
}
