Chat Completions

Create a chat completion

POST /api/v1/models/serverless/chat/completions

This endpoint creates a chat completion using the specified model.

Body

A request object for generating chat completions. It contains all parameters needed to generate a response from the specified model. Most parameters are optional; set them only when needed. Not every provider supports every parameter, so each additional parameter narrows the set of providers that can serve the completion.

model (string, optional)

The model to use for the completion.

Example: mistral-small-2503

preference (string enum, optional)

The provider preference for handling the request.

Default: balanced

temperature (number | nullable, optional)

Controls randomness in the output. Higher values make the output more random.

Default: 0.7

max_tokens (integer | nullable, optional)

The maximum number of tokens to generate in the completion.

Default: 512

top_p (number | nullable, optional)

Controls diversity via nucleus sampling: only the tokens that make up the top top_p probability mass are considered for sampling. For example, 0.1 means only tokens comprising the top 10% probability mass are considered. An alternative to temperature sampling; we recommend altering either top_p or temperature, but not both.

Default: 1

frequency_penalty (number | nullable, optional)

Reduces the probability of generating a token based on its frequency in the text so far. The more often a token has appeared in the text so far, the lower its probability of appearing in the completion.

Default: 0

presence_penalty (number | nullable, optional)

Reduces the probability of generating a token based on whether it has already appeared in the text so far. If a token has already appeared, its probability of appearing in the completion is reduced.

Default: 0

response_format (object | nullable, optional)

Specifies the format of the response.

Example: {"type":"json_schema","json_schema":{"...":null}}

stop (string[] | nullable, optional)

Sequences where the API will stop generating further tokens.

Example: ["\n\n"]

stream (boolean | nullable, optional)

Whether to stream the response. The last chunk will contain the usage information.

Default: false
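
When stream is true, the response arrives as a sequence of chunks. The Python sketch below reads such a stream; the Bearer auth header, the "data: ..." server-sent-event framing, and the delta field names follow common chat-completion streaming conventions and are assumptions, not part of this reference.

# Sketch of reading a streamed completion (framing and field names assumed).
import json
import os
import requests

resp = requests.post(
    "https://cortecs.ai/api/v1/models/serverless/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['CORTECS_API_KEY']}"},  # assumed auth scheme
    json={
        "model": "mistral-small-2503",
        "messages": [{"role": "user", "content": "Tell me a joke."}],
        "stream": True,
    },
    stream=True,
    timeout=60,
)
resp.raise_for_status()
for line in resp.iter_lines():
    if not line.startswith(b"data: "):
        continue
    chunk = line[len(b"data: "):]
    if chunk == b"[DONE]":
        break
    event = json.loads(chunk)
    choices = event.get("choices") or [{}]
    delta = choices[0].get("delta", {})            # assumed chunk shape
    print(delta.get("content", ""), end="", flush=True)
    if event.get("usage"):                         # final chunk carries usage
        print("\n", event["usage"])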

logprobs (integer | boolean | nullable, optional)

Whether to return log probabilities of the output tokens.

Example: false

seed (integer | nullable, optional)

Random seed for reproducible results.

Example: 42

tools (object[] | nullable, optional)

List of tools available to the model. A sketch follows after tool_choice below.

tool_choice (string | nullable, optional)

Controls which tool the model should use. Set this only when tools is non-empty.
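
As a sketch, a request offering one tool might look like the following. The {"type": "function", "function": {...}} layout and the "auto" tool_choice value follow common function-calling conventions and are assumptions; this reference only states that tools is a list of objects. The get_weather function is hypothetical.

# Hypothetical tools definition (layout assumed, not taken from this reference).
payload = {
    "model": "mistral-small-2503",
    "messages": [{"role": "user", "content": "What is the weather in Vienna?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",  # assumed value; set only when tools is non-empty
}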

n (integer | nullable, optional)

Number of completions to generate.

Example: 1

prediction (object | nullable, optional)

Specify expected results, optimizing response times by leveraging known or predictable content. This approach is especially effective for updating text documents or code files with minimal changes, reducing latency while maintaining high-quality results.

Example: {"type":"content","content":""}

parallel_tool_calls (boolean, optional)

Whether to allow parallel tool calls.

safe_prompt (boolean, optional)

Whether to inject a safety prompt before all conversations.

Responses

200: A chat completion (application/json).

500: Internal server error.
Example request

POST /api/v1/models/serverless/chat/completions HTTP/1.1
Host: cortecs.ai
Content-Type: application/json
Accept: */*
Content-Length: 454

{
  "model": "mistral-small-2503",
  "messages": [
    {
      "role": "user",
      "content": "Tell me a joke."
    }
  ],
  "preference": "balanced",
  "temperature": 0.7,
  "max_tokens": 512,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "...": null
    }
  },
  "stop": [
    "\n\n"
  ],
  "stream": false,
  "logprobs": false,
  "seed": 42,
  "tools": null,
  "tool_choice": null,
  "n": 1,
  "prediction": {
    "type": "content",
    "content": ""
  },
  "parallel_tool_calls": true,
  "safe_prompt": true
}

Example response (200)

{
  "object": "chat.completion",
  "id": "cmpl_1234567890",
  "created": 1715155200,
  "provider": "mistral",
  "model": "mistral-small-2503",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here is a joke for you: Why did the chicken cross the road? To get to the other side!",
        "tool_calls": [
          {
            "id": "text",
            "type": "text",
            "function": {}
          }
        ],
        "reasoning_content": "text"
      },
      "finish_reason": "text",
      "logprobs": {}
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  },
  "prompt_logprobs": null
}
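
Putting the pieces together, a minimal Python sketch for calling this endpoint might look like the following. The base URL matches the Host header above; the Bearer token header and the CORTECS_API_KEY environment variable name are assumptions (see the Authentication section for the actual scheme).

# Minimal sketch of a non-streaming chat completion call.
import os
import requests

url = "https://cortecs.ai/api/v1/models/serverless/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ['CORTECS_API_KEY']}",  # assumed auth scheme
    "Content-Type": "application/json",
}
payload = {
    "model": "mistral-small-2503",
    "messages": [{"role": "user", "content": "Tell me a joke."}],
    "temperature": 0.7,
    "max_tokens": 512,
}

resp = requests.post(url, headers=headers, json=payload, timeout=60)
resp.raise_for_status()  # a 500 (internal server error) raises here
data = resp.json()
print(data["choices"][0]["message"]["content"])
print(data["usage"])     # prompt_tokens / completion_tokens / total_tokens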