Chat Completions
This endpoint creates a chat completion using the specified model.
A request object for generating chat completions. It contains all parameters needed to generate a response from the specified model. Most parameters are optional; set them only when needed, because not every provider supports every parameter, and each additional parameter narrows the set of providers that can serve the completion.
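As a minimal sketch of such a request (the base URL, bearer auth scheme, messages field, and response shape below are illustrative assumptions, not taken from this page):

import os
import requests

# Hypothetical endpoint and auth; substitute your platform's real base URL
# and credential scheme.
API_URL = "https://api.example.com/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['API_KEY']}"}

payload = {
    "model": "mistral-small-2503",
    "messages": [{"role": "user", "content": "Say hello."}],  # assumed field
    "temperature": 0.7,
    "max_tokens": 512,
}

resp = requests.post(API_URL, headers=HEADERS, json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])  # assumed response shape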
model
The model to use for the completion. Example: mistral-small-2503
provider
The provider preference for handling the request. Example: balanced
temperature
Controls randomness in the output. Higher values make the output more random. Example: 0.7
max_tokens
The maximum number of tokens to generate in the completion. Example: 512
top_p
Controls diversity via nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability mass reaches top_p. For example, 0.1 means only the tokens comprising the top 10% probability mass are considered. An alternative to temperature sampling; we recommend altering either top_p or temperature, but not both. Example: 1
frequency_penalty
Reduces the probability of generating a token based on its frequency in the text so far: the more often a token has already appeared, the less likely it is to appear again in the completion. Example: 0
presence_penalty
Reduces the probability of generating a token that has already appeared in the text so far, regardless of how often. Example: 0
response_format
Specifies the format of the response. Example: {"type":"json_schema","json_schema":{"...":null}}
stop
Sequences where the API will stop generating further tokens. Example: ["\n\n"]
stream
Whether to stream the response. When streaming, the last chunk contains the usage information. Example: false
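Continuing the sketch above, a streamed response can be consumed line by line, assuming OpenAI-style server-sent events (data:-prefixed JSON chunks terminated by [DONE]); that event format is an assumption:

import json

with requests.post(API_URL, headers=HEADERS,
                   json={**payload, "stream": True}, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":  # assumed end-of-stream sentinel
            break
        event = json.loads(chunk)
        choices = event.get("choices") or []
        if choices:
            delta = choices[0].get("delta", {}).get("content")
            if delta:
                print(delta, end="", flush=True)
        if event.get("usage"):  # per this page, usage arrives in the last chunk
            print("\nusage:", event["usage"])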
logprobs
Whether to return log probabilities of the output tokens. Example: false
seed
Random seed for reproducible results. Example: 42
tools
List of tools available to the model.
tool_choice
Controls which tool the model should use. Set this only if tools is not empty.
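A sketch of declaring one tool and leaving the choice to the model; the function-tool shape and the "auto" value follow common OpenAI-compatible conventions and are assumptions here.

# Extends the payload from the first sketch; the tool itself is hypothetical.
payload["tools"] = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
# "auto" lets the model decide whether to call the tool; a specific tool can
# typically be forced instead (the exact shape varies by provider).
payload["tool_choice"] = "auto"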
n
Number of completions to generate. Example: 1
prediction
Specifies expected output content, optimizing response time by letting the server reuse known or predictable content. This is especially effective when updating text documents or code files with minimal changes, reducing latency while maintaining high-quality results. Example: {"type":"content","content":""}
parallel_tool_calls
Whether to allow parallel tool calls.
safe_prompt
Whether to inject a safety prompt before all conversations.