Reasoning (Beta)
The handling of reasoning content in responses is currently in beta. For any issues or feedback, please contact us via Discord or [email protected].
Reasoning allows language models to perform deeper, structured thinking before producing a final answer. How this reasoning is exposed — or whether it appears at all — depends entirely on the model provider.
Some models reveal part of their thinking process, while others keep it hidden but still use it internally.
What Is Reasoning?
Reasoning refers to the model’s deeper analytical process: evaluating options, forming intermediate steps, and then producing a final answer.
Note: Depending on the model provider, this reasoning may appear in various formats:
mixed into the normal content
in a dedicated reasoning_content field
inside structured “thinking” chunks (e.g., Mistral reasoning models)
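Because the format varies by provider, client code that wants to surface reasoning should check more than one place. A minimal sketch, assuming an OpenAI-style message dict; the field names mirror the formats listed above and are not guaranteed for every provider:

```python
# Sketch: surface visible reasoning from a chat completion message.
# Assumes an OpenAI-style message dict; field availability varies.
def extract_reasoning(message: dict) -> str | None:
    # Dedicated field (exposed by some providers).
    if message.get("reasoning_content"):
        return str(message["reasoning_content"])
    # Otherwise reasoning is either mixed into the normal content or
    # delivered as structured "thinking" chunks (see the Mistral
    # section below); there is no separate field to read.
    return None
```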
Other models keep their chain-of-thought hidden but still support configurable reasoning behavior.
Controlling Reasoning
To provide a consistent experience across providers that support it, Cortecs accepts:
"reasoning_effort": "low" | "medium" | "high"This parameter represents how much reasoning effort you want the model to use.
If the provider supports configurable reasoning, Cortecs translates the value appropriately.
If the provider does not support adjustable reasoning, the parameter is simply ignored.
If the provider uses reasoning by default, the parameter may still help increase or reduce thinking depth.
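For example, with an OpenAI-compatible client the parameter travels in the request body. This is a minimal sketch: the base URL, API key, and model name are placeholders, and the parameter is passed via extra_body so the snippet works regardless of whether your SDK version types reasoning_effort directly.

```python
from openai import OpenAI

# Placeholders: substitute your Cortecs endpoint, API key, and model.
client = OpenAI(
    base_url="https://<your-cortecs-endpoint>/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="<reasoning-capable-model>",
    messages=[{"role": "user", "content": "Plan a zero-downtime database migration."}],
    extra_body={"reasoning_effort": "high"},  # "low" | "medium" | "high"
)
print(response.choices[0].message.content)
```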
When to choose which level?
Use low if you want fast responses, low cost, or the task is simple.
Use medium for general use: coding, explanations, multi-step tasks.
Use high for reasoning-intensive tasks: debugging, strategy, multi-constraint planning, mathematical reasoning, or anything requiring precision.
Provider Behavior
Different model families use different mechanisms for reasoning. Below is how Cortecs handles reasoning_effort for each provider.
Anthropic
Anthropic uses a reasoning budget, which determines how much internal thinking the model can perform.
Cortecs automatically converts the user’s reasoning_effort input into the appropriate budget value:
| reasoning_effort | Budget (tokens) |
| ---------------- | --------------- |
| low              | 1024            |
| medium           | 8192            |
| high             | 16384           |
Custom Budget (Anthropic)
Users may also override the automatic mapping by providing a custom numeric string, for example: "reasoning_effort": "2000"
Rules:
The minimum allowed budget is 1024.
If the user provides a value below 1024, Cortecs automatically raises it to 1024.
If the value is unsupported or invalid when using reasoning_effort as a numeric string, the default of 1024 is used.
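Taken together, the level mapping and the custom-budget rules could be sketched like this. It is an illustration of the behavior described above, not Cortecs source code; the function name is invented.

```python
# Illustrative sketch of the documented Anthropic mapping; not Cortecs code.
ANTHROPIC_BUDGETS = {"low": 1024, "medium": 8192, "high": 16384}
MIN_BUDGET = 1024  # minimum allowed reasoning budget

def to_anthropic_budget(reasoning_effort: str) -> int:
    # Named levels map to fixed budgets.
    if reasoning_effort in ANTHROPIC_BUDGETS:
        return ANTHROPIC_BUDGETS[reasoning_effort]
    try:
        # Numeric strings act as a custom budget, raised to the minimum.
        return max(int(reasoning_effort), MIN_BUDGET)
    except ValueError:
        # Unsupported or invalid values fall back to the default.
        return MIN_BUDGET

assert to_anthropic_budget("2000") == 2000
assert to_anthropic_budget("500") == 1024    # below minimum -> raised
assert to_anthropic_budget("turbo") == 1024  # invalid -> default
```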
Azure OpenAI
Azure OpenAI follows the same general behavior as OpenAI:
Azure OpenAI does not expose raw reasoning tokens, so the internal chain of thought is never shown.
For newer reasoning-capable models (such as GPT-5), you can still use reasoning_effort to control the depth of reasoning; older models ignore it.
Google Gemini (2.5 and later)
Reasoning is enabled by default.
Named reasoning_effort values are converted into a reasoning budget, similar to Anthropic.
Custom Budget (Gemini)
Users may specify a custom numeric budget string similar to Anthropic, for example: "reasoning_effort": "2000"
Rules:
Gemini models have different upper limits (see Google documentation):
For gemini-2.5-flash: 1 – 24,576
For gemini-2.5-pro: 128 – 32,768
If a user sets a budget outside these limits, Cortecs automatically adjusts it to the nearest supported value.
If the value is unsupported or invalid when using reasoning_effort as a numeric string, the default of 1024 is used.
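A sketch of the clamping described above; the ranges come from the per-model limits listed, and the helper name is invented.

```python
# Illustrative sketch of Gemini budget clamping; not Cortecs code.
GEMINI_LIMITS = {
    "gemini-2.5-flash": (1, 24_576),
    "gemini-2.5-pro": (128, 32_768),
}
DEFAULT_BUDGET = 1024

def to_gemini_budget(reasoning_effort: str, model: str) -> int:
    try:
        budget = int(reasoning_effort)
    except ValueError:
        return DEFAULT_BUDGET  # unsupported or invalid numeric string
    low, high = GEMINI_LIMITS[model]
    # Out-of-range budgets are adjusted to the nearest supported value.
    return min(max(budget, low), high)

assert to_gemini_budget("50000", "gemini-2.5-pro") == 32_768  # clamped down
assert to_gemini_budget("64", "gemini-2.5-pro") == 128        # clamped up
```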
Mistral Models
Mistral models do not support reasoning_effort.
If a Mistral model has reasoning capability, it returns reasoning automatically.
Reasoning appears in “thinking chunks” (see the sketch below).
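When reasoning arrives this way, the client must split the thinking chunks from the answer text. A hedged sketch, assuming content is a list of typed chunks with "thinking" and "text" entries; the exact field names may differ, so check the actual response you receive:

```python
# Hedged sketch: separate "thinking" chunks from the final answer text.
# Chunk field names are assumptions, not a confirmed Mistral schema.
def split_thinking(content: list[dict]) -> tuple[str, str]:
    thinking, answer = [], []
    for chunk in content:
        if chunk.get("type") == "thinking":
            thinking.append(str(chunk.get("thinking", "")))
        else:
            answer.append(str(chunk.get("text", "")))
    return "".join(thinking), "".join(answer)
```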
Reasoning behavior varies across models, and not all reasoning steps may be visible in the response. Using reasoning_effort lets you request deeper or lighter reasoning when supported, while Cortecs automatically handles internal budgets where applicable. Keep in mind that some models expose reasoning explicitly, others hide it, and some include it by default.