Reasoning (Beta)

Reasoning allows language models to perform deeper, structured thinking before producing a final answer. How this reasoning is exposed — or whether it appears at all — depends entirely on the model provider.

Some models reveal part of their thinking process, while others keep it hidden but still use it internally.

What Is Reasoning?

Reasoning refers to the model’s deeper analytical process: evaluating options, forming intermediate steps, and then producing a final answer.

Note: Depending on the model provider, this reasoning may appear in various formats:

  • mixed into the normal content

  • in a dedicated reasoning_content field

  • inside structured “thinking” chunks (e.g., Mistral reasoning models)

Other models keep their chain-of-thought hidden but still support configurable reasoning behavior.
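
Because the location of reasoning output varies by provider, client code may need to check several places in a response. The sketch below is illustrative only: the exact field names ("reasoning_content", "thinking"-typed chunks) are assumptions based on the formats listed above, not a guaranteed Cortecs schema.

```python
def extract_reasoning(message: dict):
    """Return reasoning text from a chat message, wherever the provider put it.

    NOTE: the field names checked here ("reasoning_content", "thinking" chunks)
    are assumptions based on common provider formats.
    """
    # Case 1: a dedicated reasoning_content field alongside the normal content.
    if message.get("reasoning_content"):
        return message["reasoning_content"]

    # Case 2: structured "thinking" chunks inside a list-valued content field
    # (e.g., Mistral reasoning models).
    content = message.get("content")
    if isinstance(content, list):
        chunks = [c.get("text", "") for c in content if c.get("type") == "thinking"]
        if chunks:
            return "".join(chunks)

    # Case 3: reasoning is hidden or mixed into the normal content;
    # there is nothing separate to extract.
    return None
```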

Controlling Reasoning

To provide a consistent experience across providers that support it, Cortecs accepts:

"reasoning_effort": "low" | "medium" | "high"

This parameter represents how much reasoning effort you want the model to use.

  • If the provider supports configurable reasoning, Cortecs translates the value appropriately.

  • If the provider does not support adjustable reasoning, the parameter is simply ignored.

  • If the provider uses reasoning by default, the parameter may still help increase or reduce thinking depth.
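
In practice, this is just one extra field on a standard chat completion request. A minimal sketch, assuming an OpenAI-compatible payload (the model name is a placeholder, not a real identifier):

```python
# Build an OpenAI-compatible chat completion payload with reasoning_effort.
# The model id below is a placeholder for illustration.
payload = {
    "model": "some-reasoning-model",
    "messages": [
        {"role": "user", "content": "Plan a 3-step migration strategy."}
    ],
    # Accepted values: "low" | "medium" | "high".
    # Ignored by providers without adjustable reasoning.
    "reasoning_effort": "high",
}
```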

When to choose which level?

  • Use low if you want fast responses, low cost, or the task is simple.

  • Use medium for general use: coding, explanations, multi-step tasks.

  • Use high for reasoning-intensive tasks: debugging, strategy, multi-constraint planning, mathematical reasoning, or anything requiring precision.

Provider Behavior

Different model families use different mechanisms for reasoning. Below is how Cortecs handles reasoning_effort for each provider.

Anthropic

Anthropic uses a reasoning budget, which determines how much internal thinking the model can perform. Cortecs automatically converts the user’s reasoning_effort input into the appropriate budget value:

Effort    Budget
low       1024
medium    8192
high      16384
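
The mapping above can be expressed as a small lookup. This is a sketch of the documented values, not Cortecs internals:

```python
# Documented effort-to-budget mapping for Anthropic models.
ANTHROPIC_BUDGETS = {"low": 1024, "medium": 8192, "high": 16384}

def anthropic_budget(effort: str) -> int:
    """Translate a reasoning_effort level into an Anthropic reasoning budget."""
    return ANTHROPIC_BUDGETS[effort]
```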

Custom Budget (Anthropic)

Users may also override the automatic mapping by providing a custom numeric string, for example: "reasoning_effort": "2000"

Rules:

  • The minimum allowed budget is 1024.

  • If the user provides a value below 1024, Cortecs automatically raises it to 1024.

  • If the numeric string is invalid or cannot be parsed, the default budget of 1024 is used.

If your requested reasoning budget (through reasoning_effort) exceeds the model’s maximum output token limit, Cortecs automatically adjusts it to stay within the allowed limits.
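
Taken together, the custom-budget rules amount to a normalization step. A sketch under the stated rules, where max_output_tokens stands in for the model's output token limit:

```python
def normalize_anthropic_budget(effort: str, max_output_tokens: int) -> int:
    """Apply the documented rules for a numeric reasoning_effort string.

    - Invalid numeric strings fall back to the default of 1024.
    - Values below 1024 are raised to 1024.
    - Budgets above the model's max output tokens are capped.
    """
    MIN_BUDGET = 1024
    try:
        budget = int(effort)
    except ValueError:
        return MIN_BUDGET                  # invalid numeric string -> default
    budget = max(budget, MIN_BUDGET)       # enforce the minimum of 1024
    return min(budget, max_output_tokens)  # stay within the model's limit
```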

Azure OpenAI

Azure OpenAI follows the same general behavior as OpenAI:

  • Azure OpenAI does not expose raw reasoning tokens, so the internal chain of thought is never shown.

  • For newer reasoning-capable models (such as GPT-5), you can still use reasoning_effort to control reasoning depth; older models ignore the parameter.

Google Gemini (2.5 and later)

  • Reasoning is enabled by default.

  • Effort levels are converted into a reasoning budget, as with Anthropic.

Custom Budget (Gemini)

Users may specify a custom numeric budget string similar to Anthropic, for example: "reasoning_effort": "2000"

Rules:

  • Gemini models have different upper limits (see Google documentation):

    • For gemini-2.5-flash: 1 – 24,576

    • For gemini-2.5-pro: 128 – 32,768

  • If a user sets a budget outside these limits, Cortecs automatically adjusts it to the nearest supported value.

  • If the numeric string is invalid or cannot be parsed, the default budget of 1024 is used.
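
The clamping described above can be sketched as follows. The limits dict simply restates the documented ranges; model names are as given in Google's documentation:

```python
# Documented reasoning-budget ranges per Gemini model.
GEMINI_LIMITS = {
    "gemini-2.5-flash": (1, 24_576),
    "gemini-2.5-pro": (128, 32_768),
}

def gemini_budget(effort: str, model: str) -> int:
    """Clamp a numeric reasoning_effort string to the model's supported range."""
    lo, hi = GEMINI_LIMITS[model]
    try:
        budget = int(effort)
    except ValueError:
        return 1024                  # invalid numeric string -> default of 1024
    return min(max(budget, lo), hi)  # adjust to the nearest supported value
```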

Custom numeric budgets such as reasoning_effort: "2000" are supported only by Gemini and Claude models. All other models accept only low, medium, or high.

Mistral Models

  • Mistral models do not support reasoning_effort.

  • If a Mistral model has reasoning capability, its reasoning is returned automatically.

  • Reasoning appears in dedicated “thinking chunks” in the response.

Reasoning behavior varies across models, and not all reasoning steps may be visible in the response. Using reasoning_effort lets you request deeper or lighter reasoning when supported, while Cortecs automatically handles internal budgets where applicable. Keep in mind that some models expose reasoning explicitly, others hide it, and some include it by default.

Reasoning token counts are currently included in the completion token count.
