Chat Completions
Reasoning Models
Configure chain-of-thought reasoning with reasoning_effort and the reasoning parameter.
Some models on DEVUP AI support extended chain-of-thought reasoning — the model “thinks through” a problem step by step before producing a final answer. By default, reasoning models produce a reasoning trace alongside the response. You can control this behavior with the reasoning_effort parameter.
Supported models
Reasoning is available on models that support chain-of-thought, including:
deepseek-ai/DeepSeek-R1
Check the model catalog for the latest list.
Controlling reasoning effort
Use reasoning_effort to control how much reasoning the model performs. Higher effort means deeper thinking but more output tokens and higher latency.
```python
from openai import OpenAI

client = OpenAI(
    api_key="$DEVUP_API_KEY",
    base_url="https://api.devupai.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    extra_body={"reasoning_effort": "high"},
)

print(response.choices[0].message.content)
```

Disabling reasoning
Set reasoning_effort to "none" to disable chain-of-thought entirely. The model will respond directly without a reasoning trace — faster and cheaper.
```python
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    extra_body={"reasoning_effort": "none"},
)
```

The reasoning parameter
For more granular control, use the reasoning object instead of reasoning_effort:
```python
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Solve this step by step: 15! / 13!"}],
    extra_body={
        "reasoning": {
            "effort": "medium",
            "enabled": True,
        }
    },
)
```

Setting "enabled": false is equivalent to reasoning_effort: "none".
When to use reasoning
| Use case | Effort level |
|---|---|
| Math, logic, and code problems | "high" (default for reasoning models) |
| Multi-step analysis | "medium" or "high" |
| Simple Q&A, translation, summarization | "none" |
| Cost-sensitive workloads | "none" or "low" |
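The table above can be sketched as a small dispatch helper. Note that the task-category names here are illustrative labels of our own, not API values; only the effort strings come from the API:

```python
# Illustrative mapping of task categories to reasoning_effort values,
# following the table above. Category names are our own invention.
EFFORT_BY_TASK = {
    "math": "high",
    "logic": "high",
    "code": "high",
    "analysis": "medium",
    "qa": "none",
    "translation": "none",
    "summarization": "none",
}

def pick_effort(task: str, cost_sensitive: bool = False) -> str:
    """Choose a reasoning_effort value for a task category."""
    effort = EFFORT_BY_TASK.get(task, "medium")
    # Cost-sensitive workloads cap the effort at "low".
    if cost_sensitive and effort not in ("none", "low"):
        return "low"
    return effort
```

The returned string can be passed directly as `extra_body={"reasoning_effort": pick_effort(...)}`.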
Supported parameters
| Parameter | Type | Description |
|---|---|---|
| reasoning_effort | string | Controls reasoning depth: "none", "low", "medium", "high". |
| reasoning | object | Fine-grained reasoning config. |
| reasoning.effort | string | Same values as reasoning_effort. |
| reasoning.enabled | boolean | Explicitly enable or disable reasoning. |
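As a sketch of how these parameters combine, the helper below builds an extra_body dict from the values in the table. It is not an official client helper, just an illustration of the two request shapes:

```python
VALID_EFFORTS = ("none", "low", "medium", "high")

def reasoning_body(effort=None, enabled=None):
    """Build an extra_body dict for the reasoning parameters above.

    Uses the flat reasoning_effort form when only an effort is given,
    and the reasoning object when enabled is set explicitly.
    """
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}")
    if enabled is None:
        return {} if effort is None else {"reasoning_effort": effort}
    body = {"enabled": enabled}
    if effort is not None:
        body["effort"] = effort
    return {"reasoning": body}
```

For example, `reasoning_body(enabled=False)` produces the same disable-reasoning request body shown earlier with the reasoning object.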
Notes
- Streaming support: Reasoning models support streaming; the reasoning trace is returned in the reasoning_content delta field before the final answer begins.
- Token pricing: Tokens generated during the reasoning phase are billed as standard output tokens.
- Usage telemetry: The API response reports completion_tokens_details, which contains reasoning_tokens, so you can track how much the model "thought".
- Temperature constraints: Some reasoning models enforce a fixed temperature to maintain logical integrity, ignoring any custom temperature parameter you pass.
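A minimal streaming sketch, assuming the reasoning trace arrives as a reasoning_content attribute on each delta (as described above) while the answer arrives in content:

```python
def split_stream(deltas):
    """Accumulate a stream of deltas into (reasoning_trace, answer).

    Each delta may carry `reasoning_content` (the chain of thought)
    and/or `content` (the final answer); either may be None or absent.
    """
    reasoning, answer = [], []
    for delta in deltas:
        trace = getattr(delta, "reasoning_content", None)
        if trace:
            reasoning.append(trace)
        if getattr(delta, "content", None):
            answer.append(delta.content)
    return "".join(reasoning), "".join(answer)

# With the OpenAI client, the deltas come from a streaming call:
#
#   stream = client.chat.completions.create(
#       model="deepseek-ai/DeepSeek-R1",
#       messages=[{"role": "user", "content": "Solve 15! / 13!"}],
#       extra_body={"reasoning_effort": "medium"},
#       stream=True,
#   )
#   trace, answer = split_stream(c.choices[0].delta for c in stream)
```

Separating the trace this way lets you log or hide the chain of thought while showing only the final answer to users.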