Chat Completions
DEVUP AI offers an OpenAI-compatible chat completions API for all supported LLMs at the best prices. Whether you are using Llama 3, DeepSeek, or Qwen, our gateway strictly follows the OpenAI schema.
https://api.devupai.com/v1/chat/completions
Install the SDK
pip install openai
Basic chat completion
By setting the base_url and passing your DEVUP AI API key, you can route requests directly to our optimized infrastructure.
from openai import OpenAI

openai = OpenAI(
    api_key="$DEVUP_API_KEY",
    base_url="https://api.devupai.com/v1",
)

chat_completion = openai.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)
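Streaming follows the standard OpenAI pattern. The sketch below assumes the gateway honors the schema's stream parameter, which is not shown elsewhere on this page:

from openai import OpenAI

openai = OpenAI(
    api_key="$DEVUP_API_KEY",
    base_url="https://api.devupai.com/v1",
)

# stream=True yields chunks as the model generates them.
stream = openai.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)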
Multi-turn conversations
Models are stateless and have no memory between requests. To maintain conversational state, pass the entire conversation history in the messages array on each request.
from openai import OpenAI

openai = OpenAI(
    api_key="$DEVUP_API_KEY",
    base_url="https://api.devupai.com/v1",
)

chat_completion = openai.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[
        {"role": "system", "content": "Respond like a Michelin-starred chef."},
        {"role": "user", "content": "Can you name at least two different techniques to cook lamb?"},
        {"role": "assistant", "content": "Bonjour! Let me tell you, my friend, cooking lamb is an art form..."},
        {"role": "user", "content": "Tell me more about the second method."},
    ],
)
print(chat_completion.choices[0].message.content)
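Since state lives entirely in the messages array, a chat loop amounts to appending each turn to a list and resending it. A minimal sketch (the messages list and the hard-coded user turns are illustrative):

from openai import OpenAI

openai = OpenAI(
    api_key="$DEVUP_API_KEY",
    base_url="https://api.devupai.com/v1",
)

messages = [{"role": "system", "content": "Respond like a Michelin-starred chef."}]

for user_input in ["Name two techniques to cook lamb.", "Tell me more about the second method."]:
    messages.append({"role": "user", "content": user_input})
    reply = openai.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3",
        messages=messages,
    ).choices[0].message.content
    # Append the assistant turn so the next request carries the full history.
    messages.append({"role": "assistant", "content": reply})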
Supported parameters
We support the majority of standard OpenAI chat completion parameters, including tool_choice. We may not be 100% compatible with every OpenAI parameter; let us know via support if something you need is missing.
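As a sketch of how tool_choice is passed (the get_weather tool definition here is hypothetical, and this assumes the gateway forwards the standard tools and tool_choice fields):

from openai import OpenAI

openai = OpenAI(
    api_key="$DEVUP_API_KEY",
    base_url="https://api.devupai.com/v1",
)

response = openai.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    tool_choice="auto",  # let the model decide whether to call the tool
)

print(response.choices[0].message.tool_calls)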
Service tier
DEVUP AI offers an optional priority service tier that guarantees the highest quality of service by routing your inference requests to our fastest, dedicated GPU clusters.
⚠ Priority inference incurs a 20% surcharge on top of the model's standard per-token price.
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"service_tier": "priority"},
)
Max output tokens
Due to hardware limits, there is a hard cap on how many output tokens models can generate in a single request. For models like DeepSeek V3, this is usually 8192 tokens.
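To detect truncation, check the choice's finish_reason; a minimal sketch (the max_tokens value here mirrors the 8192 cap mentioned above):

from openai import OpenAI

openai = OpenAI(
    api_key="$DEVUP_API_KEY",
    base_url="https://api.devupai.com/v1",
)

response = openai.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Write a 5,000 word essay on quantum physics."}],
    max_tokens=8192,  # the cap for models like DeepSeek V3
)

choice = response.choices[0]
print(choice.message.content)
if choice.finish_reason == "length":
    print("Output was truncated at the token cap; see the continuation pattern below.")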
Continuing responses beyond the limit
If a model stops generating because it reached the maximum output tokens (the finish_reason will be "length"), send the truncated response back as an assistant message and the model will continue right where it left off.
curl -X POST https://api.devupai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEVUP_API_KEY" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [
      {
        "role": "user",
        "content": "Write a 5,000 word essay on quantum physics."
      },
      {
        "role": "assistant",
        "content": "<previous truncated response>"
      }
    ]
  }'
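The same continuation pattern can be wrapped in a Python loop that keeps resending the accumulated partial output until the model stops on its own. A sketch, assuming the standard finish_reason field:

from openai import OpenAI

openai = OpenAI(
    api_key="$DEVUP_API_KEY",
    base_url="https://api.devupai.com/v1",
)

messages = [{"role": "user", "content": "Write a 5,000 word essay on quantum physics."}]
full_text = ""

while True:
    response = openai.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3",
        messages=messages,
    )
    choice = response.choices[0]
    full_text += choice.message.content
    if choice.finish_reason != "length":
        break  # the model finished on its own
    # Truncated: resend the accumulated text as the trailing assistant
    # message so the model continues where it left off.
    if messages[-1]["role"] == "assistant":
        messages[-1]["content"] = full_text
    else:
        messages.append({"role": "assistant", "content": full_text})

print(full_text)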