Chat Completions
Vision & OCR
Send images to multimodal models for visual understanding and text extraction.
DEVUP AI hosts multimodal models that accept both images and text as input and produce text output. These models use the standard OpenAI vision API format and cover two major use cases:
- Visual understanding — describe images, answer questions about visual content, compare images, analyze charts
- OCR (Optical Character Recognition) — extract text from scanned documents, receipts, invoices, screenshots, handwritten notes, and PDFs
Available vision models
- Qwen/Qwen2.5-VL-32B-Instruct
- Qwen/Qwen2.5-VL-7B-Instruct
- meta-llama/Llama-3.2-11B-Vision-Instruct
Available OCR models
We host a growing set of OCR-specialized models for high-accuracy text extraction. Browse the full OCR model catalog.
OCR models currently use the same vision API format below. A dedicated OCR endpoint optimized for document processing is coming soon.
Quick start
Images can be passed in two ways:
- URL — pass a link to a publicly accessible image
- Base64 — encode the image and include it directly in the request
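For the base64 route, the image bytes are wrapped in a data URL of the form data:<mime-type>;base64,<payload>, which goes into the image_url field just like a regular link. As a minimal sketch (the helper name to_data_url is illustrative, not part of the API), you can build one from a local file using the standard library:

```python
import base64
import mimetypes

def to_data_url(path: str) -> str:
    """Encode a local image file as a data URL usable in an image_url field."""
    mime, _ = mimetypes.guess_type(path)  # infer MIME type from the extension
    if mime is None:
        raise ValueError(f"Cannot determine MIME type for {path}")
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{payload}"
```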
Image URL
curl "https://api.devupai.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEVUP_API_KEY" \
  -d '{
    "model": "Qwen/Qwen2.5-VL-32B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/sample-image.webp"
            }
          },
          {
            "type": "text",
            "text": "What is in this image?"
          }
        ]
      }
    ]
  }'
Base64 encoded image
import base64
import os
import requests
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEVUP_API_KEY"],
    base_url="https://api.devupai.com/v1",
)

# Download the image and encode it as base64
image_url = "https://example.com/sample-image.webp"
base64_image = base64.b64encode(requests.get(image_url).content).decode("utf-8")

chat_completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-32B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image}"
                    }
                },
                {
                    "type": "text",
                    "text": "What is in this image?"
                }
            ]
        }
    ]
)
print(chat_completion.choices[0].message.content)
OCR example
Extract all text from a document image:
import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEVUP_API_KEY"],
    base_url="https://api.devupai.com/v1",
)

# Read a local document image and encode it as base64
with open("invoice.png", "rb") as f:
    base64_image = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-32B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"}
                },
                {
                    "type": "text",
                    "text": "Extract all text from this document. Preserve the structure and layout as much as possible."
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)
Common OCR prompts:
- Extract all text from this image.
- Extract all text from this document. Preserve the structure and layout as much as possible.
- Convert this image to a markdown table.
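Each of these prompts goes into the text content item alongside the image. If you run OCR over many documents, it can help to factor the payload construction into a small helper; the function below is an illustrative sketch, not part of the API:

```python
def build_ocr_messages(image_data_url: str, prompt: str) -> list:
    """Build a chat messages payload pairing one encoded image with an OCR prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_data_url}},
                {"type": "text", "text": prompt},
            ],
        }
    ]
```

The resulting list can be passed directly as the messages argument of client.chat.completions.create.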
Multiple images
You can pass multiple images in a single request by including multiple image_url content items:
curl "https://api.devupai.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEVUP_API_KEY" \
  -d '{
    "model": "Qwen/Qwen2.5-VL-32B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What are the differences between these two images?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/page1.jpg"
            }
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/page2.jpg"
            }
          }
        ]
      }
    ]
  }'
Pricing and token counting
Images are automatically rescaled and converted into tokens by the vision model. You are billed for the total number of tokens from both your text prompt and the encoded images.
The exact number of tokens used by your images is returned in the {"usage": {"prompt_tokens": ...}} object of the API response.
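Since the response follows the OpenAI-compatible schema, prompt-side usage (text plus image tokens) can be read straight from the usage object. A minimal sketch, assuming a standard response body; the token values below are illustrative, not real billing figures:

```python
def prompt_token_count(response_json: dict) -> int:
    """Read prompt-side token usage (text + image tokens) from a response body."""
    return response_json["usage"]["prompt_tokens"]

# Illustrative response fragment; actual counts depend on the model and image size.
sample_response = {
    "usage": {"prompt_tokens": 1432, "completion_tokens": 58, "total_tokens": 1490}
}
```

With the openai Python client, the same value is available as response.usage.prompt_tokens.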
Limitations
- Supported formats: jpg, png, webp
- Max size: 20MB per image
- The detail parameter is not currently supported
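Validating files against these limits client-side avoids a failed round trip. The helper below is a hypothetical sketch based on the limits listed above (the .jpeg spelling is assumed equivalent to jpg):

```python
import os

# Limits from the documentation above; .jpeg assumed equivalent to .jpg
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
MAX_BYTES = 20 * 1024 * 1024  # 20MB per image

def validate_image(path: str) -> None:
    """Raise ValueError if the file violates the documented format or size limits."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported format {ext}: use jpg, png, or webp")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError(f"{path} exceeds the 20MB per-image limit")
```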