
Speech Recognition

Transcribe audio to text using Whisper and other speech recognition models.

DEVUP AI hosts Whisper and other speech recognition models. Given an audio file, they return the transcribed text together with per-sentence timestamps (the segments array in the response).

Browse all speech recognition models.

Models

openai/whisper-large-v3 — best accuracy
openai/whisper-large-v3-turbo — faster, optimized
mistralai/Voxtral-Small-24B-2507 — multilingual, high quality
mistralai/Voxtral-Mini-3B-2507 — lightweight, fast
nvidia/Nemotron-3.5-ASR-Streaming-Multilingual-0.6b — streaming, multilingual

Example

curl -X POST \
  -H "Authorization: Bearer $DEVUP_API_KEY" \
  -F model="openai/whisper-large-v3" \
  -F file=@audio.mp3 \
  'https://api.devupai.com/v1/audio/transcriptions'
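The same request can be wrapped in a small shell function so that the file and model become arguments. This is a minimal sketch, assuming bash and curl. transcribe is a hypothetical helper name, and its default model is simply the first one in the list above. The endpoint and form fields are the ones from the example. -s, -S and --fail are standard curl options.

```bash
# Hypothetical helper: transcribe FILE [MODEL]
# Posts FILE to the transcription endpoint and prints the JSON response.
transcribe() {
  local file="$1"
  local model="${2:-openai/whisper-large-v3}"

  # Fail early, before any network call, if the key or the file is missing.
  if [ -z "${DEVUP_API_KEY:-}" ]; then
    echo "transcribe: DEVUP_API_KEY is not set" >&2
    return 1
  fi
  if [ ! -f "$file" ]; then
    echo "transcribe: no such file: $file" >&2
    return 1
  fi

  # -sS hides the progress bar but still reports errors;
  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
  curl -sS --fail -X POST \
    -H "Authorization: Bearer $DEVUP_API_KEY" \
    -F model="$model" \
    -F file=@"$file" \
    'https://api.devupai.com/v1/audio/transcriptions'
}
```

For example, transcribe meeting.wav openai/whisper-large-v3-turbo > meeting.json saves the response for later processing.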

Supported audio formats

mp3
wav
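A client can reject unsupported files before uploading them. The sketch below checks only the file extension against the two formats listed above. It is an assumption that the extension is a good enough signal, since the service may inspect the actual audio content. is_supported_audio is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds if the file name ends in .mp3 or .wav
# (case-insensitive), fails otherwise. Only the extension is checked.
is_supported_audio() {
  local ext
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_supported_audio take1.WAV succeeds, while is_supported_audio notes.ogg fails.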

Response

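The response is a JSON object containing the full transcript (text) and a list of timestamped segments (segments):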

{
  "text": "Hello, this is a transcription of the audio file.",
  "segments": [
    {
      "start": 0.0,
      "end": 3.5,
      "text": "Hello, this is a transcription of the audio file."
    }
  ]
}
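The start and end values of each segment appear to be offsets in seconds. This is an assumption, as the page does not state the unit. Under that assumption, the sketch below converts such a value into the HH:MM:SS,mmm form used by subtitle formats such as SRT. srt_time is a hypothetical helper built on awk. It can be combined with a JSON tool such as jq, which would read start and end from each entry in segments.

```bash
# Hypothetical helper: srt_time SECONDS
# Rounds to the nearest millisecond and prints HH:MM:SS,mmm.
srt_time() {
  awk -v t="$1" 'BEGIN {
    ms = int(t * 1000 + 0.5)
    printf "%02d:%02d:%02d,%03d\n", int(ms / 3600000), int(ms / 60000) % 60, int(ms / 1000) % 60, ms % 1000
  }'
}
```

With the sample response, srt_time 0.0 prints 00:00:00,000 and srt_time 3.5 prints 00:00:03,500.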

Additional parameters

Supported parameters, such as language and task, vary by model. See each model's API documentation page for the full list.

Tutorial

See the Whisper tutorial for a complete walkthrough.
