Speech Recognition

Transcribe audio to text using Whisper and other speech recognition models.

DEVUP AI hosts Whisper and other speech recognition models. Given an audio file, these models return the transcribed text together with timestamped segments, roughly one per sentence.

Browse all speech recognition models.

Models

  • openai/whisper-large: best accuracy
  • openai/whisper-medium, openai/whisper-small, openai/whisper-base: faster and lighter, at some cost in accuracy
  • openai/whisper-timestamped-medium: word-level timestamps

Example

curl -X POST \
  -H "Authorization: Bearer $DEVUP_API_KEY" \
  -F audio=@audio.mp3 \
  'https://api.devupai.com/v1/inference/openai/whisper-large'
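
For callers who prefer not to shell out to curl, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper name `build_transcription_request` is hypothetical, and it assumes the endpoint accepts a single multipart form field named `audio`, as the curl example suggests. The part's content type is set to `application/octet-stream` as a generic default.

```python
import urllib.request
import uuid

# Base URL taken from the curl example above.
API_BASE = "https://api.devupai.com/v1/inference"


def build_transcription_request(audio_bytes, filename, model, api_key):
    """Build a multipart/form-data POST similar to the curl example.

    Hypothetical helper: it only constructs the request and sends nothing.
    """
    boundary = uuid.uuid4().hex
    # One form part named "audio", mirroring `-F audio=@audio.mp3`.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{API_BASE}/{model}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, read the audio file as bytes, pass the result to `urllib.request.urlopen`, and decode the reply with `json.load`. Keeping construction separate from sending makes the request easy to inspect before any network call is made.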

Supported audio formats

  • mp3
  • wav
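
A client may want to reject unsupported files before uploading. The sketch below checks only the file extension against the two formats listed above; the helper name `is_supported_audio` is hypothetical, and the check does not inspect the file contents.

```python
from pathlib import Path

# The two formats listed above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav"}


def is_supported_audio(path):
    """Return True if the file extension is a listed format (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```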

Response

{
  "text": "Hello, this is a transcription of the audio file.",
  "segments": [
    {
      "start": 0.0,
      "end": 3.5,
      "text": "Hello, this is a transcription of the audio file."
    }
  ]
}
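
A response of this shape can be turned into timestamped lines with a few lines of Python. This is an illustrative sketch that assumes the fields shown above (`segments`, each with `start` and `end` in seconds plus `text`); the helper names `format_timestamp` and `segment_lines` are hypothetical.

```python
import json


def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segment_lines(response):
    """Return one "[start -> end] text" line per segment of a parsed response."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text']}"
        for s in response.get("segments", [])
    ]


# The sample response shown above.
sample = json.loads("""
{
  "text": "Hello, this is a transcription of the audio file.",
  "segments": [
    {"start": 0.0, "end": 3.5,
     "text": "Hello, this is a transcription of the audio file."}
  ]
}
""")

print("\n".join(segment_lines(sample)))
# [00:00:00.000 -> 00:00:03.500] Hello, this is a transcription of the audio file.
```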

Additional parameters

Parameters vary by model (for example, language and task). See each model's API documentation page for the full list.

Tutorial

See the Whisper tutorial for a complete walkthrough.