Request
Make a POST request to the endpoint below and pass the required parameters in the request body.
curl --request POST 'https://modelslab.com/api/v6/voice/speech_to_text'
Body
{
  "key": "your_api_key",
  "init_audio": "https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
  "language": "en",
  "timestamp_level": null,
  "webhook": null,
  "track_id": null
}
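The request above can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names `build_request` and `send` are hypothetical, and the `Content-Type: application/json` header is an assumption, since the page only says the parameters go in the request body.

```python
import json
import urllib.request

ENDPOINT = "https://modelslab.com/api/v6/voice/speech_to_text"


def build_request(api_key, audio_url, language="en", timestamp_level=None,
                  webhook=None, track_id=None):
    """Return a ready-to-send POST request carrying the JSON body (hypothetical helper)."""
    payload = {
        "key": api_key,
        "init_audio": audio_url,
        "language": language,
        "timestamp_level": timestamp_level,  # None is serialized as JSON null
        "webhook": webhook,
        "track_id": track_id,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the endpoint expects a JSON-encoded body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON reply (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)
```

Building the request separately from sending it makes it possible to inspect the body before any network call is made, for example `send(build_request("your_api_key", audio_url))`.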
Languages Supported
Whisper supports the languages listed below. Transcription accuracy can vary from one language to another because of limited training data, script complexity, and regional dialects.
"Afrikaans": "af",
"Arabic": "ar",
"Belarusian": "be",
"Bengali": "bn",
"Bulgarian": "bg",
"Chinese": "zh",
"Czech": "cs",
"Danish": "da",
"Dutch": "nl",
"English": "en",
"Finnish": "fi",
"French": "fr",
"German": "de",
"Greek": "el",
"Hebrew": "he",
"Hindi": "hi",
"Hungarian": "hu",
"Indonesian": "id",
"Italian": "it",
"Japanese": "ja",
"Kannada": "kn",
"Korean": "ko",
"Malayalam": "ml",
"Marathi": "mr",
"Nepali": "ne",
"Panjabi": "pa",
"Persian": "fa",
"Polish": "pl",
"Portuguese": "pt",
"Romanian": "ro",
"Russian": "ru",
"Serbian": "sr",
"Spanish": "es",
"Swedish": "sv",
"Tagalog": "tl",
"Tamil": "ta",
"Telugu": "te",
"Thai": "th",
"Turkish": "tr",
"Ukrainian": "uk",
"Urdu": "ur",
"Vietnamese": "vi",
"Welsh": "cy"
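A client may want to reject an unsupported language code before spending a request on it. The sketch below copies the codes from the list above; the function name `normalize_language` is hypothetical, and whether the API itself accepts upper-case or padded codes is not stated here, so the normalization is a defensive assumption.

```python
# Codes copied from the supported-languages list above.
SUPPORTED_LANGUAGE_CODES = frozenset(
    "af ar be bn bg zh cs da nl en fi fr de el he hi hu id it ja kn ko "
    "ml mr ne pa fa pl pt ro ru sr es sv tl ta te th tr uk ur vi cy".split()
)


def normalize_language(code):
    """Return the lower-case code, or raise ValueError if it is not listed."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized
```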
key: API key required to authorize the request.
init_audio: URL of the audio file to transcribe. Supported formats: WAV, MP3, FLAC, OPUS (5 seconds to 1 hour).
language: Language code in ISO 639-1 format (e.g. 'en', 'es', 'fr').
timestamp_level: Level of detail for timestamps in the transcription. Available options: word, sentence.
webhook: URL to receive a POST notification upon completion.
track_id: ID for webhook identification.
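The constraints on `init_audio` and `timestamp_level` can be checked on the client side before sending. This is a rough sketch under stated assumptions: it guesses the audio format from the file extension in the URL, which the API may not rely on, and it does not check the 5 second to 1 hour duration limit. The name `check_parameters` is hypothetical.

```python
import posixpath
from urllib.parse import urlparse

AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".opus"}
TIMESTAMP_LEVELS = {None, "word", "sentence"}


def check_parameters(init_audio, timestamp_level=None):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    parsed = urlparse(init_audio)
    if parsed.scheme not in ("http", "https"):
        problems.append("init_audio must be an http(s) URL")
    # Assumption: the extension in the URL path reflects the audio format.
    extension = posixpath.splitext(parsed.path)[1].lower()
    if extension not in AUDIO_EXTENSIONS:
        problems.append(f"unrecognized audio extension: {extension!r}")
    if timestamp_level not in TIMESTAMP_LEVELS:
        problems.append("timestamp_level must be 'word', 'sentence', or omitted")
    return problems
```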
Status of the voice generation
Available options: success, processing, error
Time taken to generate the audio in seconds
Unique identifier for the voice generation
Array of generated audio URLs
Array of proxy audio URLs
Array of future audio URLs for queued requests
Array of audio URLs (voice cover response)
Metadata about the audio generation including all parameters used
Estimated time for completion in seconds (processing status)
Status message or additional information
Additional information or tips for the user
URL to fetch the result when processing
Duration of the generated audio in seconds
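The three status values suggest a simple branch on the client side: use the result on success, poll later while processing, and report the message on error. The sketch below illustrates that branch only. The response key names it reads (`status`, `output`, `fetch_result`, `eta`, `message`) are assumptions, because the field descriptions above do not name the keys; adjust them to the actual response.

```python
def next_step(response):
    """Map a decoded response dict to an (action, detail) pair.

    All key names read from `response` are assumed, not confirmed.
    """
    status = response.get("status")
    if status == "success":
        return ("done", response.get("output"))
    if status == "processing":
        # Wait roughly `eta` seconds, then request the fetch URL.
        detail = {
            "url": response.get("fetch_result"),
            "wait_seconds": response.get("eta"),
        }
        return ("poll", detail)
    if status == "error":
        return ("failed", response.get("message"))
    raise ValueError(f"unexpected status: {status!r}")
```

Returning an action instead of sleeping or fetching inside the function keeps the decision logic free of network I/O, so the caller decides how to wait and retry.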