Request

Make a POST request to the endpoint below and pass the required parameters in the request body.
curl --request POST 'https://modelslab.com/api/v1/enterprise/voice/speech_to_text'

Body

json
{
    "key": "enterprise_api_key",
    "init_audio": "https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
    "language": "en",
    "timestamp_level": null,
    "webhook": null,
    "track_id": null
}
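
As a rough illustration, the request above could be assembled in Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_request` and `send` are hypothetical, the `Content-Type: application/json` header is an assumption, and the response is simply decoded as JSON because its shape is not described in this section.

```python
import json
import urllib.request

ENDPOINT = "https://modelslab.com/api/v1/enterprise/voice/speech_to_text"


def build_request(api_key, audio_url, language="en",
                  timestamp_level=None, webhook=None, track_id=None):
    """Build (but do not send) a POST request carrying the JSON body shown above."""
    payload = {
        "key": api_key,
        "init_audio": audio_url,
        "language": language,
        "timestamp_level": timestamp_level,  # "word", "sentence", or None
        "webhook": webhook,
        "track_id": track_id,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API expects the body to be declared as JSON.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and decode the response as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_request("enterprise_api_key", audio_url))` would perform the actual network call; building the request separately makes it easy to inspect the body before anything is sent.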

Body Attributes

key (string, required)
The API key required to authorize the request.

init_audio (string, required)
The URL of the audio file to be transcribed.
Supported formats: WAV, MP3, FLAC, OPUS.
Duration limits: minimum 5 seconds, maximum 1 hour.

language (string, default: "en")
The language code of the audio content in ISO 639-1 format. Examples: en (English), es (Spanish), fr (French).

timestamp_level (string)
The level of detail for timestamps in the transcription. Options: word, sentence, or null (no timestamps). Default: null.

webhook (string)
A URL to receive a POST request once the transcription is complete.

track_id (integer)
An ID included in the webhook response to identify the request.
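
The attribute rules above can be checked on the client before a request is sent. The following is a minimal sketch of such a check; `validate_body` is a hypothetical helper, and it only covers the types and options listed in this section (it does not inspect the audio file's format or duration, and the two-letter length test is only a rough proxy for an ISO 639-1 code).

```python
ALLOWED_TIMESTAMP_LEVELS = ("word", "sentence", None)


def validate_body(body):
    """Return a list of problems found in a request body dict (empty if none)."""
    problems = []

    # key and init_audio are the two required string attributes.
    for field in ("key", "init_audio"):
        value = body.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"{field}: required non-empty string")

    # language falls back to "en" when it is omitted.
    language = body.get("language", "en")
    if not isinstance(language, str) or len(language) != 2:
        problems.append("language: expected a two-letter ISO 639-1 code")

    if body.get("timestamp_level") not in ALLOWED_TIMESTAMP_LEVELS:
        problems.append("timestamp_level: expected 'word', 'sentence', or None")

    webhook = body.get("webhook")
    if webhook is not None and not isinstance(webhook, str):
        problems.append("webhook: expected a URL string or None")

    # bool is a subclass of int in Python, so it is rejected explicitly.
    track_id = body.get("track_id")
    if track_id is not None and (isinstance(track_id, bool)
                                 or not isinstance(track_id, int)):
        problems.append("track_id: expected an integer or None")

    return problems
```

Returning a list rather than raising on the first error lets a caller report every problem in one pass.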

Languages Supported

Whisper supports the languages listed below. Transcription accuracy may vary between them because of factors such as limited training data, script complexity, and regional dialects.
"Afrikaans": "af",
"Arabic": "ar",
"Belarusian": "be",
"Bengali": "bn",
"Bulgarian": "bg",
"Chinese": "zh",
"Czech": "cs",
"Danish": "da",
"Dutch": "nl",
"English": "en",
"Finnish": "fi",
"French": "fr",
"German": "de",
"Greek": "el",
"Hebrew": "he",
"Hindi": "hi",
"Hungarian": "hu",
"Indonesian": "id",
"Italian": "it",
"Japanese": "ja",
"Kannada": "kn",
"Korean": "ko",
"Malayalam": "ml",
"Marathi": "mr",
"Nepali": "ne",
"Panjabi": "pa",
"Persian": "fa",
"Polish": "pl",
"Portuguese": "pt",
"Romanian": "ro",
"Russian": "ru",
"Serbian": "sr",
"Spanish": "es",
"Swedish": "sv",
"Tagalog": "tl",
"Tamil": "ta",
"Telugu": "te",
"Thai": "th",
"Turkish": "tr",
"Ukrainian": "uk",
"Urdu": "ur",
"Vietnamese": "vi",
"Welsh": "cy"
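
The language table above maps naturally onto a lookup that turns either a language name or a code into the value for the `language` field. A minimal sketch follows; `SUPPORTED_LANGUAGES` simply copies the list above, and `language_code` is a hypothetical helper rather than part of the API.

```python
SUPPORTED_LANGUAGES = {
    "Afrikaans": "af", "Arabic": "ar", "Belarusian": "be", "Bengali": "bn",
    "Bulgarian": "bg", "Chinese": "zh", "Czech": "cs", "Danish": "da",
    "Dutch": "nl", "English": "en", "Finnish": "fi", "French": "fr",
    "German": "de", "Greek": "el", "Hebrew": "he", "Hindi": "hi",
    "Hungarian": "hu", "Indonesian": "id", "Italian": "it", "Japanese": "ja",
    "Kannada": "kn", "Korean": "ko", "Malayalam": "ml", "Marathi": "mr",
    "Nepali": "ne", "Panjabi": "pa", "Persian": "fa", "Polish": "pl",
    "Portuguese": "pt", "Romanian": "ro", "Russian": "ru", "Serbian": "sr",
    "Spanish": "es", "Swedish": "sv", "Tagalog": "tl", "Tamil": "ta",
    "Telugu": "te", "Thai": "th", "Turkish": "tr", "Ukrainian": "uk",
    "Urdu": "ur", "Vietnamese": "vi", "Welsh": "cy",
}


def language_code(name_or_code):
    """Resolve a language name or code to a supported code, ignoring case."""
    text = name_or_code.strip().lower()
    # Accept a code that is already in the supported set.
    if text in SUPPORTED_LANGUAGES.values():
        return text
    # Otherwise try to match the language name.
    for name, code in SUPPORTED_LANGUAGES.items():
        if name.lower() == text:
            return code
    raise ValueError(f"unsupported language: {name_or_code!r}")
```

For example, `language_code("Welsh")` resolves to `"cy"`, while a language that is not in the list raises `ValueError` before any request is made.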