POST /enterprise/speech_to_text/transcribe
Enterprise: Speech to Text Transcription Endpoint
curl --request POST \
  --url https://modelslab.com/api/v1/enterprise/speech_to_text/transcribe \
  --header 'Content-Type: application/json' \
  --data '
{
  "key": "<string>",
  "init_audio": "<string>",
  "language": "en",
  "timestamp_level": "null",
  "webhook": "<string>",
  "track_id": 123
}
'
{
  "status": "success",
  "message": "Operation completed successfully"
}

Request

Make a POST request to the endpoint below and pass the required parameters in the request body.
curl --request POST 'https://modelslab.com/api/v1/enterprise/speech_to_text/transcribe'

Body

json
{
  "key": "enterprise_api_key",
  "init_audio": "https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
  "language": "en",
  "timestamp_level": null,
  "webhook": null,
  "track_id": null
}
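
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_payload` and `transcribe` are hypothetical, the timeout value is arbitrary, and the response is assumed to be JSON, as in the example response on this page.

```python
import json
import urllib.request

ENDPOINT = "https://modelslab.com/api/v1/enterprise/speech_to_text/transcribe"


def build_payload(key, init_audio, language="en", timestamp_level=None,
                  webhook=None, track_id=None):
    """Assemble the request body using the documented field names."""
    return {
        "key": key,
        "init_audio": init_audio,
        "language": language,
        "timestamp_level": timestamp_level,
        "webhook": webhook,
        "track_id": track_id,
    }


def transcribe(payload, timeout=60):
    """POST the payload as JSON and return the decoded response (assumed JSON)."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `transcribe(build_payload("enterprise_api_key", audio_url))` with a real key and audio URL sends the request. Python's `None` serializes to JSON `null`, matching the body shown above.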

Body Attributes

key
string
required
API key for authentication.
init_audio
string
required
The audio file to transcribe.
language
string
default:"en"
Language of the audio to transcribe.
Allowed values: af, ar, be, bn, bg, zh, cs, da, nl, en, fi, fr, de, el, he, hi, hu, id, it, ja, kn, ko, ml, mr, ne, pa, fa, pl, pt, ro, ru, sr, es, sv, tl, ta, te, th, tr, uk, ur, vi, cy
timestamp_level
default:"null"
Timestamp level for the transcription.
Allowed values: null, word, sentence
webhook
string
URL to receive POST notification upon completion.
track_id
integer
ID for webhook identification.
Timestamp Level Accuracy: Sentence-level timestamps are generally reliable. Word-level timestamps may be less accurate.
Whisper supports several languages, but transcription accuracy may vary with factors such as limited training data, script complexity, and regional dialects.
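
The allowed values listed above can be checked on the client before a request is sent. The sketch below is illustrative only: `validate_params` is a hypothetical helper, and it assumes that a `null` timestamp level is sent as JSON `null` (Python `None`) rather than as the string "null".

```python
# Language codes copied from the "Allowed values" list for `language`.
ALLOWED_LANGUAGES = frozenset(
    "af ar be bn bg zh cs da nl en fi fr de el he hi hu id it ja kn ko ml mr "
    "ne pa fa pl pt ro ru sr es sv tl ta te th tr uk ur vi cy".split()
)

# Assumption: JSON null maps to Python None.
ALLOWED_TIMESTAMP_LEVELS = (None, "word", "sentence")


def validate_params(language="en", timestamp_level=None):
    """Raise ValueError if a value is outside the documented allowed sets."""
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if timestamp_level not in ALLOWED_TIMESTAMP_LEVELS:
        raise ValueError(f"unsupported timestamp_level: {timestamp_level!r}")
    return language, timestamp_level
```

Given the accuracy note above, `"sentence"` is likely the safer choice when timestamps are needed.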

Languages Supported

"Afrikaans": "af",
"Arabic": "ar",
"Belarusian": "be",
"Bengali": "bn",
"Bulgarian": "bg",
"Chinese": "zh",
"Czech": "cs",
"Danish": "da",
"Dutch": "nl",
"English": "en",
"Finnish": "fi",
"French": "fr",
"German": "de",
"Greek": "el",
"Hebrew": "he",
"Hindi": "hi",
"Hungarian": "hu",
"Indonesian": "id",
"Italian": "it",
"Japanese": "ja",
"Kannada": "kn",
"Korean": "ko",
"Malayalam": "ml",
"Marathi": "mr",
"Nepali": "ne",
"Panjabi": "pa",
"Persian": "fa",
"Polish": "pl",
"Portuguese": "pt",
"Romanian": "ro",
"Russian": "ru",
"Serbian": "sr",
"Spanish": "es",
"Swedish": "sv",
"Tagalog": "tl",
"Tamil": "ta",
"Telugu": "te",
"Thai": "th",
"Turkish": "tr",
"Ukrainian": "uk",
"Urdu": "ur",
"Vietnamese": "vi",
"Welsh": "cy"

Body

application/json
key
string
required

Your API key

init_audio
string
required

The audio file to transcribe

language
string
default:en

Language for the transcription

timestamp_level
enum<string>
default:null

Timestamp level for the transcription

Available options:
null,
word,
sentence
webhook
string

URL to receive POST notification upon completion

track_id
integer

ID for webhook identification

Response

Success

status
string
Example:

"success"

message
string
Example:

"Operation completed successfully"
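
A caller might check the two documented response fields as follows. This is a sketch under assumptions: `check_response` is a hypothetical helper, and since only the "success" status is documented here, any other status value is treated as a failure.

```python
import json


def check_response(response_text):
    """Return the message of a successful response, else raise RuntimeError."""
    body = json.loads(response_text)
    # Only "success" is documented; anything else is assumed to be a failure.
    if body.get("status") != "success":
        raise RuntimeError(f"transcription request failed: {body}")
    return body.get("message", "")


sample = '{"status": "success", "message": "Operation completed successfully"}'
print(check_response(sample))  # prints: Operation completed successfully
```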