Speech To Text

Convert speech to text

curl --request POST \
  --url https://modelslab.com/api/v6/voice/speech_to_text \
  --header 'Content-Type: application/json' \
  --data '
{
  "key": "<string>",
  "init_audio": "<string>",
  "language": "<string>",
  "timestamp_level": "word",
  "webhook": "<string>",
  "track_id": 123
}
'

{
  "status": "success",
  "generationTime": 123,
  "id": 123,
  "output": [
    "<string>"
  ],
  "proxy_links": [
    "<string>"
  ],
  "future_links": [
    "<string>"
  ],
  "links": [
    "<string>"
  ],
  "meta": {},
  "eta": 123,
  "message": "<string>",
  "tip": "<string>",
  "fetch_result": "<string>",
  "audio_time": 123
}

Request

Make a POST request to the endpoint below and pass the required parameters in the request body.

curl --request POST 'https://modelslab.com/api/v6/voice/speech_to_text' \
  --header 'Content-Type: application/json'

Body

{
  "key": "your_api_key",
  "init_audio": "https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
  "language": "en",
  "timestamp_level": null,
  "webhook": null,
  "track_id": null
}
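The request above can also be sent from a shell script. The following is a minimal sketch, not an official client. It assumes the API key is stored in a hypothetical `MODELSLAB_API_KEY` environment variable, and `build_stt_body` and `send_stt_request` are illustrative names that are not part of the API. The helper performs no JSON escaping, so values must not contain double quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: prints the speech_to_text JSON body.
# Args: $1 audio URL, $2 language code, $3 optional webhook URL, $4 optional track_id.
# The key comes from MODELSLAB_API_KEY (a hypothetical variable name).
# No JSON escaping is done: values must not contain double quotes or backslashes.
build_stt_body() {
  stt_webhook=null
  stt_track=null
  if [ -n "${3:-}" ]; then stt_webhook="\"$3\""; fi
  if [ -n "${4:-}" ]; then stt_track="$4"; fi
  printf '{"key":"%s","init_audio":"%s","language":"%s","timestamp_level":null,"webhook":%s,"track_id":%s}\n' \
    "${MODELSLAB_API_KEY:-your_api_key}" "$1" "$2" "$stt_webhook" "$stt_track"
}

# Hypothetical helper: POSTs the body to the endpoint shown above.
# curl reads the request body from stdin via --data @-.
send_stt_request() {
  build_stt_body "$@" | curl --request POST \
    --url https://modelslab.com/api/v6/voice/speech_to_text \
    --header 'Content-Type: application/json' \
    --data @-
}
```

Example usage: `send_stt_request 'https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav' en`. Optional fields that are left empty are sent as JSON `null`, as in the example body.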

Timestamp level accuracy: sentence-level timestamps are reliable. Word-level timestamps may be inaccurate, so treat them as approximate.

Languages Supported

Whisper supports the languages listed below. Transcription accuracy varies by language because of factors such as limited training data, script complexity, and regional dialects.

"Afrikaans": "af",
"Arabic": "ar",
"Belarusian": "be",
"Bengali": "bn",
"Bulgarian": "bg",
"Chinese": "zh",
"Czech": "cs",
"Danish": "da",
"Dutch": "nl",
"English": "en",
"Finnish": "fi",
"French": "fr",
"German": "de",
"Greek": "el",
"Hebrew": "he",
"Hindi": "hi",
"Hungarian": "hu",
"Indonesian": "id",
"Italian": "it",
"Japanese": "ja",
"Kannada": "kn",
"Korean": "ko",
"Malayalam": "ml",
"Marathi": "mr",
"Nepali": "ne",
"Panjabi": "pa",
"Persian": "fa",
"Polish": "pl",
"Portuguese": "pt",
"Romanian": "ro",
"Russian": "ru",
"Serbian": "sr",
"Spanish": "es",
"Swedish": "sv",
"Tagalog": "tl",
"Tamil": "ta",
"Telugu": "te",
"Thai": "th",
"Turkish": "tr",
"Ukrainian": "uk",
"Urdu": "ur",
"Vietnamese": "vi",
"Welsh": "cy"
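A client can check the `language` value against the codes above before sending a request. The following sketch uses a shell `case` statement. `is_supported_language` is a hypothetical helper name, and the code list is copied from the list above. It may drift if the service adds languages.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if $1 is one of the language codes listed above.
is_supported_language() {
  case "${1:-}" in
    af|ar|be|bn|bg|zh|cs|da|nl|en|fi|fr|de|el|he|hi|hu|id|it|ja|kn|ko|ml|mr|ne|pa|fa|pl|pt|ro|ru|sr|es|sv|tl|ta|te|th|tr|uk|ur|vi|cy)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

Example usage: `is_supported_language "$lang" || { echo "unsupported language: $lang" >&2; exit 1; }`.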

Body

application/json

key

string

required

Your API key, used to authorize the request.

init_audio

string<uri>

required

URL of the audio file to transcribe. Supported formats: WAV, MP3, FLAC, and OPUS. The audio must be between 5 seconds and 1 hour long.
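A client can check the documented length limit locally before submitting a file. The following sketch assumes `ffprobe` from FFmpeg is installed for reading durations. `duration_in_range` and `audio_duration` are hypothetical helper names, and the 5 to 3600 second bounds come from the description above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when $1 (seconds, may be fractional) lies
# within the documented 5 s to 1 h range. Non-numeric input counts as 0.
duration_in_range() {
  awk -v d="$1" 'BEGIN { n = d + 0; exit !(n >= 5 && n <= 3600) }'
}

# Hypothetical helper: prints the duration in seconds of a file or URL.
# Assumes ffprobe (FFmpeg) is installed.
audio_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}
```

Example usage: `duration_in_range "$(audio_duration clip.wav)" || echo "clip.wav is outside the 5 s to 1 h limit" >&2`.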

language

string

required

Two-letter ISO 639-1 language code (e.g. 'en', 'es', 'fr'). Use one of the supported languages listed above.

timestamp_level

enum<string>

Granularity of the timestamps in the transcription. Sentence-level timestamps are reliable. Word-level timestamps may be inaccurate.

Available options: word, sentence, null
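`timestamp_level` accepts either a string or JSON `null`, so a client must emit either a quoted value or a bare `null`. The following sketch uses a hypothetical helper, `timestamp_level_json`, to map a shell argument to the matching JSON fragment. It rejects any value that is not one of the documented options.

```shell
#!/bin/sh
# Hypothetical helper: maps a shell argument to the JSON value of timestamp_level.
# "word" and "sentence" become JSON strings; an empty argument or "null" becomes null.
timestamp_level_json() {
  case "${1:-}" in
    word|sentence) printf '"%s"\n' "$1" ;;
    ''|null) printf 'null\n' ;;
    *) printf 'invalid timestamp_level: %s\n' "$1" >&2; return 1 ;;
  esac
}
```

Given the accuracy note above, `sentence` is the safer choice when timestamps matter.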

webhook

string<uri>

URL that receives a POST notification when processing completes.

track_id

integer

ID used to identify this request in the webhook notification.

Response

Speech to text response

status

enum<string>

Status of the request

Available options: success, processing, error

generationTime

number

Time taken to process the request, in seconds

id

integer

Unique identifier for the request

output

string<uri>[]

Array of generated audio URLs

proxy_links

string<uri>[]

Array of proxy audio URLs

future_links

string<uri>[]

Array of future audio URLs for queued requests

links

string<uri>[]

Array of audio URLs (voice cover response)
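The `status` field determines how to read the rest of the response. The following crude sketch extracts `status` with `sed` and selects a next step. It assumes the `"status"` key appears once and on a single line; a JSON parser such as jq is more robust. `stt_status` and `stt_next_step` are hypothetical helper names. The step labels are illustrative only. In particular, polling `fetch_result` for a `processing` response is an assumption, because this page does not document that field.

```shell
#!/bin/sh
# Hypothetical helper: prints the value of the "status" key in a JSON response.
# Crude: assumes one "status" key whose value sits on a single line.
stt_status() {
  printf '%s\n' "$1" \
    | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' \
    | head -n 1
}

# Hypothetical dispatcher: maps the status to an illustrative next step.
stt_next_step() {
  case "$(stt_status "$1")" in
    success) printf 'read output\n' ;;
    processing) printf 'poll fetch_result\n' ;;  # assumption: fetch_result is pollable
    error) printf 'inspect message\n' ;;
    *) printf 'unknown\n' ;;
  esac
}
```

Example usage: `stt_next_step "$(send_request_somehow)"`, where the argument is the raw JSON body returned by the endpoint.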
