
# Speech To Text

> This endpoint allows you to convert speech in audio files to text.

## Request

Make a `POST` request to the endpoint below, passing the required parameters in the request body.

```curl curl theme={null}
curl --request POST 'https://modelslab.com/api/v1/enterprise/voice/speech_to_text'
```

## Body

```json json theme={null}
{
    "key": "enterprise_api_key",
    "init_audio": "https://assets.modelslab.ai/generations/5c3eef10-0eb4-4db8-8b12-fc4eedbf30b9.mp3",
    "language": "en",
    "timestamp_level": null,
    "webhook": null,
    "track_id": null
}
```
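
As a rough illustration, the body above could be assembled and attached to a `POST` request in Python using only the standard library. The helper name `build_stt_request` is hypothetical, and the `Content-Type: application/json` header is an assumption, since this page does not list required headers. The response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request

# Endpoint taken from the Request section of this page.
ENDPOINT = "https://modelslab.com/api/v1/enterprise/voice/speech_to_text"


def build_stt_request(api_key, audio_url, language="en",
                      timestamp_level=None, webhook=None, track_id=None):
    """Build (but do not send) a POST request for the speech-to-text endpoint."""
    payload = {
        "key": api_key,
        "init_audio": audio_url,
        "language": language,
        "timestamp_level": timestamp_level,  # None is serialized as JSON null
        "webhook": webhook,
        "track_id": track_id,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API expects a JSON body announced by this header.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_stt_request(
    "enterprise_api_key",
    "https://assets.modelslab.ai/generations/5c3eef10-0eb4-4db8-8b12-fc4eedbf30b9.mp3",
)
# To send it, pass `request` to urllib.request.urlopen() and decode the response.
```

Keeping construction separate from sending makes it easy to inspect the payload before any network call is made.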

## Body Attributes

<ParamField query="key" type="string" required>
  The API key required to authorize the request.
</ParamField>

<ParamField query="init_audio" type="string" format="url" required>
  The URL of the audio file to be transcribed.\
  Supported formats: WAV, MP3, FLAC, OPUS.\
  Duration limits: minimum 5 seconds, maximum 1 hour.
</ParamField>
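
The format and duration limits above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: `check_audio` is a hypothetical helper, it guesses the format from the URL's file extension (an assumption, since the service may inspect the file itself), and it expects the caller to already know the duration in seconds.

```python
import os
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".opus"}
MIN_SECONDS = 5
MAX_SECONDS = 60 * 60  # 1 hour


def check_audio(url, duration_seconds):
    """Return a list of problems; an empty list means the input looks acceptable."""
    problems = []
    # Use only the path component so query strings do not hide the extension.
    extension = os.path.splitext(urlparse(url).path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append(
            f"duration {duration_seconds}s is outside {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    return problems
```

For example, `check_audio("https://example.com/clip.mp3", 30)` returns an empty list, while a 2-second `.ogg` file yields two problems.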

<ParamField query="language" type="string" enum="[&#x22;en&#x22;, &#x22;es&#x22;, &#x22;fr&#x22;]" default="en">
  The language code of the audio content in ISO 639-1 format. Examples: en (English), es (Spanish), fr (French). The full list of codes is given under Languages Supported.
</ParamField>

<ParamField query="timestamp_level" type="string" enum="[&#x22;word&#x22;, &#x22;sentence&#x22;, &#x22;null&#x22;]">
  The level of detail for timestamps in the transcription. Options: word, sentence, or null (no timestamps). Default: null.
</ParamField>

<Warning>
  **Timestamp Level Accuracy:** Sentence-level timestamps are reliable. Word-level timestamps may be less accurate.
</Warning>
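
One way to guard against invalid `timestamp_level` values is to normalize them before building the request body. This is a sketch under assumptions: `normalize_timestamp_level` is a hypothetical helper, and mapping the string `"null"` or an empty string to JSON `null` is a client-side convenience, not documented API behavior.

```python
def normalize_timestamp_level(value):
    """Return 'word', 'sentence', or None, raising ValueError for anything else."""
    # Assumption: an empty string or the literal "null" means "no timestamps".
    if value in (None, "", "null"):
        return None
    level = str(value).strip().lower()
    if level not in ("word", "sentence"):
        raise ValueError(
            f"timestamp_level must be 'word', 'sentence', or null, got {value!r}"
        )
    return level
```

Given the warning above, passing `"sentence"` is the safer choice when timestamps are needed.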

<ParamField query="webhook" type="string" format="url">
  A URL to receive a POST request once the transcription is complete.
</ParamField>

<ParamField query="track_id" type="integer">
  An ID included in the webhook response to identify the request.
</ParamField>
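
On the receiving side, `track_id` can be used to match an incoming webhook call to the request that triggered it. The webhook payload format is not documented on this page, so the sketch below assumes the ID comes back under the same `track_id` key in a JSON body. The helper `parse_webhook_body`, the `pending_jobs` mapping, and the sample payload are all made up for illustration.

```python
import json


def parse_webhook_body(raw_body):
    """Decode a webhook POST body and return (track_id, full payload)."""
    payload = json.loads(raw_body)
    # Assumption: the ID is echoed back under the "track_id" key.
    return payload.get("track_id"), payload


# Hypothetical local bookkeeping: track_id -> the file that was submitted.
pending_jobs = {42: "meeting-recording.mp3"}

sample_body = b'{"track_id": 42}'  # illustrative payload, not a real response
track_id, payload = parse_webhook_body(sample_body)
source_file = pending_jobs.get(track_id)
```

In a real service this function would be called from whatever HTTP handler receives the POST at the `webhook` URL.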

### Languages Supported

<Info>
  Whisper supports several languages, but performance may vary due to factors like limited training data, script complexity, and regional dialects, potentially affecting transcription accuracy.
</Info>

```
"Afrikaans": "af",
"Arabic": "ar",
"Belarusian": "be",
"Bengali": "bn",
"Bulgarian": "bg",
"Chinese": "zh",
"Czech": "cs",
"Danish": "da",
"Dutch": "nl",
"English": "en",
"Finnish": "fi",
"French": "fr",
"German": "de",
"Greek": "el",
"Hebrew": "he",
"Hindi": "hi",
"Hungarian": "hu",
"Indonesian": "id",
"Italian": "it",
"Japanese": "ja",
"Kannada": "kn",
"Korean": "ko",
"Malayalam": "ml",
"Marathi": "mr",
"Nepali": "ne",
"Panjabi": "pa",
"Persian": "fa",
"Polish": "pl",
"Portuguese": "pt",
"Romanian": "ro",
"Russian": "ru",
"Serbian": "sr",
"Spanish": "es",
"Swedish": "sv",
"Tagalog": "tl",
"Tamil": "ta",
"Telugu": "te",
"Thai": "th",
"Turkish": "tr",
"Ukrainian": "uk",
"Urdu": "ur",
"Vietnamese": "vi",
"Welsh": "cy"
```
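
The table above maps naturally onto a lookup that accepts either a language name or a code and returns the value for the `language` parameter. This is a minimal sketch with a hypothetical helper, `language_code`, and only a handful of entries copied from the list; the remaining languages would be added the same way.

```python
# A subset of the supported languages listed above.
LANGUAGE_CODES = {
    "English": "en",
    "French": "fr",
    "German": "de",
    "Hindi": "hi",
    "Japanese": "ja",
    "Spanish": "es",
}


def language_code(name_or_code):
    """Accept a language name or ISO 639-1 code and return the code."""
    text = name_or_code.strip().lower()
    if text in LANGUAGE_CODES.values():
        return text
    for name, code in LANGUAGE_CODES.items():
        if name.lower() == text:
            return code
    raise ValueError(f"unsupported language: {name_or_code!r}")
```

For example, `language_code("german")` returns `"de"`, and an unknown name raises `ValueError` before any request is made.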

