Text to Audio Endpoint

Overview

The Text-to-Audio endpoint generates speech from a text input. To choose the voice, supply either a valid audio URL to clone the voice from, or the voice_id of a pre-created voice. The output is an audio file of the text spoken in the voice of the provided audio clip or the selected voice.


Sample Generation

Example 1

Init Audio Clip

Prompt

In the ancient land of Eldoria, where the skies were painted with shades of mystic hues and the forests whispered secrets of old, there existed a dragon named Zephyros. Unlike the fearsome tales of dragons that plagued human hearts with terror, Zephyros was a creature of wonder and wisdom, revered by all who knew of his existence.

Generated Speech

Request

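curl --request POST 'https://modelslab.com/api/v6/voice/text_to_audio' \
  --header 'Content-Type: application/json' --data '{"key": "", "prompt": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.", "init_audio": "https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav", "language": "english"}'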

Watch the Text to Audio API demo video to see it in action in Postman.

Make a POST request to the https://modelslab.com/api/v6/voice/text_to_audio endpoint and pass the required parameters as a JSON request body.
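As a minimal sketch of the same call using only Python's standard library, the snippet below wraps a JSON body in a POST request object. `build_request` is a hypothetical helper name, and the network call is left commented out so the snippet runs without an API key.

```python
import json
import urllib.request

# Endpoint URL from this page.
ENDPOINT = "https://modelslab.com/api/v6/voice/text_to_audio"


def build_request(payload: dict) -> urllib.request.Request:
    """Wrap a JSON body in a POST request for the Text-to-Audio endpoint (hypothetical helper)."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request({
        "key": "",  # your API key goes here
        "prompt": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
        "init_audio": "https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
        "language": "english",
    })
    # Uncomment to actually send the request (performs a real network call):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read()))
    print(req.get_method(), req.full_url)
```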

Body Attributes

Parameter	Description	Values
key	The API key used to authenticate your request.	String
prompt	The text to be spoken in the generated audio.	Text
init_audio	A valid URL pointing to the audio file used for voice cloning. The clip should be 4 to 30 seconds long.	MP3/WAV URL
voice_id	(Optional) The ID of a voice from the available list. If provided, the audio is generated using this voice.	See list of voices
language	The language of the voice. Defaults to English if not specified.	`english`, `arabic`, `spanish`, `brazilian portuguese`, `german`, `czech`, `chinese`, `dutch`, `french`, `hindi`, `hungarian`, `italian`, `japanese`, `korean`, `polish`, `russian`, `turkish`
speed	Playback speed of the generated audio. Defaults to `1.0`.	Float
base64	Whether the input audio file is provided in base64 format. Defaults to `false`.	`true` or `false`
temp	Whether to return temporary links, valid for 24 hours. This can help if access to certain storage sites is blocked. Defaults to `false`.	`true` or `false`
stream	(Optional) Set to `true` to stream the response. The streamed response is returned in base64.	Boolean `true` or `false`
webhook	A URL to which the API sends a POST request once audio generation is complete.	URL
track_id	An ID returned in the API response and used to identify the corresponding webhook request.	Integer
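The table asks for an init_audio clip between 4 and 30 seconds long. A minimal pre-upload check, assuming you have the clip locally as an uncompressed PCM WAV file, can use Python's standard `wave` module; MP3 files would need a third-party decoder instead. The helper names are hypothetical, and whether the API treats the bounds as inclusive is not stated, so the sketch simply accepts both endpoints.

```python
import wave

MIN_SECONDS = 4.0   # lower bound from the init_audio description
MAX_SECONDS = 30.0  # upper bound from the init_audio description


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_init_audio_length(path: str) -> float:
    """Raise ValueError if the clip is outside the documented 4 to 30 second range."""
    duration = wav_duration_seconds(path)
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(
            f"init_audio clip is {duration:.1f}s; expected {MIN_SECONDS:.0f} to {MAX_SECONDS:.0f}s"
        )
    return duration
```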


Note: Pass either init_audio or voice_id. If both are provided, init_audio takes precedence.
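A client-side helper can enforce this rule before the request is sent. The sketch below is hypothetical (`build_payload` is not part of any ModelsLab SDK); it assumes at least one of init_audio or voice_id is required, warns when both are given (mirroring the documented precedence), checks `language` against the values in the table, and drops unset fields rather than sending explicit nulls.

```python
import warnings

# Languages listed in the Body Attributes table.
SUPPORTED_LANGUAGES = {
    "english", "arabic", "spanish", "brazilian portuguese", "german", "czech",
    "chinese", "dutch", "french", "hindi", "hungarian", "italian", "japanese",
    "korean", "polish", "russian", "turkish",
}


def build_payload(key, prompt, init_audio=None, voice_id=None,
                  language="english", speed=1.0, **optional):
    """Assemble a Text-to-Audio request body (hypothetical client-side helper)."""
    if init_audio is None and voice_id is None:
        raise ValueError("pass either init_audio or voice_id")
    if init_audio is not None and voice_id is not None:
        # Mirrors the documented rule: the API uses init_audio when both are sent.
        warnings.warn("both init_audio and voice_id given; init_audio takes precedence")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    payload = {
        "key": key,
        "prompt": prompt,
        "init_audio": init_audio,
        "voice_id": voice_id,
        "language": language,
        "speed": float(speed),
        **optional,  # e.g. temp, stream, webhook, track_id
    }
    # Drop unset fields instead of sending explicit nulls.
    return {k: v for k, v in payload.items() if v is not None}
```

Dropping `None` values is a design choice; the examples on this page send `"webhook": null` explicitly, so keep the nulls if you prefer to match them exactly.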

Language-Specific Guidelines

When using the Text-to-Audio API for Hindi language generation, follow these best practices for accurate and natural-sounding output.

Best Practices for Hindi Language

Voice Cloning

Use clear and well-structured Hindi sentences.
Ensure proper punctuation for better clarity.

Handling Numbers and Dates

Write numbers as Hindi words instead of digits.
✅ Correct: "दो हज़ार पच्चीस" (two thousand twenty-five)
❌ Incorrect: 2025
✅ Correct: "पंद्रह अगस्त उन्नीस सौ सैंतालीस" (fifteen August, nineteen hundred forty-seven)
❌ Incorrect: 15/08/1947
Always write dates and numbers out in full Hindi words.
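Converting numbers to Hindi words is irregular (each value below 100 has its own word), so the sketch below does not attempt it. It is a hypothetical pre-flight check that flags digit runs, including dates such as 15/08/1947, so you can rewrite them by hand before sending the prompt. Flagging Devanagari digits (०-९) as well as ASCII digits is an assumption based on the "words instead of digits" guideline.

```python
import re

# In Python 3 str patterns, \d matches any Unicode decimal digit,
# so this catches ASCII digits (0-9) and Devanagari digits (०-९).
# Digit groups joined by / . : or - are reported as one token (e.g. dates).
DIGIT_RUN = re.compile(r"\d+(?:[/.:-]\d+)*")


def find_numeric_tokens(text: str) -> list[str]:
    """Return numeric tokens that should be written out as Hindi words."""
    return DIGIT_RUN.findall(text)
```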

Handling Abbreviations

Convert English abbreviations to their Hindi phonetic spelling.
✅ Correct: आईआईटी (for IIT)
✅ Correct: यूएसए (for USA)
✅ Correct: एमएल (for ML)
Spell out abbreviations in Hindi script to improve pronunciation.
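The abbreviation rule lends itself to a small lookup-and-replace step before the prompt is sent. The sketch below is hypothetical; its table contains only the three examples above, and you would extend it with your own terms. Matching on whole words means an abbreviation inside a longer token (for example "ML" in "HTML") is left alone.

```python
import re

# Mappings taken from the examples above; extend with your own terms.
ABBREVIATIONS = {
    "IIT": "आईआईटी",
    "USA": "यूएसए",
    "ML": "एमएल",
}


def spell_out_abbreviations(text: str, table: dict[str, str] = ABBREVIATIONS) -> str:
    """Replace whole-word Latin abbreviations with their Devanagari phonetic spelling."""
    # Longest keys first so a shorter abbreviation never pre-empts a longer one.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: table[m.group(0)], text)
```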

Example

Body

{   
 "key": "",
 "prompt":"Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
 "init_audio":"https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
 "language":"english",
 "webhook": null,
 "track_id": null
}

Request

JS
PHP
NODE
PYTHON
JAVA

var myHeaders = new Headers();
myHeaders.append("Content-Type", "application/json");

var raw = JSON.stringify({
  "key": "",
  "prompt":"Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
  "init_audio":"https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
  "language":"english",
  "webhook": null,
  "track_id": null
});

var requestOptions = {
  method: 'POST',
  headers: myHeaders,
  body: raw,
  redirect: 'follow'
};

fetch("https://modelslab.com/api/v6/voice/text_to_audio", requestOptions)
  .then(response => response.text())
  .then(result => console.log(result))
  .catch(error => console.log('error', error));

<?php

$payload = [
  "key" => "",
  "prompt" => "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
  "init_audio" => "https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
  "language" => "english",
  "webhook" => null, 
  "track_id" => null 
];

$curl = curl_init();

curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://modelslab.com/api/v6/voice/text_to_audio',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 0,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => 'POST',
  CURLOPT_POSTFIELDS => json_encode($payload),
  CURLOPT_HTTPHEADER => array(
    'Content-Type: application/json'
  ),
));

$response = curl_exec($curl);

curl_close($curl);
echo $response;

var request = require('request');
var options = {
  'method': 'POST',
  'url': 'https://modelslab.com/api/v6/voice/text_to_audio',
  'headers': {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    "key": "",
    "prompt":"Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
    "init_audio":"https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
    "language":"english",
    "webhook": null,
    "track_id": null
  })
};

request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});

import requests
import json

url = "https://modelslab.com/api/v6/voice/text_to_audio"

payload = json.dumps({
  "key": "",
  "prompt":"Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
  "init_audio":"https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
  "language":"english",
  "webhook": None,
  "track_id": None
})

headers = {
  'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

import okhttp3.*;

public class TextToAudio {
  public static void main(String[] args) throws java.io.IOException {
    OkHttpClient client = new OkHttpClient().newBuilder()
      .build();
    MediaType mediaType = MediaType.parse("application/json");
    RequestBody body = RequestBody.create(mediaType, "{\n    \"key\":\"\",\n    \"prompt\":\"Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.\",\n    \"init_audio\":\"https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav\",\n    \"language\":\"english\"\n}");
    Request request = new Request.Builder()
      .url("https://modelslab.com/api/v6/voice/text_to_audio")
      .method("POST", body)
      .addHeader("Content-Type", "application/json")
      .build();
    // try-with-resources closes the response and releases the connection
    try (Response response = client.newCall(request).execute()) {
      System.out.println(response.body().string());
    }
  }
}

Response

Success
Processing
Error

{
    "status": "success",
    "generationTime": 1.904285192489624,
    "id": 334166,
    "output": [
        "https://pub-3626123a908346a7a8be8d9295f44e26.r2.dev/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
    ],
    "proxy_links": [
        "https://cdn2.stablediffusionapi.com/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
    ],
    "meta": {
        "base64": "no",
        "emotion": "Neutral",
        "filename": "b2dff60e-4636-4178-9a72-04a10a309185.wav",
        "input_sound_clip": [
            "tmp/0-b2dff60e-4636-4178-9a72-04a10a309185.wav"
        ],
        "input_text": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
        "language": "english",
        "speed": 1,
        "temp": "no"
    }
}

{
    "status": "processing",
    "tip": "Your audio is processing in background, you can get this audio using fetch API",
    "eta": 100,
    "message": "Try to fetch request after seconds estimated",
    "fetch_result": "https://modelslab.com/api/v6/voice/fetch/334166",
    "id": 334166,
    "output": [],
    "future_links": [
        "https://pub-3626123a908346a7a8be8d9295f44e26.r2.dev/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
    ],
    "proxy_links": [
        "https://cdn2.stablediffusionapi.com/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
    ],
    "meta": {
        "base64": "no",
        "emotion": "Neutral",
        "filename": "b2dff60e-4636-4178-9a72-04a10a309185.wav",
        "input_sound_clip": [
            "tmp/0-b2dff60e-4636-4178-9a72-04a10a309185.wav"
        ],
        "input_text": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
        "language": "english",
        "speed": 1,
        "temp": "no"
    }
}

{
    "status": "error",
    "message": "Error message"
}
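A request may come back as "processing" rather than "success". The sketch below polls the `fetch_result` URL until the audio is ready. This section does not document how the fetch endpoint must be called, so the sketch takes a caller-supplied `fetch` function that requests the URL and returns the decoded JSON; `wait_for_audio` and its parameters are hypothetical. Treating `eta` as a number of seconds is an inference from the processing example's message.

```python
import time
from typing import Callable


def wait_for_audio(response: dict,
                   fetch: Callable[[str], dict],
                   max_polls: int = 10,
                   sleep: Callable[[float], None] = time.sleep) -> list:
    """Return the output URLs, polling fetch_result while status is "processing"."""
    url = None
    polls = 0
    while True:
        status = response.get("status")
        if status == "success":
            return response["output"]
        if status == "error":
            raise RuntimeError(response.get("message", "unknown error"))
        if status != "processing":
            raise RuntimeError(f"unexpected status: {status!r}")
        if polls >= max_polls:
            raise TimeoutError(f"audio still processing after {max_polls} polls")
        # Keep the last known fetch URL in case a later response omits it.
        url = response.get("fetch_result", url)
        if url is None:
            raise RuntimeError("processing response has no fetch_result URL")
        # Assumption: "eta" is an estimate in seconds; fall back to 5 s.
        sleep(response.get("eta", 5))
        response = fetch(url)
        polls += 1
```

Injecting `fetch` and `sleep` keeps the polling logic independent of any HTTP client and makes it easy to test without waiting on real delays.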
