Text to Audio Endpoint

Overview

The Text-to-Audio endpoint generates audio from a text input, using either a valid audio URL (for voice cloning) or a pre-created voice selected by its voice_id. The output is an audio file that mimics the voice in the provided audio or the selected voice.

Sample Generation


Example 1

Prompt

In the ancient land of Eldoria, where the skies were painted with shades of mystic hues and the forests whispered secrets of old, there existed a dragon named Zephyros. Unlike the fearsome tales of dragons that plagued human hearts with terror, Zephyros was a creature of wonder and wisdom, revered by all who knew of his existence.



Request

curl --request POST 'https://modelslab.com/api/v6/voice/text_to_audio'

Watch the Text to Audio API demo video to see the endpoint in action in Postman.

Make a POST request to the https://modelslab.com/api/v6/voice/text_to_audio endpoint and pass the required parameters in the request body.

Body Attributes

Parameter | Description | Values
key | The API key used for authenticating your request. | String
prompt | The text prompt that describes the audio to be generated. | Text
init_audio | A valid URL pointing to the audio file for voice cloning. The file should be 4 to 30 seconds long. | MP3/WAV URL
voice_id | (Optional) The ID of a voice from the available list. If provided, the audio will be generated using this voice. | See list of voices
language | The language for the voice. Defaults to English if not specified. | english, arabic, spanish, brazilian portugues, german, czech, chinese, dutch, french, hindi, hungarian, italian, japanese, korean, polish, russian, turkish
speed | Playback speed of the generated audio. Defaults to 1.0. | Integral value
base64 | Indicates whether the input audio file is provided in base64 format. Defaults to "false". | TRUE or FALSE
temp | Specifies whether temporary links, valid for 24 hours, should be used. This can help if access to certain storage sites is blocked. Defaults to "false". | TRUE or FALSE
stream | (Optional) Use this if you want to stream the response. The response is returned in base64. | Boolean true or false
webhook | A URL where the API will send a POST request once the audio generation is complete. | URL
track_id | An ID returned in the API response, used to identify webhook requests. | Integral value

Note: Pass either init_audio or voice_id. If both are passed in the same request, init_audio takes precedence.
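The precedence rule in the note can be applied on the client before the request is sent. The following is a minimal sketch, not part of the official API: buildTextToAudioBody is a hypothetical helper, and the checks that key and prompt are present and that at least one voice source is given are assumptions based on the parameter descriptions. The language list is copied from the table as written.

```javascript
// Language values copied verbatim from the parameter table.
const SUPPORTED_LANGUAGES = [
  "english", "arabic", "spanish", "brazilian portugues", "german", "czech",
  "chinese", "dutch", "french", "hindi", "hungarian", "italian", "japanese",
  "korean", "polish", "russian", "turkish",
];

// Hypothetical helper (not an official SDK function): builds the JSON body
// for the text_to_audio endpoint from camelCase options.
function buildTextToAudioBody({ key, prompt, initAudio, voiceId, language = "english" }) {
  // Assumption: key and prompt are always needed.
  if (!key) throw new Error("key is required");
  if (!prompt) throw new Error("prompt is required");
  // Assumption: the request needs at least one voice source.
  if (!initAudio && !voiceId) {
    throw new Error("pass either init_audio or voice_id");
  }
  if (!SUPPORTED_LANGUAGES.includes(language)) {
    throw new Error(`unsupported language: ${language}`);
  }

  const body = { key, prompt, language };
  if (initAudio) {
    // init_audio takes precedence, so voice_id is left out when both are given.
    body.init_audio = initAudio;
  } else {
    body.voice_id = voiceId;
  }
  return body;
}
```

Dropping voice_id when init_audio is present keeps the request unambiguous and mirrors what the endpoint is documented to do with both values.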


Language-Specific Guidelines

When using the Text-to-Audio API for Hindi language generation, follow these best practices for accurate and natural-sounding output.

Best Practices for Hindi Language

Voice Cloning
  • Use clear and well-structured Hindi sentences.
  • Ensure proper punctuation for better clarity.
Handling Numbers and Dates
  • Write numbers in Hindi text format instead of digits.
  • ✅ Correct: "दो हज़ार पच्चीस"
  • ❌ Incorrect: 2025
  • ✅ Correct: "पंद्रह अगस्त उन्नीस सौ सैंतालीस"
  • ❌ Incorrect: 15/08/1947
  • Always use the expanded Hindi form for dates and numbers.
Handling Abbreviations
  • Convert English abbreviations to Hindi phonetic spelling.
  • ✅ Correct: आईआईटी (for IIT)
  • ✅ Correct: यूएसए (for USA)
  • ✅ Correct: एमएल (for ML)
  • Spell out abbreviations in Hindi script to improve pronunciation.
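A prompt can be checked against these guidelines before it is sent. The sketch below is illustrative only: findHindiPromptIssues and spellAbbreviation are hypothetical helpers, and the letter map contains only the letters that appear in the IIT, USA and ML examples above, so any other abbreviation returns null and has to be spelled out by hand. Converting digits to Hindi words is not attempted; the helper only reports where digits occur.

```javascript
// Letter spellings derived only from the IIT, USA and ML examples above.
// Letters not listed here are deliberately left unmapped.
const LETTER_SPELLINGS = {
  I: "आई", T: "टी", U: "यू", S: "एस", A: "ए", M: "एम", L: "एल",
};

// Hypothetical helper: spells an abbreviation letter by letter in Hindi
// script, or returns null if any letter is missing from the map.
function spellAbbreviation(abbreviation) {
  let spelled = "";
  for (const letter of abbreviation.toUpperCase()) {
    const hindi = LETTER_SPELLINGS[letter];
    if (!hindi) return null;
    spelled += hindi;
  }
  return spelled;
}

// Hypothetical helper: reports the parts of a prompt that the guidelines
// say should be rewritten in Hindi text before the request is made.
function findHindiPromptIssues(prompt) {
  // Runs of ASCII digits, including dates such as 15/08/1947.
  const digits = prompt.match(/\d+(?:[\/.,-]\d+)*/g) || [];
  // Runs of two or more Latin capitals, treated as abbreviations.
  const abbreviations = prompt.match(/\b[A-Z]{2,}\b/g) || [];
  return { digits, abbreviations };
}

// spellAbbreviation("IIT") returns "आईआईटी"
// spellAbbreviation("NASA") returns null because N is not in the map
```

Returning null for unknown letters, rather than guessing a spelling, keeps the helper from producing pronunciations the guidelines do not cover.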

Example

Body

{
  "key": "",
  "prompt": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
  "init_audio": "https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
  "language": "english",
  "webhook": null,
  "track_id": null
}

Request

// The request body is sent as JSON.
const myHeaders = new Headers();
myHeaders.append("Content-Type", "application/json");

// Same fields as the Body example above.
const raw = JSON.stringify({
  "key": "", // your API key
  "prompt": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
  "init_audio": "https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
  "language": "english",
  "webhook": null,
  "track_id": null
});

const requestOptions = {
  method: 'POST',
  headers: myHeaders,
  body: raw,
  redirect: 'follow'
};

// Send the request and log the raw response text.
fetch("https://modelslab.com/api/v6/voice/text_to_audio", requestOptions)
  .then(response => response.text())
  .then(result => console.log(result))
  .catch(error => console.log('error', error));

Response

{
  "status": "success",
  "generationTime": 1.904285192489624,
  "id": 334166,
  "output": [
    "https://pub-3626123a908346a7a8be8d9295f44e26.r2.dev/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "proxy_links": [
    "https://cdn2.stablediffusionapi.com/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "meta": {
    "base64": "no",
    "emotion": "Neutral",
    "filename": "b2dff60e-4636-4178-9a72-04a10a309185.wav",
    "input_sound_clip": [
      "tmp/0-b2dff60e-4636-4178-9a72-04a10a309185.wav"
    ],
    "input_text": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
    "language": "english",
    "speed": 1,
    "temp": "no"
  }
}
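
Once the response has been parsed as JSON, the audio URL can be read from the output array, with proxy_links as a fallback. This is a minimal sketch under stated assumptions: pickAudioUrl is a hypothetical helper, and it treats any status other than "success" as an error because this section documents no other status values.

```javascript
// Hypothetical helper: returns the first audio URL from a parsed response.
// Set preferProxy to true to try proxy_links before output.
function pickAudioUrl(response, { preferProxy = false } = {}) {
  // Assumption: only "success" means the audio is ready.
  if (!response || response.status !== "success") {
    const status = response ? response.status : "no response";
    throw new Error(`generation not successful: ${status}`);
  }
  const output = Array.isArray(response.output) ? response.output : [];
  const proxy = Array.isArray(response.proxy_links) ? response.proxy_links : [];
  const candidates = preferProxy ? [...proxy, ...output] : [...output, ...proxy];
  if (candidates.length === 0) {
    throw new Error("response contains no audio URL");
  }
  return candidates[0];
}
```

With the sample response above, pickAudioUrl(result) returns the r2.dev URL from output, and pickAudioUrl(result, { preferProxy: true }) returns the URL from proxy_links.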