
Text to Speech Endpoint

Overview

The Text-to-Speech endpoint generates audio from a text prompt using one of the pre-trained voices.

Sample Generation


Example 1

Prompt

In the ancient land of Eldoria, where the skies were painted with shades of mystic hues and the forests whispered secrets of old, there existed a dragon named Zephyros. Unlike the fearsome tales of dragons that plagued human hearts with terror, Zephyros was a creature of wonder and wisdom, revered by all who knew of his existence.



Request

curl --request POST 'https://modelslab.com/api/v6/voice/text_to_speech'

Make a POST request to the https://modelslab.com/api/v6/voice/text_to_speech endpoint and pass the required parameters in the request body.

Body Attributes

Parameter | Description | Values
--- | --- | ---
key | The API key used to authenticate your request. | String
prompt | The text to be converted into speech. | Text
voice_id | The ID of a trained voice; only IDs from the trained voices list are accepted. | String (trained voice ID)
language | The language for the voice. Defaults to English if not specified. | American English, British English, Spanish, Japanese, Mandarin Chinese, French, Brazilian Portuguese, Hindi, Italian
speed | Playback speed of the generated audio. Defaults to 1.0. | Integral value
emotion | Whether to enable emotion support. Currently available only in English. Defaults to false. | true or false
temp | Whether to return temporary links, valid for 24 hours. This can help if access to certain storage sites is blocked. Defaults to false. | true or false
webhook | A URL to which the API sends a POST request once audio generation is complete. | URL
track_id | An ID returned in the API response, used to identify webhook requests. | Integral value
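
As a rough illustration of how these attributes combine, the sketch below assembles a request body and leaves out any optional field that was not supplied, so that the documented defaults apply. The helper name buildTtsBody is hypothetical and not part of the API. Treating key, prompt, and voice_id as mandatory is an assumption based on the table listing no defaults for them.

```javascript
// Endpoint taken from the documentation above.
const TTS_ENDPOINT = "https://modelslab.com/api/v6/voice/text_to_speech";

// Optional attributes from the table; the first four have documented defaults.
const OPTIONAL_FIELDS = ["language", "speed", "emotion", "temp", "webhook", "track_id"];

// Hypothetical helper (not part of the API): builds the JSON body.
// Assumption: key, prompt, and voice_id are required.
function buildTtsBody({ key, prompt, voice_id, ...rest }) {
  if (!key) throw new Error("key is required");
  if (!prompt) throw new Error("prompt is required");
  if (!voice_id) throw new Error("voice_id is required");

  const body = { key, prompt, voice_id };
  for (const name of OPTIONAL_FIELDS) {
    // Copy only fields the caller actually set; unknown fields are ignored.
    if (rest[name] !== undefined) body[name] = rest[name];
  }
  return body;
}

// Example: only speed is overridden, so the other defaults stay server-side.
const exampleBody = buildTtsBody({
  key: "YOUR_API_KEY",
  prompt: "Build next-generation AI products without worrying about GPUs",
  voice_id: "madison",
  speed: 1,
});
console.log(JSON.stringify(exampleBody, null, 2));
```

The resulting object can be passed through JSON.stringify and sent as the body of a POST request to TTS_ENDPOINT.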

Emotion Support

Emotion support is currently available only for the English (en) language. When emotion is enabled, you can use special tags in your text prompt to add expressive elements to the generated speech.

Available Emotion-Supported Voices

Female Voices

  • Tara
  • Leah
  • Jess
  • Mia
  • Zoe

Male Voices

  • Leo
  • Dan
  • Zac

Supported Emotion Tags

The following emotion tags can be added to speech prompts to enhance expressiveness:

Tag | Description
--- | ---
<laugh> | Adds a laughing effect
<chuckle> | A soft chuckle for a subtle humorous tone
<sigh> | Expresses disappointment, relief, or tiredness
<cough> | Simulates a short cough
<sniffle> | Mimics a sniffle, indicating sadness or a cold
<groan> | Adds a groaning effect for frustration or discomfort
<yawn> | Simulates yawning to express boredom or tiredness
<gasp> | Expresses shock or surprise
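
Because a mistyped tag would presumably be passed through as ordinary text, it may be worth checking a prompt before sending it. The following is a minimal sketch of such a check. The helper findUnsupportedTags is hypothetical, and matching tags case-insensitively is an assumption, since the documentation only shows lowercase tags.

```javascript
// Tag names copied from the table above.
const EMOTION_TAGS = ["laugh", "chuckle", "sigh", "cough", "sniffle", "groan", "yawn", "gasp"];

// Hypothetical helper: returns the names of any <tag> in the prompt
// that is not in the supported list.
function findUnsupportedTags(prompt) {
  // Match simple alphabetic tags such as <sigh>; other angle-bracket text is ignored.
  const found = prompt.match(/<[a-z]+>/gi) || [];
  return found
    .map((tag) => tag.slice(1, -1).toLowerCase()) // strip the angle brackets
    .filter((name) => !EMOTION_TAGS.includes(name));
}

// Example: <giggle> is not in the supported list.
const unsupported = findUnsupportedTags("Well <sigh> that was <giggle> fun <laugh>");
console.log(unsupported);
```

An empty result means every tag in the prompt appears in the supported list. It does not confirm that the chosen voice or language supports emotion.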

Example

Body

{
  "key": "",
  "prompt": "Build next-generation AI products without worrying about GPUs",
  "language": "american english",
  "voice_id": "madison",
  "speed": 1,
  "emotion": false
}

Request

// The body is sent as JSON, so declare the content type.
const myHeaders = new Headers();
myHeaders.append("Content-Type", "application/json");

// Request parameters; set "key" to your API key.
const raw = JSON.stringify({
  "key": "",
  "prompt": "Build next-generation AI products without worrying about GPUs",
  "language": "american english",
  "voice_id": "madison",
  "speed": 1,
  "emotion": false
});

const requestOptions = {
  method: "POST",
  headers: myHeaders,
  body: raw,
  redirect: "follow"
};

// Send the request and print the raw response text.
fetch("https://modelslab.com/api/v6/voice/text_to_speech", requestOptions)
  .then((response) => response.text())
  .then((result) => console.log(result))
  .catch((error) => console.error("error", error));

Response

{
  "status": "success",
  "generationTime": 1.904285192489624,
  "id": 334166,
  "output": [
    "https://pub-3626123a908346a7a8be8d9295f44e26.r2.dev/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "proxy_links": [
    "https://cdn2.stablediffusionapi.com/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "meta": {
    "base64": "no",
    "emotion": "Neutral",
    "filename": "b2dff60e-4636-4178-9a72-04a10a309185.wav",
    "input_sound_clip": [
      "tmp/0-b2dff60e-4636-4178-9a72-04a10a309185.wav"
    ],
    "input_text": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
    "language": "english",
    "speed": 1,
    "temp": "no"
  }
}
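
A client will usually want just the audio URL from a response like the one above. The sketch below shows one way to extract it, assuming the response has already been parsed as JSON. The helper pickAudioUrl and its preferProxy option are hypothetical. Treating proxy_links as an alternative location for the same file is an assumption drawn from the matching filenames in the sample, and status values other than "success" are not documented here, so they are simply rejected.

```javascript
// Hypothetical helper: returns the first audio URL from a parsed response
// shaped like the sample above. Assumption: "proxy_links" mirrors "output".
function pickAudioUrl(response, { preferProxy = false } = {}) {
  if (!response || response.status !== "success") {
    const status = response && response.status ? response.status : "no response";
    throw new Error("Generation not successful: " + status);
  }
  const primary = response.output || [];
  const proxy = response.proxy_links || [];
  // Try the preferred list first, then fall back to the other one.
  const [first, second] = preferProxy ? [proxy, primary] : [primary, proxy];
  const url = first[0] || second[0];
  if (!url) throw new Error("Response contains no audio URL");
  return url;
}

// Example with a trimmed copy of the sample response.
const sample = {
  status: "success",
  output: ["https://pub-3626123a908346a7a8be8d9295f44e26.r2.dev/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"],
  proxy_links: ["https://cdn2.stablediffusionapi.com/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"],
};
console.log(pickAudioUrl(sample));
console.log(pickAudioUrl(sample, { preferProxy: true }));
```

Falling back to the second list keeps the helper usable if one of the two arrays is empty or missing.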