Text to Speech Endpoint
Overview
The Text-to-Speech endpoint generates audio from a text input using a pre-trained voice.
Sample Generation
Example 1
Prompt
In the ancient land of Eldoria, where the skies were painted with shades of mystic hues and the forests whispered secrets of old, there existed a dragon named Zephyros. Unlike the fearsome tales of dragons that plagued human hearts with terror, Zephyros was a creature of wonder and wisdom, revered by all who knew of his existence.
Request
--request POST 'https://modelslab.com/api/v6/voice/text_to_speech' \
Make a POST request to the https://modelslab.com/api/v6/voice/text_to_speech endpoint and pass the required parameters in the request body.
Body Attributes
Parameter | Description | Values |
---|---|---|
key | The API key used for authenticating your request. | String |
prompt | The text prompt that describes the audio to be generated. | Text |
voice_id | The ID of a trained voice; only trained voice IDs are accepted. You can get the list of trained voices here. | String (trained voice ID) |
language | The language for the voice. Defaults to English if not specified. | American English, British English, Spanish, Japanese, Mandarin Chinese, French, Brazilian Portuguese, Hindi, Italian |
speed | Playback speed of the generated audio. Defaults to 1.0. | Integral value |
emotion | Whether to enable emotion support. Currently only available in English. Defaults to false. | TRUE or FALSE |
temp | Whether to use temporary links, which are valid for 24 hours. This can help if access to certain storage sites is blocked. Defaults to false. | TRUE or FALSE |
webhook | A URL where the API will send a POST request once the audio generation is complete. | URL |
track_id | An ID returned in the API response, used to identify webhook requests. | Integral value |
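The attributes above can be assembled into a request body programmatically. The following is a minimal sketch, not an official client: the helper name `build_tts_payload` is hypothetical, the lowercase language strings are an assumption based on the example body further down this page, and treating `key`, `prompt`, and `voice_id` as the always-present fields is also an assumption. Optional fields are left out unless set, so the documented defaults apply.

```python
import json

# Languages listed in the Body Attributes table. The lowercase spelling is an
# assumption taken from the example body ("american english").
SUPPORTED_LANGUAGES = {
    "american english", "british english", "spanish", "japanese",
    "mandarin chinese", "french", "brazilian portuguese", "hindi", "italian",
}


def build_tts_payload(key, prompt, voice_id, language=None, speed=None,
                      emotion=None, temp=None, webhook=None, track_id=None):
    """Assemble a text_to_speech request body (hypothetical helper)."""
    if language is not None and language.lower() not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {language!r}")

    payload = {"key": key, "prompt": prompt, "voice_id": voice_id}

    # Only send optional attributes that were explicitly provided, so the
    # API-side defaults described in the table stay in effect otherwise.
    optional = {
        "language": language.lower() if language is not None else None,
        "speed": speed,
        "emotion": emotion,
        "temp": temp,
        "webhook": webhook,
        "track_id": track_id,
    }
    payload.update({name: value for name, value in optional.items()
                    if value is not None})
    return payload


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a real credential.
    body = build_tts_payload("YOUR_API_KEY", "Hello from the docs", "madison",
                             language="American English", speed=1)
    print(json.dumps(body, indent=2))
```

Validating the language locally is a design choice of this sketch; the API itself may accept other spellings.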
Emotion Support
Emotion support is currently only available in English (en). When emotion is enabled, you can use special tags in your text prompt to add expressive elements to the generated speech.
Available Emotion-Supported Voices
Female Voices
- Tara
- Leah
- Jess
- Mia
- Zoe
Male Voices
- Leo
- Dan
- Zac
Supported Emotion Tags
The following emotion tags can be added to speech prompts to enhance expressiveness:
Tag | Description |
---|---|
<laugh> | Adds a laughing effect |
<chuckle> | A soft chuckle for a subtle humorous tone |
<sigh> | Expresses disappointment, relief, or tiredness |
<cough> | Simulates a short cough |
<sniffle> | Mimics a sniffle, indicating sadness or a cold |
<groan> | Adds a groaning effect for frustration or discomfort |
<yawn> | Simulates yawning to express boredom or tiredness |
<gasp> | Expresses shock or surprise |
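Because a mistyped tag would presumably be treated as plain text, it can help to check a prompt against the table above before sending it. This is a minimal sketch under stated assumptions: the function name `unsupported_tags` is hypothetical, the `<word>` pattern used to detect tags is an assumption, and tags are compared case-sensitively because the table lists them in lowercase.

```python
import re

# Tag names from the Supported Emotion Tags table.
SUPPORTED_TAGS = {"laugh", "chuckle", "sigh", "cough",
                  "sniffle", "groan", "yawn", "gasp"}

# Anything shaped like <word> is treated as a tag (assumed pattern).
TAG_PATTERN = re.compile(r"<([A-Za-z]+)>")


def unsupported_tags(prompt):
    """Return the sorted tag names in the prompt that are not in the table."""
    found = TAG_PATTERN.findall(prompt)
    return sorted({tag for tag in found if tag not in SUPPORTED_TAGS})


if __name__ == "__main__":
    prompt = "I can't believe it worked <gasp> that is amazing <giggle>"
    print(unsupported_tags(prompt))  # ['giggle']
```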
Example
Body
{
  "key": "",
  "prompt": "Build next-generation AI products without worrying about GPUs",
  "language": "american english",
  "voice_id": "madison",
  "speed": 1,
  "emotion": false
}
Request
- JS
- PHP
- NODE
- PYTHON
- JAVA
JS
// Send the request parameters as a JSON body.
var myHeaders = new Headers();
myHeaders.append("Content-Type", "application/json");

var raw = JSON.stringify({
  "key": "",
  "prompt": "Build next-generation AI products without worrying about GPUs",
  "language": "american english",
  "voice_id": "madison",
  "speed": 1,
  "emotion": false
});

var requestOptions = {
  method: 'POST',
  headers: myHeaders,
  body: raw,
  redirect: 'follow'
};

// Log the raw response text, or the error if the request fails.
fetch("https://modelslab.com/api/v6/voice/text_to_speech", requestOptions)
  .then(response => response.text())
  .then(result => console.log(result))
  .catch(error => console.log('error', error));
PHP
<?php
// Request parameters, sent as a JSON body.
$payload = [
  "key" => "",
  "prompt" => "Build next-generation AI products without worrying about GPUs",
  "language" => "american english",
  "voice_id" => "madison",
  "speed" => 1,
  "emotion" => false
];

$curl = curl_init();
curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://modelslab.com/api/v6/voice/text_to_speech',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_MAXREDIRS => 10,
  CURLOPT_TIMEOUT => 0,
  CURLOPT_FOLLOWLOCATION => true,
  CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST => 'POST',
  CURLOPT_POSTFIELDS => json_encode($payload),
  CURLOPT_HTTPHEADER => array(
    'Content-Type: application/json'
  ),
));

// Execute the request and print the raw response.
$response = curl_exec($curl);
curl_close($curl);
echo $response;
NODE
// Uses the `request` npm package.
var request = require('request');

var options = {
  'method': 'POST',
  'url': 'https://modelslab.com/api/v6/voice/text_to_speech',
  'headers': {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    "key": "",
    "prompt": "Build next-generation AI products without worrying about GPUs",
    "language": "american english",
    "voice_id": "madison",
    "speed": 1,
    "emotion": false
  })
};

// Print the response body, or throw if the request fails.
request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});
PYTHON
import requests
import json

url = "https://modelslab.com/api/v6/voice/text_to_speech"

# Request parameters, serialized as a JSON body.
payload = json.dumps({
    "key": "",
    "prompt": "Build next-generation AI products without worrying about GPUs",
    "language": "american english",
    "voice_id": "madison",
    "speed": 1,
    "emotion": False
})

headers = {
    'Content-Type': 'application/json'
}

# Send the request and print the raw response.
response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
JAVA
import java.io.IOException;
import okhttp3.*;

public class TextToSpeechExample {
  public static void main(String[] args) throws IOException {
    OkHttpClient client = new OkHttpClient().newBuilder().build();
    MediaType mediaType = MediaType.parse("application/json");
    // The API key is passed in the JSON body, as in the other examples.
    RequestBody body = RequestBody.create(mediaType, "{\n \"key\":\"\",\n \"prompt\":\"Build next-generation AI products without worrying about GPUs\",\n \"language\":\"american english\",\n \"voice_id\":\"madison\",\n \"speed\":1,\n \"emotion\":false\n}");
    Request request = new Request.Builder()
      .url("https://modelslab.com/api/v6/voice/text_to_speech")
      .method("POST", body)
      .addHeader("Content-Type", "application/json")
      .build();
    // Execute the call, print the raw response, and release the connection.
    try (Response response = client.newCall(request).execute()) {
      System.out.println(response.body().string());
    }
  }
}
Response
- Success
- Processing
- Error
Success
{
  "status": "success",
  "generationTime": 1.904285192489624,
  "id": 334166,
  "output": [
    "https://pub-3626123a908346a7a8be8d9295f44e26.r2.dev/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "proxy_links": [
    "https://cdn2.stablediffusionapi.com/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "meta": {
    "base64": "no",
    "emotion": "Neutral",
    "filename": "b2dff60e-4636-4178-9a72-04a10a309185.wav",
    "input_sound_clip": [
      "tmp/0-b2dff60e-4636-4178-9a72-04a10a309185.wav"
    ],
    "input_text": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
    "language": "english",
    "speed": 1,
    "temp": "no"
  }
}
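A success response carries the generated audio links in `output`, with `proxy_links` alongside. The sketch below shows one way to pick the links out of a parsed response. It assumes, without confirmation from this page, that `proxy_links` point to the same files as `output` and can serve as a fallback; the function name `audio_urls` and the `prefer_proxy` flag are hypothetical.

```python
def audio_urls(response, prefer_proxy=False):
    """Return the audio links from a parsed success response."""
    status = response.get("status")
    if status != "success":
        raise ValueError(f"Expected a success response, got {status!r}")

    primary = list(response.get("output") or [])
    proxy = list(response.get("proxy_links") or [])

    # Assumption: proxy links mirror the primary links, so either list can
    # be used when the other is empty or its host is unreachable.
    if prefer_proxy and proxy:
        return proxy
    return primary or proxy


if __name__ == "__main__":
    sample = {
        "status": "success",
        "output": ["https://example.invalid/generations/demo.wav"],
        "proxy_links": ["https://proxy.example.invalid/generations/demo.wav"],
    }
    print(audio_urls(sample))
    print(audio_urls(sample, prefer_proxy=True))
```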
Processing
{
  "status": "processing",
  "tip": "Your audio is processing in background, you can get this audio using fetch API",
  "eta": 100,
  "message": "Try to fetch request after seconds estimated",
  "fetch_result": "https://modelslab.com/api/v6/voice/fetch/334166",
  "id": 334166,
  "output": [],
  "future_links": [
    "https://pub-3626123a908346a7a8be8d9295f44e26.r2.dev/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "proxy_links": [
    "https://cdn2.stablediffusionapi.com/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "meta": {
    "base64": "no",
    "emotion": "Neutral",
    "filename": "b2dff60e-4636-4178-9a72-04a10a309185.wav",
    "input_sound_clip": [
      "tmp/0-b2dff60e-4636-4178-9a72-04a10a309185.wav"
    ],
    "input_text": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
    "language": "english",
    "speed": 1,
    "temp": "no"
  }
}
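A processing response suggests waiting roughly `eta` seconds and then querying the `fetch_result` URL. The loop below is a sketch of that pattern, not a documented client. How the fetch endpoint must be called (HTTP method, body) is not described on this page, so the sketch takes a caller-supplied `fetch` function; the name `wait_for_audio`, the attempt limit, and the 5-second fallback delay are all assumptions.

```python
import time


def wait_for_audio(first_response, fetch, max_attempts=5, sleep=time.sleep):
    """Poll until the job is no longer in the "processing" state.

    `fetch` takes the fetch_result URL and returns the parsed JSON reply.
    """
    response = first_response
    for _ in range(max_attempts):
        if response.get("status") != "processing":
            return response
        # Wait for the estimated time (assumed to be seconds) before retrying.
        sleep(response.get("eta", 5))
        # Later replies might omit fetch_result, so fall back to the first URL.
        url = response.get("fetch_result") or first_response["fetch_result"]
        response = fetch(url)

    if response.get("status") == "processing":
        raise TimeoutError(f"Still processing after {max_attempts} attempts")
    return response


if __name__ == "__main__":
    # Simulated replies stand in for real HTTP calls in this demonstration.
    pending = {"status": "processing", "eta": 0,
               "fetch_result": "https://example.invalid/fetch/1"}
    finished = {"status": "success", "output": ["demo.wav"]}
    print(wait_for_audio(pending, fetch=lambda url: finished))
```

In real use, `fetch` would wrap an HTTP client call to the `fetch_result` URL; injecting it keeps the polling logic testable without network access.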
Error
{
  "status": "error",
  "message": "Error message"
}
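Since failures are reported in the response body through `status` and `message`, client code may want to turn them into exceptions. The following is a small sketch of that idea; the exception class `TextToSpeechError` and the function `raise_for_error` are hypothetical names, not part of any official SDK.

```python
class TextToSpeechError(Exception):
    """Raised when a response reports status "error" (hypothetical type)."""


def raise_for_error(response):
    """Return the parsed response unchanged unless its status is "error"."""
    if response.get("status") == "error":
        raise TextToSpeechError(response.get("message", "Unknown error"))
    return response


if __name__ == "__main__":
    try:
        raise_for_error({"status": "error", "message": "Error message"})
    except TextToSpeechError as exc:
        print(f"Request failed: {exc}")
```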