Enterprise: Text to Audio Endpoint

Overview

The text to audio endpoint lets you generate audio from a text prompt together with either a valid audio URL or a pre-created voice passed as voice_id. The generated audio reproduces the voice of the audio URL or voice you supplied.

caution

Make sure you add your S3 details for the voice_cloning server so the generated audio files are delivered to your bucket. Files generated without S3 details configured will be deleted after 24 hours.

Request

curl --request POST 'https://modelslab.com/api/v1/enterprise/voice/text_to_audio'

Make a POST request to the https://modelslab.com/api/v1/enterprise/voice/text_to_audio endpoint and pass the required parameters in the request body.

Body Attributes

Parameter | Description
key | Your API key, used for request authorization.
prompt | Text prompt describing the audio you want to generate.
init_audio | A valid URL of the audio whose voice you want cloned.
voice_id | Optional. A valid ID from the list of available voices.
language | The language of the voice. Supported languages include english, arabic, chinese, spanish, german, czech, dutch, french, hindi, hungarian, italian, japanese, korean, polish, russian, and turkish. Defaults to english.
emotion | One of neutral, happy, sad, angry, dull. Defaults to neutral.
base64 | Whether the input sound clip is base64-encoded. Should be true or false. Defaults to false.
temp | Whether you want temporary links. Useful if your country blocks access to our storage sites. Should be true or false. Defaults to false.
webhook | Set a URL to receive a POST call once the audio generation is complete.
track_id | This ID is returned in the webhook POST call and is used to identify that webhook request.

Note: You can pass either init_audio or voice_id. If both are passed at the same time, init_audio takes preference. A body using voice_id is sketched below.
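For reference, a request body that selects a pre-created voice instead of cloning from a URL could look like the following sketch; the voice_id value is a hypothetical placeholder, so substitute an ID from your own list of voices.

{
  "key": "",
  "prompt": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
  "voice_id": "your_voice_id",
  "language": "english",
  "webhook": null,
  "track_id": null
}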

Example

Body

{
  "key": "",
  "prompt": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
  "init_audio": "https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
  "language": "english",
  "webhook": null,
  "track_id": null
}

Request

var myHeaders = new Headers();
myHeaders.append("Content-Type", "application/json");

var raw = JSON.stringify({
  "key": "",
  "prompt": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
  "init_audio": "https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
  "language": "english",
  "webhook": null,
  "track_id": null
});

var requestOptions = {
  method: 'POST',
  headers: myHeaders,
  body: raw,
  redirect: 'follow'
};

fetch("https://modelslab.com/api/v1/enterprise/voice/text_to_audio", requestOptions)
  .then(response => response.text())
  .then(result => console.log(result))
  .catch(error => console.log('error', error));
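If you set the webhook parameter, the API POSTs the result to your URL when generation completes, including the track_id you supplied. The sketch below is a minimal Express receiver; the payload shape (beyond track_id) is an assumption based on the response shown in the next section, so verify field names against the calls you actually receive.

const express = require('express');
const app = express();

app.use(express.json());

// Hypothetical receiver: payload fields other than track_id are assumed
// to mirror the API response shown below (status, output, etc.).
app.post('/modelslab-webhook', (req, res) => {
  const { track_id, status, output } = req.body;
  console.log('Webhook received for track_id:', track_id);
  if (status === 'success') {
    console.log('Generated audio URLs:', output);
  }
  res.sendStatus(200); // acknowledge receipt
});

app.listen(3000, () => console.log('Listening for webhook calls on port 3000'));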

Response

{
  "status": "success",
  "generationTime": 1.4285192489624,
  "id": 334166,
  "output": [
    "https://pub-3626123a908346a7a8be8d9295f44e26.r2.dev/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "proxy_links": [
    "https://cdn2.stablediffusionapi.com/generations/b2dff60e-4636-4178-9a72-04a10a309185.wav"
  ],
  "meta": {
    "base64": "no",
    "emotion": "Neutral",
    "filename": "b2dff60e-4636-4178-9a72-04a10a309185.wav",
    "input_sound_clip": [
      "tmp/0-b2dff60e-4636-4178-9a72-04a10a309185.wav"
    ],
    "input_text": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
    "language": "english",
    "speed": 1,
    "temp": "no"
  }
}
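Once you have a successful response, you can fetch the generated file from output, falling back to proxy_links if the primary storage host is blocked in your region. The snippet below is a minimal sketch, assuming Node 18+ (for the built-in fetch) and the response shape shown above.

const fs = require('fs');

// Picks the first output URL and falls back to proxy_links if needed,
// then saves the file under the name reported in meta.filename.
async function downloadResult(result) {
  const url = (result.output && result.output[0]) || (result.proxy_links && result.proxy_links[0]);
  if (!url) throw new Error('No audio URL found in response');
  const res = await fetch(url); // Node 18+ global fetch
  const buffer = Buffer.from(await res.arrayBuffer());
  fs.writeFileSync(result.meta.filename, buffer);
  console.log('Saved', result.meta.filename);
}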