POST /voice/text_to_speech
Convert text to speech
curl --request POST \
  --url https://modelslab.com/api/v6/voice/text_to_speech \
  --header 'Content-Type: application/json' \
  --data '{
  "key": "<string>",
  "prompt": "<string>",
  "voice_id": "<string>",
  "language": "english",
  "speed": 1,
  "emotion": false,
  "temp": false,
  "webhook": "<string>",
  "track_id": 123
}'
{
  "status": "success",
  "generationTime": 123,
  "id": 123,
  "output": [
    "<string>"
  ],
  "proxy_links": [
    "<string>"
  ],
  "future_links": [
    "<string>"
  ],
  "links": [
    "<string>"
  ],
  "meta": {},
  "eta": 123,
  "message": "<string>",
  "tip": "<string>",
  "fetch_result": "<string>",
  "audio_time": 123
}

Request

Make a POST request to the endpoint below and pass the required parameters in the request body.
curl --request POST 'https://modelslab.com/api/v6/voice/text_to_speech'

Emotion Support

Emotion support is currently available only for English (en). When emotion is enabled, you can place special tags in the text prompt to add expressive elements to the generated speech.

Available Emotion-Supported Voices

Female Voices

  • Tara
  • Leah
  • Jess
  • Mia
  • Zoe

Male Voices

  • Leo
  • Dan
  • Zac

Supported Emotion Tags

The following emotion tags can be added to speech prompts to enhance expressiveness:
Tag         Description
<laugh>     Adds a laughing effect
<chuckle>   Adds a soft chuckle for a subtle humorous tone
<sigh>      Expresses disappointment, relief, or tiredness
<cough>     Simulates a short cough
<sniffle>   Mimics a sniffle, indicating sadness or a cold
<groan>     Adds a groaning effect for frustration or discomfort
<yawn>      Simulates yawning to express boredom or tiredness
<gasp>      Expresses shock or surprise
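As a rough illustration of how the tags above might be used, the sketch below checks a prompt for tags outside the supported list before building a request body with `emotion` enabled. The helper names (`unsupported_tags`, `build_emotion_payload`) are hypothetical, the exact-case tag matching is an assumption, and the lowercase `voice_id` value `"tara"` is a guess at how the voice named Tara is identified; check the pretrained voices list for the real ID.

```python
import json
import re
from typing import List

# Tag names copied from the table of supported emotion tags.
SUPPORTED_TAGS = {"laugh", "chuckle", "sigh", "cough",
                  "sniffle", "groan", "yawn", "gasp"}


def unsupported_tags(prompt: str) -> List[str]:
    """Return every <tag> in the prompt that is not a supported emotion tag."""
    found = re.findall(r"<([A-Za-z_]+)>", prompt)
    # Assumption: tags must match the documented spelling exactly.
    return [tag for tag in found if tag not in SUPPORTED_TAGS]


def build_emotion_payload(key: str, prompt: str, voice_id: str) -> dict:
    """Build a request body with emotion enabled (hypothetical helper)."""
    bad = unsupported_tags(prompt)
    if bad:
        raise ValueError(f"unsupported emotion tags: {bad}")
    return {
        "key": key,
        "prompt": prompt,
        "voice_id": voice_id,
        # Emotion is documented as English only.
        "language": "american english",
        "emotion": True,
    }


if __name__ == "__main__":
    # "tara" is an assumed voice_id, not a confirmed one.
    body = build_emotion_payload(
        "your_api_key", "Well <sigh> that took a while <laugh>", "tara"
    )
    print(json.dumps(body, indent=2))
```

Rejecting unknown tags on the client side is a design choice of this sketch; the source does not say how the API treats tags it does not recognize.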

Body

json
{
    "key": "your_api_key",
    "prompt": "Build next-generation AI products without worrying about GPUs",
    "language": "american english",
    "voice_id": "madison",
    "speed": 1,
    "emotion": false
}

Body

application/json
key
string
required

API key for authentication

prompt
string
required

Text prompt describing the audio to be generated. Maximum length: 2500 characters.

voice_id
string
required

ID of the trained voice. See the list of pretrained voices for available IDs.

language
enum<string>
default:english

Language for the voice

Available options:
american english,
british english,
spanish,
japanese,
mandarin chinese,
french,
brazilian portuguese,
hindi,
italian

speed
number
default:1

Playback speed of generated audio

emotion
boolean
default:false

Enable emotion support (English only)

temp
boolean
default:false

Use temporary links valid for 24 hours

webhook
string<uri>

URL to receive POST notification upon completion

track_id
integer

ID for webhook identification
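
The body parameters above could be assembled and sent from Python roughly as follows. This is a minimal sketch using only the standard library: `build_tts_payload` and `text_to_speech` are hypothetical helper names, and the client-side checks assume that the listed language options are exhaustive and that the 2500-character limit counts Python string characters. Omitting `language` leaves the choice to the server default.

```python
import json
import urllib.request
from typing import Optional

API_URL = "https://modelslab.com/api/v6/voice/text_to_speech"
MAX_PROMPT_CHARS = 2500
# Options copied from the `language` parameter description.
LANGUAGES = {
    "american english", "british english", "spanish", "japanese",
    "mandarin chinese", "french", "brazilian portuguese", "hindi", "italian",
}


def build_tts_payload(key: str, prompt: str, voice_id: str,
                      language: Optional[str] = None, speed: float = 1,
                      emotion: bool = False, temp: bool = False,
                      webhook: Optional[str] = None,
                      track_id: Optional[int] = None) -> dict:
    """Validate the documented constraints and build the request body."""
    if not key or not prompt or not voice_id:
        raise ValueError("key, prompt and voice_id are required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    payload = {"key": key, "prompt": prompt, "voice_id": voice_id,
               "speed": speed, "emotion": emotion, "temp": temp}
    # Optional fields are sent only when the caller supplies them.
    if language is not None:
        payload["language"] = language
    if webhook is not None:
        payload["webhook"] = webhook
    if track_id is not None:
        payload["track_id"] = track_id
    return payload


def text_to_speech(payload: dict, timeout: float = 60.0) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    body = build_tts_payload(
        "your_api_key",
        "Build next-generation AI products without worrying about GPUs",
        "madison",
        language="american english",
    )
    print(text_to_speech(body))
```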

Response

Text to speech response

status
enum<string>

Status of the voice generation

Available options:
success,
processing,
error

generationTime
number

Time taken to generate the audio in seconds

id
integer

Unique identifier for the voice generation

output
string<uri>[]

Array of generated audio URLs

proxy_links
string<uri>[]

Array of proxy audio URLs

future_links
string<uri>[]

Array of future audio URLs for queued requests

links
string<uri>[]

Array of audio URLs (voice cover response)

meta
object

Metadata about the audio generation including all parameters used

eta
integer

Estimated time for completion in seconds (processing status)

message
string

Status message or additional information

tip
string

Additional information or tips for the user

fetch_result
string<uri>

URL to fetch the result when processing

audio_time
number

Duration of the generated audio in seconds
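
One way a client might act on the `status`, `eta`, and `fetch_result` fields is sketched below: return `output` on `success`, wait and poll `fetch_result` on `processing`, and raise on anything else. The helper names are hypothetical. The sketch assumes that `fetch_result` accepts a POST with a JSON body containing `key`, which this page does not state, so verify it against the fetch endpoint's own documentation.

```python
import json
import time
import urllib.request
from typing import Callable, List


def post_json(url: str, body: dict, timeout: float = 60.0) -> dict:
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def wait_for_audio(response: dict, key: str,
                   fetch: Callable[[str, dict], dict] = post_json,
                   sleep: Callable[[float], None] = time.sleep,
                   max_polls: int = 10) -> List[str]:
    """Return the audio URLs, polling `fetch_result` while processing."""
    polls = 0
    while True:
        status = response.get("status")
        if status == "success":
            return list(response.get("output") or [])
        if status != "processing":
            # Covers "error" and any undocumented status value.
            raise RuntimeError(
                response.get("message") or f"unexpected status: {status!r}"
            )
        if polls >= max_polls:
            raise TimeoutError("still processing after max_polls attempts")
        # Wait roughly as long as the API estimates, at least one second.
        sleep(max(1, response.get("eta") or 1))
        # Assumption: the fetch URL takes a POST with the API key in the body.
        response = fetch(response["fetch_result"], {"key": key})
        polls += 1
```

The `fetch` and `sleep` parameters are injectable so the polling logic can be exercised without network access or real delays.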