POST /voice/text_to_audio
Generate audio from text with voice cloning
curl --request POST \
  --url https://modelslab.com/api/v6/voice/text_to_audio \
  --header 'Content-Type: application/json' \
  --data '
{
  "key": "<string>",
  "prompt": "<string>",
  "init_audio": "<string>",
  "voice_id": "<string>",
  "language": "english",
  "speed": 1,
  "base64": false,
  "temp": false,
  "stream": true,
  "webhook": "<string>",
  "track_id": 123
}
'
{
  "status": "success",
  "generationTime": 123,
  "id": 123,
  "output": [
    "<string>"
  ],
  "proxy_links": [
    "<string>"
  ],
  "future_links": [
    "<string>"
  ],
  "links": [
    "<string>"
  ],
  "meta": {},
  "eta": 123,
  "message": "<string>",
  "tip": "<string>",
  "fetch_result": "<string>",
  "audio_time": 123
}

Request

Make a POST request to the endpoint below and pass the required parameters in the request body.
Note: You can pass either init_audio or voice_id. If both are passed in the same request, init_audio takes precedence.
curl --request POST 'https://modelslab.com/api/v6/voice/text_to_audio'
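As an illustration of the note above, the following minimal Python sketch builds the request without sending it. The helper name build_request is hypothetical, and dropping voice_id on the client when init_audio is also present is an assumption made only to mirror the documented precedence; the API applies that rule itself.

```python
import json
import urllib.request

API_URL = "https://modelslab.com/api/v6/voice/text_to_audio"


def build_request(api_key, prompt, init_audio=None, voice_id=None, language="english"):
    """Build (but do not send) a POST request for the text_to_audio endpoint.

    Hypothetical helper, not part of any official SDK.
    """
    payload = {"key": api_key, "prompt": prompt, "language": language}
    if init_audio:
        # init_audio takes precedence, so voice_id is left out when both are given.
        payload["init_audio"] = init_audio
    elif voice_id:
        payload["voice_id"] = voice_id
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request is then a matter of passing the returned object to urllib.request.urlopen, which requires a valid API key.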

Language-Specific Guidelines

When using the Text-to-Audio API for Hindi language generation, follow these best practices for accurate and natural-sounding output.
Best Practices for Hindi Language
  • Use clear and well-structured Hindi sentences.
  • Use proper punctuation for better clarity.
  • Write numbers and dates in expanded Hindi words instead of digits.
      • Correct: “दो हज़ार पच्चीस”
      • Incorrect: 2025
      • Correct: “पंद्रह अगस्त उन्नीस सौ सैंतालीस”
      • Incorrect: 15/08/1947
  • Spell out English abbreviations phonetically in Hindi script to improve pronunciation.
      • Correct: आईआईटी (for IIT)
      • Correct: यूएसए (for USA)
      • Correct: एमएल (for ML)
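The guidelines above can be partly automated before a prompt is sent. The Python sketch below is one possible pre-processing step, not part of the API: it replaces the three abbreviations listed above with their Hindi spellings and reports any digit runs that still need to be written out by hand. The function names and the regular expressions are assumptions chosen for illustration.

```python
import re

# Phonetic spellings copied from the guideline examples; extend for your own text.
ABBREVIATIONS = {"IIT": "आईआईटी", "USA": "यूएसए", "ML": "एमएल"}

# Match an abbreviation only when it is not part of a longer Latin-letter word.
_ABBR_RE = re.compile(r"(?<![A-Za-z])(" + "|".join(ABBREVIATIONS) + r")(?![A-Za-z])")
# Match plain numbers and slash-separated dates written in ASCII digits.
_DIGITS_RE = re.compile(r"[0-9]+(?:/[0-9]+)*")


def expand_abbreviations(text):
    """Replace known English abbreviations with their Hindi phonetic spelling."""
    return _ABBR_RE.sub(lambda m: ABBREVIATIONS[m.group(1)], text)


def find_digit_runs(text):
    """Return numbers or dates still written in digits, for manual rewriting."""
    return _DIGITS_RE.findall(text)
```

Converting the reported digits into Hindi words is deliberately left out, since doing that correctly needs a full number-to-words routine.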

Body

json
{
  "key": "your_api_key",
  "prompt": "Narrative voices capable of pronouncing terminologies & acronyms in training and ai learning materials.",
  "init_audio": "https://pub-f3505056e06f40d6990886c8e14102b2.r2.dev/audio/tom_hanks_1.wav",
  "language": "english",
  "webhook": null,
  "track_id": null
}

Body

application/json
key
string
required

API key for authentication

prompt
string
required

Text to be converted to audio

init_audio
string<uri>

Valid URL of an audio file to clone the voice from. The clip must be 4 to 30 seconds long.

voice_id
string

ID of a voice from the list of available voice IDs

language
enum<string>
default:english

Language for the voice

Available options: english, arabic, spanish, brazilian portuguese, german, czech, chinese, dutch, french, hindi, hungarian, italian, japanese, korean, polish, russian, turkish

speed
number
default:1

Playback speed of generated audio

base64
boolean
default:false

Whether input audio is in base64 format

temp
boolean
default:false

Use temporary links valid for 24 hours

stream
boolean

Stream response in base64 format

webhook
string<uri>

URL to receive POST notification upon completion

track_id
integer

ID for webhook identification
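The parameter list above can be checked on the client before a request is sent. The following Python sketch is an assumption-laden example, not an official validator: validate_payload is a hypothetical name, and it checks only the required fields, the language options, the boolean flags, and the speed type as documented here.

```python
LANGUAGES = {
    "english", "arabic", "spanish", "brazilian portuguese", "german", "czech",
    "chinese", "dutch", "french", "hindi", "hungarian", "italian", "japanese",
    "korean", "polish", "russian", "turkish",
}


def validate_payload(payload):
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    # key and prompt are the only fields marked as required.
    for field in ("key", "prompt"):
        value = payload.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"{field} is required and must be a non-empty string")
    # language falls back to its documented default when omitted.
    language = payload.get("language", "english")
    if language not in LANGUAGES:
        problems.append(f"unsupported language: {language!r}")
    for field in ("base64", "temp", "stream"):
        if field in payload and not isinstance(payload[field], bool):
            problems.append(f"{field} must be a boolean")
    speed = payload.get("speed", 1)
    if isinstance(speed, bool) or not isinstance(speed, (int, float)):
        problems.append("speed must be a number")
    return problems
```

Returning a list instead of raising on the first problem lets a caller report every issue in one pass.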

Response

Text to audio response

status
enum<string>

Status of the voice generation

Available options: success, processing, error

generationTime
number

Time taken to generate the audio in seconds

id
integer

Unique identifier for the voice generation

output
string<uri>[]

Array of generated audio URLs

proxy_links
string<uri>[]

Array of proxy audio URLs

future_links
string<uri>[]

Array of future audio URLs for queued requests

links
string<uri>[]

Array of audio URLs (voice cover response)

meta
object

Metadata about the audio generation including all parameters used

eta
integer

Estimated time for completion in seconds (processing status)

message
string

Status message or additional information

tip
string

Additional information or tips for the user

fetch_result
string<uri>

URL to fetch the result when processing

audio_time
number

Duration of the generated audio in seconds
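The response fields above suggest a simple client-side decision: use output on success, wait and poll fetch_result while processing, and surface message on error. The Python sketch below shows that branching on an already parsed response body. The function name next_step and the tuple shapes it returns are illustrative assumptions, and how fetch_result should be called (method and body) is not described on this page, so the sketch only returns the URL.

```python
def next_step(response):
    """Decide what to do with a parsed response body (a dict)."""
    status = response.get("status")
    if status == "success":
        # output holds the generated audio URLs.
        return ("done", response.get("output", []))
    if status == "processing":
        # eta is the estimated wait in seconds; fetch_result is the URL to poll.
        return ("poll", response.get("fetch_result"), response.get("eta", 0))
    # "error" or any unexpected status: surface the server's message if present.
    raise RuntimeError(response.get("message") or "audio generation failed")
```

A caller would typically sleep for roughly eta seconds before requesting the fetch_result URL and passing the new response through the same function.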