POST /voice/song_generator
Generate song from lyrics and reference audio
curl --request POST \
  --url https://modelslab.com/api/v6/voice/song_generator \
  --header 'Content-Type: application/json' \
  --data '
{
  "key": "<string>",
  "init_audio": "<string>",
  "lyrics_generation": true,
  "lyrics": "<string>",
  "prompt": "<string>",
  "duration": 123,
  "language": "ar",
  "instrumental": true,
  "caption": "<string>",
  "webhook": "<string>",
  "track_id": 123
}
'
{
  "status": "success",
  "generationTime": 123,
  "id": 123,
  "output": [
    "<string>"
  ],
  "proxy_links": [
    "<string>"
  ],
  "future_links": [
    "<string>"
  ],
  "links": [
    "<string>"
  ],
  "meta": {},
  "eta": 123,
  "message": "<string>",
  "tip": "<string>",
  "fetch_result": "<string>",
  "audio_time": 123
}

Overview

The Song Generator API lets you create complete, production-quality songs by combining your lyrics with a reference audio file that defines the style, mood, and instrumentation. You can also skip writing lyrics entirely and let the model generate them automatically from a prompt.
1. Write your lyrics and style

Provide your lyrics directly, or enable auto-generation and supply a prompt. Add a caption describing the musical style: genre, instruments, vocal type, and mood.
2. Configure audio and duration

Optionally attach a reference audio via init_audio to influence the style further. Set the duration between 30 and 480 seconds.
3. Receive your song

The API returns a hosted audio URL once generation is complete, or fires a webhook to your endpoint if provided.

Request

Make a POST request to the endpoint below and pass the required parameters as a JSON body.
curl --request POST 'https://modelslab.com/api/v6/voice/song_generator'

Body Parameters

key
string
required
Your API key.
lyrics_generation
boolean
default:"false"
Set to true to auto-generate lyrics from prompt. When false, provide lyrics directly.
lyrics
string
Song lyrics with section tags like [Verse 1], [Chorus], [Bridge]. Required if lyrics_generation is false.
prompt
string
Topic or description for auto lyrics generation. Required if lyrics_generation is true.
caption
string
Musical style descriptor: genre, instruments, vocal type, tempo, mood. E.g. "female vocal, pop, piano, slow, emotional".
init_audio
string
URL to a reference audio file (MP3/WAV) to influence the song’s style.
duration
integer
Song length in seconds. Range: 30–480 (0.5–8 minutes).
language
string
Language code for lyrics and vocals. Defaults to auto-detection. See table below.
instrumental
boolean
default:"false"
Set to true to generate a vocals-free instrumental track.
webhook
string
URL to receive a POST callback when generation completes.
track_id
integer
Custom ID sent with the webhook payload for request correlation.
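
The conditional rules above (lyrics versus prompt, and the duration range) can be checked client-side before a request is sent. The following is a minimal sketch, not part of the API: `build_song_payload` is a hypothetical helper name, and it assumes that optional parameters can simply be omitted from the body. How `instrumental` interacts with the lyrics requirement is not stated above, so the sketch does not special-case it.

```python
from typing import Any, Optional

# Range taken from the duration parameter description (seconds).
MIN_DURATION, MAX_DURATION = 30, 480


def build_song_payload(
    key: str,
    *,
    lyrics_generation: bool = False,
    lyrics: Optional[str] = None,
    prompt: Optional[str] = None,
    duration: Optional[int] = None,
    **optional: Any,
) -> dict:
    """Hypothetical helper: validate inputs and return a JSON-ready body."""
    if not key:
        raise ValueError("key is required")
    if lyrics_generation and not prompt:
        raise ValueError("prompt is required when lyrics_generation is true")
    if not lyrics_generation and not lyrics:
        raise ValueError("lyrics is required when lyrics_generation is false")
    if duration is not None and not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError("duration must be between 30 and 480 seconds")

    payload: dict = {"key": key, "lyrics_generation": lyrics_generation}
    if lyrics_generation:
        payload["prompt"] = prompt
    else:
        payload["lyrics"] = lyrics
    if duration is not None:
        payload["duration"] = duration
    # Pass through caption, init_audio, language, instrumental, webhook,
    # track_id; drop anything left as None (assumption: omitting is allowed).
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, `build_song_payload("your_api_key", lyrics="[Verse 1]\n...", caption="female vocal, pop", duration=120)` returns a body with `lyrics_generation` set to false and no `prompt` field.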

Example Requests

  • Song duration must be between 30 and 480 seconds (0.5–8 minutes)
  • If you don’t have lyrics, set lyrics_generation: true and provide a prompt and caption instead
  • Set instrumental: true to generate a vocals-free track
  • 50 languages are supported: see the table below
With manual lyrics (lyrics_generation: false)
json
{
    "key": "your_api_key",
    "lyrics_generation": false,
    "lyrics": "[Verse 1]\nYour eyes hypnotize me, make me sigh\nYour lips call to me, I can't escape\n\n[Chorus]\nTonight I'll give you everything\nIn your arms I'll stay",
    "caption": "female vocal, reggaeton, deep bass, drum machine, reverb",
    "duration": 120,
    "webhook": null,
    "track_id": null
}
With auto lyrics generation (lyrics_generation: true)
json
{
    "key": "your_api_key",
    "lyrics_generation": true,
    "prompt": "A Cantopop track with layered female vocals and synth keys",
    "caption": "female vocal, Cantopop, synth keys, mid-tempo, ethereal",
    "duration": 180,
    "webhook": null,
    "track_id": null
}
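
Either example body can be sent from Python using only the standard library. This is a sketch under stated assumptions: `make_request` and `submit` are hypothetical names, the endpoint URL and `Content-Type` header are the ones shown in the curl example, and the response is assumed to be a JSON object.

```python
import json
import urllib.request

# Endpoint as given in the curl example.
ENDPOINT = "https://modelslab.com/api/v6/voice/song_generator"


def make_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (assumed to be an object)."""
    with urllib.request.urlopen(make_request(payload), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect or log the exact body before any network call is made.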

Supported Languages

Language        Code
Arabic          ar
Azerbaijani     az
Bulgarian       bg
Bengali         bn
Catalan         ca
Czech           cs
Danish          da
German          de
Greek           el
English         en
Spanish         es
Persian         fa
Finnish         fi
French          fr
Hebrew          he
Hindi           hi
Croatian        hr
Haitian Creole  ht
Hungarian       hu
Indonesian      id
Icelandic       is
Italian         it
Japanese        ja
Korean          ko
Latin           la
Lithuanian      lt
Malay           ms
Nepali          ne
Dutch           nl
Norwegian       no
Punjabi         pa
Polish          pl
Portuguese      pt
Romanian        ro
Russian         ru
Sanskrit        sa
Slovak          sk
Serbian         sr
Swedish         sv
Swahili         sw
Tamil           ta
Telugu          te
Thai            th
Tagalog         tl
Turkish         tr
Ukrainian       uk
Urdu            ur
Vietnamese      vi
Cantonese       yue
Chinese         zh
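
A client can check the language code locally before sending a request. The sketch below copies the codes from the table above; `normalize_language` is a hypothetical helper, and treating `None` as "let the API auto-detect" follows the parameter description rather than any confirmed server behaviour.

```python
from typing import Optional

# Codes copied from the Supported Languages table.
SUPPORTED_LANGUAGES = frozenset(
    "ar az bg bn ca cs da de el en es fa fi fr he hi hr ht hu id "
    "is it ja ko la lt ms ne nl no pa pl pt ro ru sa sk sr sv sw "
    "ta te th tl tr uk ur vi yue zh".split()
)


def normalize_language(code: Optional[str]) -> Optional[str]:
    """Return a lowercase supported code, or None to leave detection to the API."""
    if code is None:
        return None
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized
```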

Body

application/json
key
string
required

API key for authentication

init_audio
string<uri>
required

URL to reference audio file to influence style

lyrics_generation
boolean

Pass true to generate lyrics automatically

lyrics
string

Lyrics in LRC format (timestamp + lyrics). Required if lyrics_generation is false

prompt
string

Topic for lyrics generation. Required if lyrics_generation is true

duration
integer | null

Duration of the generated song in seconds (30-480 seconds / 0.5-8 minutes)

language
enum<string> | null

Language for the generated song

Available options:
ar,
az,
bg,
bn,
ca,
cs,
da,
de,
el,
en,
es,
fa,
fi,
fr,
he,
hi,
hr,
ht,
hu,
id,
is,
it,
ja,
ko,
la,
lt,
ms,
ne,
nl,
no,
pa,
pl,
pt,
ro,
ru,
sa,
sk,
sr,
sv,
sw,
ta,
te,
th,
tl,
tr,
uk,
ur,
vi,
yue,
zh,
null
instrumental
boolean | null

Whether to generate an instrumental version without vocals

caption
string | null

Caption describing the song's style, such as male or female vocals, loops, and more.

webhook
string<uri>

URL to receive POST notification upon completion

track_id
integer

ID for webhook identification
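
When several songs are queued at once, `track_id` lets a webhook receiver tell which request a callback belongs to. The webhook payload format is not documented here, so the following is only a sketch: it assumes the callback body is a JSON object that carries `track_id` as a top-level field, and `match_webhook` and the `pending` mapping are hypothetical.

```python
import json
from typing import Any, Dict, Optional, Tuple


def match_webhook(
    raw_body: bytes, pending: Dict[int, Any]
) -> Tuple[Optional[Any], dict]:
    """Look up (and remove) the pending job that a webhook callback refers to.

    Assumes a JSON object body with a top-level "track_id" field.
    Returns (job, payload); job is None when no pending entry matches.
    """
    payload = json.loads(raw_body)
    track_id = payload.get("track_id")
    try:
        track_id = int(track_id)  # tolerate the ID arriving as a string
    except (TypeError, ValueError):
        return None, payload
    return pending.pop(track_id, None), payload
```

Removing the entry on first match means a repeated delivery of the same callback returns `None` for the job, which a handler can treat as already processed.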

Response

Song generation response

status
enum<string>

Status of the song generation request

Available options:
success,
processing,
error
generationTime
number

Time taken to generate the audio in seconds

id
integer

Unique identifier for the generation request

output
string<uri>[]

Array of generated audio URLs

proxy_links
string<uri>[]

Array of proxy audio URLs

future_links
string<uri>[]

Array of future audio URLs for queued requests

links
string<uri>[]

Array of audio URLs (voice cover response)

meta
object

Metadata about the audio generation including all parameters used

eta
integer

Estimated time for completion in seconds (processing status)

message
string

Status message or additional information

tip
string

Additional information or tips for the user

fetch_result
string<uri>

URL to fetch the result when processing

audio_time
number

Duration of the generated audio in seconds
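
The three `status` values suggest a simple client loop: return `output` on success, raise on error, and poll `fetch_result` while processing. The sketch below shows that loop under stated assumptions. How `fetch_result` must be called (HTTP method, authentication) is not specified here, so the caller supplies a `fetch` function that takes the URL and returns the decoded response. `wait_for_song` is a hypothetical name, and the 1 to 30 second clamp on `eta` is an arbitrary choice.

```python
import time
from typing import Callable, List


def wait_for_song(
    response: dict,
    fetch: Callable[[str], dict],
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until the song is ready and return the list of output URLs."""
    fetch_url = None
    for _ in range(max_attempts):
        status = response.get("status")
        if status == "success":
            return list(response.get("output") or [])
        if status == "error":
            raise RuntimeError(response.get("message") or "song generation failed")
        if status != "processing":
            raise RuntimeError(f"unexpected status: {status!r}")
        # Keep the last known fetch URL in case later responses omit it.
        fetch_url = response.get("fetch_result") or fetch_url
        if not fetch_url:
            raise RuntimeError("processing response had no fetch_result URL")
        delay = response.get("eta") or 5
        sleep(max(1, min(delay, 30)))  # wait roughly the ETA, clamped to 1-30 s
        response = fetch(fetch_url)
    raise TimeoutError("song was still processing after max_attempts polls")
```

Injecting `fetch` and `sleep` keeps the loop free of network and timing side effects, so it can be exercised with stub functions.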