Make sure you add your S3 details for the voice_cloning server so that generated audio is delivered to your bucket. Audio generated without S3 details will be deleted after 24 hours.
Find all available voice models HERE.

Request

Make a POST request to the endpoint below and pass the required parameters in the request body.
curl
--request POST 'https://modelslab.com/api/v1/enterprise/voice/voice_cover' \

Example

Body

json
{   
  "key": "enterprise_api_key",
  "init_audio": "https://music.youtube.com/watch?v=aZ1hziFhj1o",
  "model_id": "zoro",
  "pitch": "none",
  "rate": 0.5,
  "radius": 3,
  "mix": 0.25,
  "algorithm": "rmvpe",
  "hop_length": 128,
  "originality": 0.5,
  "lead_voice_volume_delta": "+1",
  "backup_voice_volume_delta": "-2",
  "instrument_volume_delta": "+2",
  "reverb_size": 0.15,
  "wetness": 0.2,
  "dryness": 0.8,
  "damping": 0.7,
  "base64": false,
  "temp": false,
  "webhook": null,
  "track_id": null
}
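The example body above can be sent from Python as well. The following is a minimal sketch using only the standard library. The helper names `build_payload` and `build_request` are illustrative and not part of the API, and the `Content-Type: application/json` header is an assumption, since the curl snippet above does not show its headers.

```python
import json
import urllib.request

ENDPOINT = "https://modelslab.com/api/v1/enterprise/voice/voice_cover"


def build_payload(api_key, init_audio, model_id, **options):
    """Assemble the request body; only the three required fields are mandatory."""
    payload = {"key": api_key, "init_audio": init_audio, "model_id": model_id}
    payload.update(options)  # optional attributes such as pitch, rate, webhook
    return payload


def build_request(payload):
    """Wrap the payload in a POST request (JSON content type is assumed)."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    body = build_payload(
        "enterprise_api_key",
        "https://music.youtube.com/watch?v=aZ1hziFhj1o",
        "zoro",
        pitch="none",
        algorithm="rmvpe",
    )
    request = build_request(body)
    # Uncomment to actually send the request:
    # with urllib.request.urlopen(request) as response:
    #     print(json.loads(response.read()))
    print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes it easy to inspect the body before any API credits are spent.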

Body Attributes

key
string
required
Your API Key used for request authorization.
init_audio
string
required
URL (YouTube links are supported) or base64-encoded data of a valid .wav file containing the audio you want to clone with the model.
model_id
string
required
ID of the voice cover model. Find models here.
pitch
string
default:"none"
Voice pitch conversion. Options:
  • m2f: Male-to-Female
  • f2m: Female-to-Male
  • none: No pitch conversion
algorithm
string
default:"rmvpe"
Pitch detection algorithm. Default: rmvpe.
rate
number
default:"0.5"
Control rate for leakage of the generated voice. Higher values bias the model towards its training data. Default: 0.5.
seed
integer
Seed for reproducibility. Same seed gives the same output. Pass null for a random seed.
language
string
default:"english"
The language of the cloned voice. Default: english.
emotion
string
default:"neutral"
Emotional tone of the generated voice. Default: neutral.
speed
number
default:"1.0"
Playback speed of the speaker, as a floating-point value. Default: 1.0.
radius
integer
default:"3"
Median filtering length to reduce breathiness and artifacts. Default: 3.
mix
number
default:"0.25"
Mix between original loudness and fixed loudness. Default: 0.25.
hop_length
integer
Hop length for pitch changes (only applies when using mangio-crepe).
originality
number
default:"0.33"
Controls similarity to original vocals (voiceless consonants). Default: 0.33.
lead_voice_volume_delta
number
Adjust lead vocals volume. Range: -5 (decrease) to +5 (increase).
backup_voice_volume_delta
number
Adjust backup vocals volume. Range: -5 (decrease) to +5 (increase).
instrument_volume_delta
number
Adjust instrumental volume. Range: -5 (decrease) to +5 (increase).
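The three volume deltas share the same -5 to +5 range, so a client can check them before sending a request. This is a minimal sketch; `check_volume_delta` is a hypothetical helper, and whether the server rejects or clamps out-of-range values is not stated here. It accepts both numbers and signed strings such as "+1", because the example body passes the deltas as strings while the attribute type is listed as number.

```python
def check_volume_delta(value):
    """Return the delta as a float, or raise ValueError if outside -5..+5."""
    delta = float(value)  # accepts 2, -1.5, or strings such as "+1"
    if not -5 <= delta <= 5:
        raise ValueError(f"volume delta {value!r} is outside the range -5 to +5")
    return delta
```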
reverb_size
number
default:"0.15"
Reverb room size. Default: 0.15.
wetness
number
default:"0.2"
Reverb for generated vocals. Default: 0.2.
dryness
number
default:"0.8"
Reverb for original vocals. Default: 0.8.
damping
number
default:"0.7"
High-frequency damping factor in reverb. Default: 0.7.
base64
boolean
default:"false"
Whether the input sound clip is in base64 format. Default: false.
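When the input is a local file rather than a URL, the clip can be base64-encoded and sent with base64 set to true. The sketch below assumes that init_audio takes the plain base64 string with no data-URI prefix, which is not confirmed here. `wav_to_base64_fields` is a hypothetical helper name.

```python
import base64
from pathlib import Path


def wav_to_base64_fields(path):
    """Read a .wav file and return the init_audio/base64 fields for the body."""
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    # Assumption: the API expects the bare base64 string in init_audio.
    return {"init_audio": encoded, "base64": True}
```

The returned dictionary can be merged into the rest of the request body alongside key and model_id.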
temp
boolean
default:"false"
Whether you want the output to be auto-deleted from the server after a short time. Default: false.
webhook
string
A URL to receive a POST API call once the voice cloning process is complete.
track_id
string
This ID is returned in the webhook callback to identify the request.
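On the receiving side, track_id lets a webhook handler match a callback to the request that triggered it. The sketch below assumes the callback body is JSON with a top-level "track_id" field. The callback format is not documented in this section, so treat the field name and the `match_callback` helper as hypothetical.

```python
import json


def match_callback(raw_body, pending):
    """Return the job registered under the callback's track_id, or None."""
    data = json.loads(raw_body)
    track_id = data.get("track_id")  # assumed top-level field name
    if track_id is None:
        return None
    # Remove the job so a repeated callback is not processed twice.
    return pending.pop(str(track_id), None)
```

Here `pending` is a dictionary the caller fills in when submitting a request, keyed by the track_id that was sent.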