POST /voice/voice_cover
Transform voice in audio using AI models
curl --request POST \
  --url https://modelslab.com/api/v6/voice/voice_cover \
  --header 'Content-Type: application/json' \
  --data '{
  "key": "<string>",
  "init_audio": "<string>",
  "model_id": "<string>",
  "pitch": "m2f",
  "algorithm": "rmvpe",
  "rate": 0.5,
  "seed": 123,
  "language": "english",
  "emotion": "neutral",
  "speed": 1,
  "radius": 3,
  "mix": 0.25,
  "hop_length": 123,
  "originality": 0.33,
  "lead_voice_volume_delta": "<string>",
  "backup_voice_volume_delta": "<string>",
  "instrument_volume_delta": "<string>",
  "reverb_size": 0.15,
  "wetness": 0.2,
  "dryness": 0.8,
  "damping": 0.7,
  "base64": false,
  "temp": false,
  "webhook": "<string>",
  "track_id": 123
}'
{
  "status": "success",
  "generationTime": 123,
  "id": 123,
  "output": [
    "<string>"
  ],
  "proxy_links": [
    "<string>"
  ],
  "future_links": [
    "<string>"
  ],
  "links": [
    "<string>"
  ],
  "meta": {},
  "eta": 123,
  "message": "<string>",
  "tip": "<string>",
  "fetch_result": "<string>",
  "audio_time": 123
}

Request

Make a POST request to the endpoint below and pass the required parameters in the request body.
curl --request POST 'https://modelslab.com/api/v6/voice/voice_cover'

Body

json
{
    "key": "your_api_key",
    "init_audio": "https://www.youtube.com/watch?v=ixkoVwKQaJg",
    "model_id": "zoro",
    "pitch": "none",
    "rate": 0.5,
    "radius": 3,
    "mix": 0.25,
    "algorithm": "rmvpe",
    "hop_length": 128,
    "originality": 0.5,
    "lead_voice_volume_delta": "+1",
    "backup_voice_volume_delta": "-2",
    "instrument_volume_delta": "+2",
    "reverb_size": 0.15,
    "wetness": 0.2,
    "dryness": 0.8,
    "damping": 0.7,
    "base64": false,
    "temp": false,
    "webhook": null,
    "track_id": null
}
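
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. The endpoint URL and field names come from the example above; the helper names `build_request` and `send` are illustrative and not part of the API, and the API key is a placeholder.

```python
import json
import urllib.request

ENDPOINT = "https://modelslab.com/api/v6/voice/voice_cover"


def build_request(api_key: str, init_audio: str, model_id: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request for the voice_cover endpoint."""
    # Required fields first, then any optional parameters such as pitch or rate.
    body = {"key": api_key, "init_audio": init_audio, "model_id": model_id, **options}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
# req = build_request("your_api_key",
#                     "https://www.youtube.com/watch?v=ixkoVwKQaJg",
#                     "zoro", pitch="none", algorithm="rmvpe", rate=0.5)
# print(send(req)["status"])
```

Keeping request construction separate from sending makes it possible to inspect or log the JSON body before any network call is made.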

Body

application/json
key
string
required

API key to authorize the request

init_audio
string
required

URL (YouTube supported) or base64-encoded WAV file of the audio to be cloned

model_id
string
required

ID of the voice cloning model to use

pitch
enum<string>

Controls pitch transformation between voices

Available options: m2f, f2m, none
algorithm
enum<string>
default:rmvpe

Algorithm used for voice cloning

Available options: rmvpe, mangio-crepe
rate
number

Controls generated voice resemblance to training data

Required range: 0 <= x <= 1
seed
integer

Seed value to reproduce results (null for random)

language
string
default:english

Language for the voice

emotion
enum<string>
default:neutral

Emotion of the voice

Available options: neutral, happy, sad, angry, dull
speed
number
default:1

Playback speed of the speaker

Required range: 0.5 <= x <= 2
radius
number
default:3

Median filtering length to reduce voice artifacts

Required range: 0 <= x <= 3
mix
number
default:0.25

Controls loudness similarity to original audio

Required range: 0 <= x <= 1
hop_length
integer

Frequency of pitch analysis (used with the mangio-crepe algorithm)

originality
number
default:0.33

Controls similarity to original vocals' voiceless consonants

Required range: 0 <= x <= 1
lead_voice_volume_delta
string

Adjusts volume of lead vocals (-5 to +5)

backup_voice_volume_delta
string

Adjusts volume of backup vocals (-5 to +5)

instrument_volume_delta
string

Adjusts volume of instrumental tracks (-5 to +5)

reverb_size
number
default:0.15

Size of the reverb room

Required range: 0 <= x <= 1
wetness
number
default:0.2

Reverb applied to generated vocals

Required range: 0 <= x <= 1
dryness
number
default:0.8

Reverb applied to original vocals

Required range: 0 <= x <= 1
damping
number
default:0.7

Damping factor for high frequencies in reverb

Required range: 0 <= x <= 1
base64
boolean
default:false

Whether input sound clip is in base64 format

temp
boolean
default:false

Whether to use temporary links that are valid for 24 hours

webhook
string<uri>

URL to receive POST notification upon completion

track_id
integer

ID for webhook identification
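
The ranges and option lists above can be checked on the client before a request is sent. The following is a rough sketch of such a check; the helper names (`format_delta`, `encode_audio`, `validate`) are hypothetical and not part of the API. Two points are assumptions rather than documented behavior: that a zero volume delta is written as "+0", and that base64 audio is sent as plain base64 text without a data-URI prefix. How the server itself treats out-of-range values is not stated here.

```python
import base64

# Numeric ranges copied from the parameter descriptions above.
RANGES = {
    "rate": (0, 1),
    "speed": (0.5, 2),
    "radius": (0, 3),
    "mix": (0, 1),
    "originality": (0, 1),
    "reverb_size": (0, 1),
    "wetness": (0, 1),
    "dryness": (0, 1),
    "damping": (0, 1),
}
ENUMS = {
    "pitch": {"m2f", "f2m", "none"},
    "algorithm": {"rmvpe", "mangio-crepe"},
    "emotion": {"neutral", "happy", "sad", "angry", "dull"},
}
DELTA_FIELDS = (
    "lead_voice_volume_delta",
    "backup_voice_volume_delta",
    "instrument_volume_delta",
)


def format_delta(value: int) -> str:
    """Render a volume delta as a signed string such as "+1" or "-2"."""
    if not -5 <= value <= 5:
        raise ValueError(f"volume delta must be between -5 and +5, got {value}")
    return f"{value:+d}"  # Assumption: zero is sent as "+0".


def encode_audio(raw: bytes) -> str:
    """Base64-encode WAV bytes for use as init_audio with base64 set to true."""
    # Assumption: plain base64 text, no data-URI prefix.
    return base64.b64encode(raw).decode("ascii")


def validate(params: dict) -> list:
    """Return a list of problems found in params; an empty list means none."""
    problems = []
    for name in ("key", "init_audio", "model_id"):
        if not params.get(name):
            problems.append(f"{name} is required")
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name} must be in [{low}, {high}]")
    for name, allowed in ENUMS.items():
        value = params.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    for name in DELTA_FIELDS:
        value = params.get(name)
        if value is not None:
            try:
                in_range = -5 <= int(value) <= 5
            except (TypeError, ValueError):
                in_range = False
            if not in_range:
                problems.append(f"{name} must be a string from -5 to +5")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every invalid field in a single pass.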

Response

Voice cover response

status
enum<string>

Status of the voice generation

Available options: success, processing, error
generationTime
number

Time taken to generate the audio in seconds

id
integer

Unique identifier for the voice generation

output
string<uri>[]

Array of generated audio URLs

proxy_links
string<uri>[]

Array of proxy audio URLs

future_links
string<uri>[]

Array of future audio URLs for queued requests

links
string<uri>[]

Array of audio URLs (voice cover response)

meta
object

Metadata about the audio generation including all parameters used

eta
integer

Estimated time for completion in seconds (processing status)

message
string

Status message or additional information

tip
string

Additional information or tips for the user

fetch_result
string<uri>

URL to fetch the result when processing

audio_time
number

Duration of the generated audio in seconds
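
The `status` field determines what a client should do with the rest of the response. The following is a minimal sketch of that branching, assuming that a `processing` response is followed up by waiting roughly `eta` seconds and then calling the `fetch_result` URL. How `fetch_result` must be called is not described in this section, so the sketch only returns the URL. `NextStep` and `interpret` are hypothetical names, not part of the API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NextStep:
    action: str                               # "done", "poll", or "fail"
    urls: List[str] = field(default_factory=list)
    fetch_url: Optional[str] = None           # where to ask for the result later
    wait_seconds: int = 0                     # suggested delay before polling
    message: str = ""


def interpret(response: dict) -> NextStep:
    """Map a voice_cover response onto the next client action."""
    status = response.get("status")
    if status == "success":
        # Generated audio is available immediately in output.
        return NextStep("done", urls=list(response.get("output") or []))
    if status == "processing":
        # The job is queued: future_links are where the audio is expected to appear.
        return NextStep(
            "poll",
            urls=list(response.get("future_links") or []),
            fetch_url=response.get("fetch_result"),
            wait_seconds=int(response.get("eta") or 0),
        )
    # "error", or any status this sketch does not recognize.
    return NextStep("fail", message=str(response.get("message") or "unknown error"))
```

Treating any unrecognized status as a failure is a conservative choice for this sketch; a real client might prefer to log and retry instead.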