
Voice Cloning API: Expanded Language Support & Performance Improvements
The Voice Cloning endpoint has been upgraded with broader language coverage, faster inference, and more natural-sounding output.
API Endpoint
- Standard API: POST /api/v6/voice/text_to_audio
What’s New
- 48 languages supported — up from 19, now covering major South Asian, Southeast Asian, Middle Eastern, and European languages
- Faster inference — reduced latency for quicker audio generation
- More natural output — improved prosody and pronunciation across all supported languages
- Fully backward compatible — all existing language values continue to work
Newly Added Languages
Assamese, Bengali, Finnish, Gujarati, Hebrew, Indonesian, Kannada, Maithili, Malay, Malayalam, Marathi, Min Nan Chinese, Nepali, Odia, Punjabi, Sindhi, Sinhala, Slovak, Swahili, Tamil, Telugu, Ukrainian, Urdu, Vietnamese, Welsh, Yue Chinese
MusicGen API: New Parameters
Added duration, output_format, and bitrate parameters to the MusicGen API.
API Endpoint
- Standard API: POST /api/v6/voice/music_gen
New Parameters
- duration: Length of the generated music in seconds. Any value between 30 and 480. Default: 30.
- output_format: Output format, one of wav, mp3, or flac. Default: wav.
- bitrate: Audio bitrate, one of 128k, 192k, or 320k. Default: 320k.
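As a rough illustration of the new parameters, the sketch below validates them client-side and builds a request body. The endpoint path, parameter names, allowed values, and defaults come from the entry above; the host in BASE_URL, the "key" and "prompt" field names, and the helper names are assumptions made for the example, not part of the documented API.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.com"
MUSIC_GEN_PATH = "/api/v6/voice/music_gen"  # path taken from the changelog

OUTPUT_FORMATS = ("wav", "mp3", "flac")
BITRATES = ("128k", "192k", "320k")


def build_music_gen_payload(prompt, api_key, duration=30,
                            output_format="wav", bitrate="320k"):
    """Validate the three new parameters and return a JSON-ready dict.

    "key" and "prompt" are assumed field names, not taken from the changelog.
    """
    if not 30 <= duration <= 480:
        raise ValueError("duration must be between 30 and 480 seconds")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {OUTPUT_FORMATS}")
    if bitrate not in BITRATES:
        raise ValueError(f"bitrate must be one of {BITRATES}")
    return {
        "key": api_key,
        "prompt": prompt,
        "duration": duration,
        "output_format": output_format,
        "bitrate": bitrate,
    }


def post_json(path, payload, timeout=60):
    """POST a payload as JSON and decode the JSON response body."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Build (but do not send) a 90-second MP3 request at the default bitrate.
payload = build_music_gen_payload("warm lo-fi piano", "YOUR_API_KEY",
                                  duration=90, output_format="mp3")
print(payload)
```

Calling post_json(MUSIC_GEN_PATH, payload) would send the request once BASE_URL points at a real host; the shape of the response is not described in this changelog, so it is left uninterpreted here.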
New Enterprise API Endpoint: Speech-to-Text
Enterprise Speech-to-Text is now available for converting audio into text transcription.
API Endpoint
- Enterprise API: POST /api/v1/enterprise/speech_to_text/transcribe
Key Features
- Convert speech audio into text transcription
- Multi-language transcription support
- Optional timestamp controls with timestamp_level (null, word, sentence)
- Webhook support via webhook and track_id for async tracking
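A minimal sketch of a transcription request body, assuming the audio is passed by URL. The timestamp_level, webhook, and track_id names and the three timestamp values come from the list above; the "key" and "audio_url" field names and the helper itself are hypothetical.

```python
import json

TRANSCRIBE_PATH = "/api/v1/enterprise/speech_to_text/transcribe"  # from the changelog
TIMESTAMP_LEVELS = (None, "word", "sentence")  # None serializes to JSON null


def build_transcribe_payload(audio_url, api_key, timestamp_level=None,
                             webhook=None, track_id=None):
    """Return a JSON-ready dict for a transcription request.

    "key" and "audio_url" are assumed field names for illustration only.
    """
    if timestamp_level not in TIMESTAMP_LEVELS:
        raise ValueError("timestamp_level must be None, 'word', or 'sentence'")
    payload = {
        "key": api_key,
        "audio_url": audio_url,
        "timestamp_level": timestamp_level,
    }
    # Only include the async-tracking fields when the caller supplies them.
    if webhook is not None:
        payload["webhook"] = webhook
    if track_id is not None:
        payload["track_id"] = track_id
    return payload


body = json.dumps(build_transcribe_payload(
    "https://example.com/meeting.wav", "YOUR_API_KEY",
    timestamp_level="word",
    webhook="https://example.com/hooks/stt",
    track_id="job-123",
))
print(body)
```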

Song Generator API powered by ACE-Step v1.5
Major upgrade to the Song Generator API with the new ACE-Step v1.5 model for professional-grade song creation.
API Endpoint
- Standard API: POST /api/v6/voice/song_generator
What’s New
- ACE-Step v1.5 Model: State-of-the-art AI model for high-quality song generation with vocal synthesis
- 50+ Languages: Generate songs with vocals in languages from Arabic to Chinese, Cantonese to Spanish
- Flexible Duration: Create songs from 30 seconds to 8 minutes (30-480 seconds)
- Instrumental Mode: Generate instrumental versions without vocals using the instrumental parameter
- Smart Lyrics Generation: Automatic lyrics generation based on prompt and caption, or use your own lyrics
- Advanced Control:
  - caption parameter for music style, instruments, atmosphere, and production style
  - lyrics parameter for song structure, vocal styles, and energy control
  - prompt parameter for automatic lyrics generation
  - Language-specific vocal synthesis with proper pronunciation
New Documentation
- Song Generator API Reference: Complete API documentation with examples
- Song Generation Guide: Professional guide with best practices, caption writing tips, lyrics structure guidance, duration calculation, and real-world examples
Key Features
- Professional music structure tags: [Intro], [Verse], [Chorus], [Bridge], [Outro]
- Vocal control tags: [raspy vocal], [whispered], [falsetto], [powerful belting]
- Energy control: [high energy], [building energy], [explosive], [melancholic]
- Consistent caption-lyrics matching for optimal results
- Duration calculation guidelines based on lyrics length and structure
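The tags and parameters listed above could be combined roughly as follows. This is a sketch under stated assumptions: caption, lyrics, prompt, and instrumental are named in this entry, but the "key" and "duration" field names, the convention of one tag per line, and both helper functions are illustrative guesses rather than the documented request format.

```python
STRUCTURE_TAGS = {"Intro", "Verse", "Chorus", "Bridge", "Outro"}


def format_lyrics(sections):
    """Join (tag, lines) pairs into one lyrics string.

    Each section starts with its bracketed structure tag; vocal or energy
    tags such as "[whispered]" can be passed as ordinary lines.
    """
    blocks = []
    for tag, lines in sections:
        if tag not in STRUCTURE_TAGS:
            raise ValueError(f"unknown structure tag: {tag}")
        blocks.append("\n".join([f"[{tag}]", *lines]))
    return "\n\n".join(blocks)


def build_song_payload(api_key, caption, lyrics=None, prompt=None,
                       instrumental=False, duration=30):
    """Return a JSON-ready dict; "key" and "duration" are assumed field names."""
    if not 30 <= duration <= 480:
        raise ValueError("duration must be between 30 and 480 seconds")
    payload = {
        "key": api_key,
        "caption": caption,
        "instrumental": instrumental,
        "duration": duration,
    }
    if lyrics is not None:
        payload["lyrics"] = lyrics  # caller-supplied lyrics
    if prompt is not None:
        payload["prompt"] = prompt  # used for automatic lyrics generation
    return payload


lyrics = format_lyrics([
    ("Intro", []),
    ("Verse", ["[whispered]", "City lights are fading"]),
    ("Chorus", ["[high energy]", "We run all night"]),
])
song = build_song_payload("YOUR_API_KEY",
                          caption="synth-pop, bright pads, driving drums",
                          lyrics=lyrics, duration=120)
print(lyrics)
```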
Getting Started
Check out the Song Generation Guide for detailed examples and best practices for creating professional songs with the ACE-Step v1.5 model.
New Parameter in Song Generator API
Endpoint: Song Generator API
Added model_id parameter to select between the diffrhythm-short and diffrhythm-long models for song generation.
- diffrhythm-short: Generates shorter songs with a maximum duration of 1 minute 35 seconds.
- diffrhythm-long: Generates longer songs with a maximum duration of 4 minutes 45 seconds.
New Parameter in Lyrics Generator API
Endpoint: Lyrics Generator API
Added length parameter to specify the desired length of generated lyrics.
- short: Generates shorter lyrics with a maximum duration of 1 minute 35 seconds.
- long: Generates longer lyrics with a maximum duration of 4 minutes 45 seconds.
API Endpoints
- Standard API: POST /api/v6/voice/song_generator
- Standard API: POST /api/v6/voice/lyrics_generator
Key Features
- Select between short and long models for song generation
- Specify desired length of generated lyrics
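One way to pick a model_id from a target song length is sketched below. The two model names and their maximum durations (1 minute 35 seconds is 95 seconds, 4 minutes 45 seconds is 285 seconds) come from this entry; the pick_model_id helper is hypothetical and simply chooses the smallest model that fits.

```python
# Maximum durations in seconds, converted from the changelog entry.
MODEL_MAX_SECONDS = {
    "diffrhythm-short": 95,   # 1 minute 35 seconds
    "diffrhythm-long": 285,   # 4 minutes 45 seconds
}


def pick_model_id(target_seconds):
    """Return the model with the smallest maximum duration that still fits."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    for model_id, max_seconds in sorted(MODEL_MAX_SECONDS.items(),
                                        key=lambda item: item[1]):
        if target_seconds <= max_seconds:
            return model_id
    raise ValueError("no model supports songs longer than 285 seconds")


print(pick_model_id(60))
print(pick_model_id(200))
```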
New Enterprise API Endpoint: Qwen Text to Image
Generate high-definition images from text using the Qwen model.
API Endpoint
- Enterprise API: POST /api/v1/enterprise/qwen/text2img
Key Features
- Generate high-definition images from text using the Qwen model
- Supports various image styles and attributes
- Resolution up to 1024x1024 pixels
New Video API Endpoint: Watermark Remover
Remove watermarks from SORA videos.
API Endpoint
- Standard API: POST /api/v6/video/watermark_remover
Key Features
- SORA watermark detection and removal
- Preserves video quality
New Image Editing Endpoint: Caption
A simple and powerful image captioning endpoint that generates descriptive text from images.
API Endpoint
- Standard API: POST /api/v6/image_editing/caption
Key Features
- Automatic image caption generation
- Customizable caption length (short, normal, long)
- Supports multiple image formats: png, jpeg, jpg
Flux Kontext Dev Moved to Image Editing API
Flux Kontext Image to Image endpoint moved from the Image Generation API section to the Image Editing API section for better organization.
- New Location: Image Editing API → Flux Kontext Image to Image
- Endpoint: POST /api/v6/images/img2img
- Fixed OpenAPI playground display
New Image Editing Endpoint: Qwen Edit
Added Qwen Edit endpoint for AI-powered image editing using the Qwen model.
API Endpoints
- Standard API: POST /api/v6/image_editing/qwen_edit
- Enterprise API: POST /api/v1/enterprise/image_editing/qwen_edit
Key Features
- Prompt-based image editing and manipulation
- Support for single or multiple images (up to 4 images)
New Interior API Endpoints
Added two new endpoints to the Interior API for enhanced object manipulation capabilities:
Object Removal
- Endpoint: POST /api/v6/interior/object_removal
- Remove unwanted objects from interior images using AI
- Parameters: init_image, object_name, base64, webhook, track_id
- Simple text-based object identification
Interior Mixer
- Endpoint: POST /api/v6/interior/interior_mixer
- Add objects from one image into another room image
- Parameters: init_image, object_image, prompt, width, height, guidance_scale, num_inference_steps
- Intelligent object placement with prompt-based positioning
- Configurable inference steps (default: 8) and guidance scale
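The parameter lists for the two Interior endpoints translate into request bodies along these lines. Parameter names and the default of 8 inference steps are taken from the entries above; the "key" field, the reading of base64 as a boolean flag, and the choice to omit unset optional fields are assumptions made for this sketch.

```python
OBJECT_REMOVAL_PATH = "/api/v6/interior/object_removal"  # from the changelog
INTERIOR_MIXER_PATH = "/api/v6/interior/interior_mixer"  # from the changelog


def _without_none(fields):
    """Drop unset optional fields so the server can apply its own defaults."""
    return {name: value for name, value in fields.items() if value is not None}


def build_object_removal_payload(api_key, init_image, object_name,
                                 base64=False, webhook=None, track_id=None):
    """Body for Object Removal; treating base64 as a boolean is an assumption."""
    return _without_none({
        "key": api_key,              # assumed field name
        "init_image": init_image,
        "object_name": object_name,  # plain-text name of the object to remove
        "base64": base64,
        "webhook": webhook,
        "track_id": track_id,
    })


def build_interior_mixer_payload(api_key, init_image, object_image, prompt,
                                 width=None, height=None, guidance_scale=None,
                                 num_inference_steps=8):
    """Body for Interior Mixer; num_inference_steps defaults to 8 as documented."""
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    return _without_none({
        "key": api_key,              # assumed field name
        "init_image": init_image,
        "object_image": object_image,
        "prompt": prompt,
        "width": width,
        "height": height,
        "guidance_scale": guidance_scale,
        "num_inference_steps": num_inference_steps,
    })


removal = build_object_removal_payload(
    "YOUR_API_KEY", "https://example.com/room.jpg", "floor lamp")
mixer = build_interior_mixer_payload(
    "YOUR_API_KEY", "https://example.com/room.jpg",
    "https://example.com/armchair.png",
    prompt="place the armchair next to the window")
print(removal)
print(mixer)
```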
Documentation Updates
- Added complete API reference documentation for both endpoints
- Updated OpenAPI specification with new schemas
- Added visual indicators for new endpoints in the overview
Rate Limits Documentation
Added comprehensive rate limits documentation with plan-specific queue limits:
- Pay as you go plan: 5 queued API requests
- Standard plan: 10 queued API requests
- Unlimited Premium Plan: 15 queued API requests
Key Features
- Sequential Processing: Requests are processed one after another in queue order
- Queue Management: New requests are added to the queue and processed when previous ones complete
- Real-time Enforcement: Limits are enforced in real-time as requests come in
- FIFO Processing: Requests are processed in First-In-First-Out order
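The queueing behavior described above can be modeled with a small in-memory simulation. The per-plan limits and FIFO ordering come from this entry; the plan keys, the RequestQueue class, and the decision to reject a request when the queue is full are assumptions, since the changelog does not say how an over-limit request is handled.

```python
from collections import deque

# Queue limits per plan, as listed in the changelog (keys are illustrative).
QUEUE_LIMITS = {
    "pay_as_you_go": 5,
    "standard": 10,
    "unlimited_premium": 15,
}


class RequestQueue:
    """Toy model of a per-account FIFO queue with a plan-specific cap."""

    def __init__(self, plan):
        self.limit = QUEUE_LIMITS[plan]
        self._pending = deque()

    def submit(self, request_id):
        """Queue a request; return False if the plan's limit is reached.

        Rejecting on overflow is an assumption made for this sketch.
        """
        if len(self._pending) >= self.limit:
            return False
        self._pending.append(request_id)
        return True

    def process_next(self):
        """Remove and return the oldest queued request, or None if empty."""
        return self._pending.popleft() if self._pending else None


queue = RequestQueue("pay_as_you_go")
accepted = [queue.submit(f"req-{n}") for n in range(1, 7)]
print(accepted)              # five accepted, the sixth refused
print(queue.process_next())  # oldest request leaves the queue first
```

After one request has been processed, a further submit call succeeds again, which mirrors the "processed when previous ones complete" behavior.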
Enterprise API Updates
- Added Reset S3 endpoint to Enterprise API General section
- Updated S3 management capabilities for dedicated servers
New Model: Wan 2.5
Added Wan 2.5 to ModelsLab with enhanced video generation capabilities:
- Text to Video: Generate videos from text prompts with audio support
- Image to Video: Transform static images into dynamic videos with sound
- Audio Integration: Built-in audio support for complete multimedia experiences
- Enhanced Quality: Improved motion smoothness and visual realism

