Song Generation Guide
This guide contains professional music creation knowledge to help you create high-quality songs using the ModelsLab Song Generator API, powered by the ACE-Step v1.5 model.
Overview
The Song Generator API lets you create complete songs with vocals in 50+ languages using the ACE-Step v1.5 model. You can either provide your own lyrics or let the AI generate them automatically from your prompt.
Key Features
- Duration Control: Generate songs from 30 seconds to 8 minutes (30-480 seconds)
- 50+ Languages: Support for languages from Arabic to Chinese
- Lyrics Generation: Automatic lyrics generation or use your own
- Instrumental Mode: Generate instrumental versions without vocals
- Style Control: Use caption to define music style, instruments, and atmosphere
Understanding the Parameters
Caption: Your Music Blueprint
Caption is the most important parameter affecting your generated song. It describes the overall music elements you want.
What to Include in Caption
| Dimension | Examples |
|---|---|
| Style/Genre | pop, rock, jazz, electronic, hip-hop, R&B, folk, reggaeton, synthwave |
| Emotion/Atmosphere | melancholic, uplifting, energetic, dreamy, dark, nostalgic, euphoric, intimate |
| Instruments | acoustic guitar, piano, synth pads, 808 drums, strings, brass, electric bass |
| Timbre Texture | warm, bright, crisp, airy, punchy, lush, raw, polished |
| Era Reference | 80s synth-pop, 90s grunge, 2010s EDM, vintage soul, modern trap |
| Vocal Characteristics | female vocal, male vocal, breathy, powerful, falsetto, raspy |
| Production Style | lo-fi, high-fidelity, live recording, studio-polished |
Caption Writing Principles
- Be Specific: “sad piano ballad with female breathy vocal” works better than “a sad song”
- Combine Dimensions: Mix style + emotion + instruments + timbre for precise control
- Use References: “80s synthwave style” or “reggaeton with flamenco influence” conveys complex aesthetics quickly
- Texture Words Matter: Adjectives like warm, crisp, airy, punchy influence mixing and timbre
- Balance Detail vs Freedom: More details = more control, fewer details = more AI creativity
- Avoid Conflicts: Don’t combine incompatible styles like “classical strings” and “hardcore metal” unless you want the song to evolve from one style into the other
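The "Combine Dimensions" principle can be sketched as a small helper that joins the dimensions from the table above into one caption string. The function name `build_caption` and its argument names are hypothetical; the API itself only receives the resulting text in `caption`.

```python
def build_caption(style, emotion=None, instruments=(), timbre=(),
                  vocal=None, production=None):
    """Join caption dimensions into one comma-separated description."""
    parts = [style]
    if emotion:
        parts.append(emotion)
    parts.extend(instruments)
    parts.extend(timbre)
    if vocal:
        parts.append(vocal)
    if production:
        parts.append(production)
    # Drop empty strings so optional dimensions can be skipped safely.
    return ", ".join(p.strip() for p in parts if p and p.strip())


caption = build_caption(
    style="80s synth-pop",
    emotion="nostalgic",
    instruments=["synth pads", "electric bass"],
    timbre=["warm", "punchy"],
    vocal="breathy female vocal",
    production="studio-polished",
)
print(caption)
```

Leaving out optional arguments yields a shorter caption, which hands more of the creative decisions to the model.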
Lyrics: Your Song’s Timeline
Lyrics control how your song unfolds over time. They include:
- Lyric text content
- Structure tags ([Verse], [Chorus], etc.)
- Vocal style hints
- Instrumental sections
- Energy changes
Common Structure Tags
| Category | Tag | Description |
|---|---|---|
| Basic Structure | [Intro] | Opening, establish atmosphere |
| Basic Structure | [Verse] / [Verse 1] | Verse, narrative progression |
| Basic Structure | [Pre-Chorus] | Build energy before chorus |
| Basic Structure | [Chorus] | Emotional climax, hook |
| Basic Structure | [Bridge] | Transition or elevation |
| Basic Structure | [Outro] | Ending, conclusion |
| Dynamic Sections | [Build] | Energy gradually rising |
| Dynamic Sections | [Drop] | Electronic music energy release |
| Dynamic Sections | [Breakdown] | Reduced instrumentation |
| Instrumental | [Instrumental] | Pure instrumental, no vocals |
| Instrumental | [Guitar Solo] | Guitar solo section |
| Instrumental | [Piano Interlude] | Piano interlude |
| Special Tags | [Fade Out] | Fade out ending |
Vocal Control Tags
| Tag | Effect |
|---|---|
| [raspy vocal] | Raspy, textured vocals |
| [whispered] | Whispered vocals |
| [falsetto] | Falsetto vocals |
| [powerful belting] | Powerful, high-pitched singing |
| [harmonies] | Layered harmonies |
Energy Tags
| Tag | Effect |
|---|---|
| [high energy] | High energy, passionate |
| [building energy] | Increasing energy |
| [explosive] | Explosive energy |
| [melancholic] | Melancholic mood |
| [euphoric] | Euphoric feeling |
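To show how the tags fit together, here is a minimal sketch that renders sections as tagged blocks separated by blank lines. The helper `format_lyrics` is hypothetical and the lyric lines are placeholder text. Combining a vocal hint with a section tag, as in `[Bridge - whispered]`, follows the form used in this guide's reggaeton example; other tag placements may also work.

```python
def format_lyrics(sections):
    """Render (tag, lines) pairs as tagged sections separated by blank lines."""
    blocks = []
    for tag, lines in sections:
        header = f"[{tag}]"
        # A section with no lines (e.g. an intro) is just its tag.
        blocks.append("\n".join([header, *lines]))
    return "\n\n".join(blocks)


sections = [
    ("Intro", []),
    ("Verse 1", [
        "Streetlights hum along the empty road",
        "I count the miles and let the silence grow",
    ]),
    ("Chorus", [
        "Hold on, the night is almost through",
        "Hold on, the morning waits for you",
    ]),
    ("Bridge - whispered", [
        "Stay a little longer in the dark",
    ]),
    ("Outro", []),
]
lyrics = format_lyrics(sections)
print(lyrics)
```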
Lyrics Writing Tips
1. Control Syllable Count: Keep 6-10 syllables per line for best results. Consistent syllable counts create better rhythm.
2. Use Case for Intensity
Keep Caption and Lyrics Consistent
⚠️ Critical: Descriptions in Caption and Lyrics must align. If the Caption says “soft piano ballad” but the Lyrics contain [explosive metal solo], results will be poor.
Checklist:
- Instruments in Caption ↔ Instrumental tags in Lyrics
- Emotion in Caption ↔ Energy tags in Lyrics
- Vocal description in Caption ↔ Vocal control tags in Lyrics
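The first checklist item can be roughly automated before sending a request. The sketch below is a heuristic, not part of the API. It pulls every bracketed tag out of the lyrics and reports instruments named in instrumental tags but absent from the caption. The `INSTRUMENT_TAGS` mapping is a hypothetical starting point that covers only the two instrument-specific tags listed in this guide.

```python
import re

# Hypothetical mapping: instrumental tag -> word expected in the caption.
INSTRUMENT_TAGS = {"guitar solo": "guitar", "piano interlude": "piano"}


def extract_tags(lyrics):
    """Return the text of every [bracketed] tag in the lyrics, in order."""
    return re.findall(r"\[([^\[\]]+)\]", lyrics)


def missing_instruments(caption, lyrics):
    """List instruments used in lyric tags that the caption never mentions."""
    caption_lower = caption.lower()
    missing = []
    for tag in extract_tags(lyrics):
        word = INSTRUMENT_TAGS.get(tag.strip().lower())
        if word and word not in caption_lower and word not in missing:
            missing.append(word)
    return missing


lyrics = (
    "[Verse]\nRain on the window, slow and low\n\n"
    "[Guitar Solo]\n\n[Piano Interlude]"
)
print(missing_instruments("soft piano ballad, breathy female vocal", lyrics))
```

An empty result does not prove the caption and lyrics agree. It only rules out this one kind of mismatch.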
Duration Calculation
You MUST calculate an appropriate duration based on your lyrics and structure.
Estimation Method
- Per line of lyrics: 3-5 seconds
- Intro/Outro: 5-10 seconds each
- Instrumental sections: 5-15 seconds
- Typical structures:
  - 2 verses + 2 choruses: 120-150 seconds minimum
  - 2 verses + 2 choruses + bridge: 180-240 seconds
  - Full song with intro/outro: 210-270 seconds (3.5-4.5 minutes)
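The per-line and per-section figures above translate directly into arithmetic. This sketch uses the hypothetical helpers `estimate_duration` and `pick_duration`. It returns a low and high estimate, then picks the high end, following the advice to err on the long side, and clamps it to the 30-480 second range the API accepts.

```python
def estimate_duration(lyric_lines, intro=True, outro=True, instrumental_sections=0):
    """Return a (low, high) duration estimate in seconds."""
    bookends = int(intro) + int(outro)
    # 3-5 s per lyric line, 5-10 s per intro/outro, 5-15 s per instrumental section.
    low = lyric_lines * 3 + bookends * 5 + instrumental_sections * 5
    high = lyric_lines * 5 + bookends * 10 + instrumental_sections * 15
    return low, high


def pick_duration(lyric_lines, **kwargs):
    """Choose the high estimate, clamped to the supported 30-480 second range."""
    _, high = estimate_duration(lyric_lines, **kwargs)
    return max(30, min(480, high))


print(estimate_duration(10))  # (40, 70)
print(pick_duration(10))      # 70
```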
Common Pitfall
❌ DON’T: 10 lines of lyrics with 60 seconds duration → rushed and compressed
✅ DO: 10 lines → ~40 seconds vocals + 20 seconds intro/outro = 60+ seconds
Rule: When in doubt, estimate longer rather than shorter.
Using Lyrics Generation
When you don’t have lyrics, set lyrics_generation: true and provide:
- prompt: Describe the topic/theme for lyrics
- caption: Describe the music style (same as with manual lyrics)
Example Request with Lyrics Generation
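A minimal sketch of such a request body, built in Python and serialized to JSON. The fields `lyrics_generation`, `prompt`, and `caption` come from this guide; the field names `duration` and `language` are assumptions, and the endpoint URL and authentication fields are deliberately left out, so check the API reference for all of them.

```python
import json

payload = {
    "lyrics_generation": True,
    # Topic or theme the generated lyrics should cover.
    "prompt": "a road trip along the coast with old friends at the end of summer",
    # Music style, written the same way as with manual lyrics.
    "caption": "uplifting indie pop, acoustic guitar, warm synth pads, "
               "bright male vocal, studio-polished",
    "duration": 180,   # assumed field name; seconds, within 30-480
    "language": "en",  # assumed field name; code from the language table
}

body = json.dumps(payload, indent=2)
print(body)
```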
Instrumental Mode
To generate music without vocals, set instrumental: true.
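As a rough illustration, the helper below builds an instrumental request body and rejects durations outside the supported 30-480 seconds. The function `instrumental_payload` is hypothetical and the field name `duration` is an assumption; only `instrumental` and `caption` are named in this guide.

```python
import json

MIN_SECONDS, MAX_SECONDS = 30, 480  # range stated under Key Features


def instrumental_payload(caption, duration):
    """Build a request body for a track without vocals."""
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(
            f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds, got {duration}"
        )
    return {
        "instrumental": True,
        "caption": caption,
        "duration": duration,  # assumed field name
    }


payload = instrumental_payload("lo-fi hip-hop, warm piano, mellow, dreamy", 90)
print(json.dumps(payload, indent=2))
```

With no lyrics to carry the structure, the caption does most of the work here, so the specificity advice above matters even more.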
Language Support
The API supports 50+ languages. Specify the language code:
| Language | Code | Language | Code |
|---|---|---|---|
| English | en | Spanish | es |
| Chinese | zh | French | fr |
| Japanese | ja | German | de |
| Korean | ko | Italian | it |
| Portuguese | pt | Russian | ru |
| Hindi | hi | Arabic | ar |
| Cantonese | yue | Turkish | tr |
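For client code that accepts language names, a small lookup over the table above avoids typos in the code. This is a sketch: `LANGUAGE_CODES` holds only the 14 entries listed here, not the full set of supported languages, and `language_code` is a hypothetical helper.

```python
LANGUAGE_CODES = {
    "english": "en", "spanish": "es", "chinese": "zh", "french": "fr",
    "japanese": "ja", "german": "de", "korean": "ko", "italian": "it",
    "portuguese": "pt", "russian": "ru", "hindi": "hi", "arabic": "ar",
    "cantonese": "yue", "turkish": "tr",
}


def language_code(name):
    """Map a language name to its code, ignoring case and surrounding spaces."""
    key = name.strip().lower()
    if key not in LANGUAGE_CODES:
        raise ValueError(
            f"{name!r} is not in the local table; check the API reference for its code"
        )
    return LANGUAGE_CODES[key]


print(language_code("Spanish"))  # es
```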
Complete Example
Reggaeton Track with Manual Lyrics
Analysis
Caption matches Lyrics:
- ✅ Caption says “reggaeton with flamenco” → Lyrics in Spanish with reggaeton structure
- ✅ Caption says “confident female vocal” → Lyrics tone matches
- ✅ Caption mentions “whispered outro” → Lyrics has [Bridge - whispered]
- ✅ Duration 199 seconds appropriate for lyrics amount
Best Practices Summary
- Caption First: Spend time crafting a detailed, specific caption
- Consistent Description: Ensure caption and lyrics tell the same story
- Calculate Duration: Count lyric lines and sections, then estimate the time
- Use Structure Tags: Clear sections improve song structure
- Test Iterations: Start simple, then refine based on results
- Language Matters: Set the correct language code for best pronunciation
Common Mistakes to Avoid
| Mistake | Fix |
|---|---|
| Too short duration for lyrics | Calculate: lines × 4 seconds + intro/outro |
| Conflicting caption and lyrics | Align instruments, energy, vocal style |
| Vague caption | Add specific genres, instruments, emotions |
| Too many structure tags | Keep tags simple, details in caption |
| No section separation | Add blank lines between sections |
| Mixed incompatible styles | Either separate or describe as evolution |
Getting Started
- Start with Song Generator API Reference
- Try simple examples first
- Iterate on caption and lyrics
- Use webhooks for async processing
- Join our Discord for community support
Need Help? Check our API Reference or reach out via Support.

