POST /video/text2video
Generate video from text prompt

Example request:
curl --request POST \
  --url https://modelslab.com/api/v6/video/text2video \
  --header 'Content-Type: application/json' \
  --data '
{
  "key": "<string>",
  "model_id": "cogvideox",
  "prompt": "<string>",
  "negative_prompt": "<string>",
  "seed": 123,
  "height": 512,
  "width": 512,
  "num_frames": 25,
  "num_inference_steps": 20,
  "guidance_scale": 7,
  "clip_skip": 1,
  "upscale_height": 640,
  "upscale_width": 1024,
  "upscale_strength": 0.6,
  "upscale_guidance_scale": 12,
  "upscale_num_inference_steps": 20,
  "use_improved_sampling": false,
  "improved_sampling_seed": 123,
  "fps": 15,
  "output_type": "gif",
  "instant_response": false,
  "temp": false,
  "webhook": "<string>",
  "track_id": "<string>"
}
'

Example response:

{
  "status": "success",
  "generationTime": 123,
  "id": 123,
  "output": [
    "<string>"
  ],
  "proxy_links": [
    "<string>"
  ],
  "future_links": [
    "<string>"
  ],
  "meta": {},
  "eta": 123,
  "message": "<string>",
  "tip": "<string>",
  "fetch_result": "<string>"
}
Text to Video Example

Generate videos from text descriptions using state-of-the-art video generation models like CogVideoX. Perfect for creating short-form content, animations, and visual storytelling.

Request

Make a POST request to the endpoint below with the required parameters.
POST https://modelslab.com/api/v6/video/text2video

Body

json
{
    "key": "your_api_key",
    "model_id": "cogvideox",
    "prompt": "A majestic space station orbiting Earth, with the sun rising behind it, cinematic, 4K",
    "negative_prompt": "low quality, blurry, static",
    "height": 512,
    "width": 512,
    "num_frames": 25,
    "num_inference_steps": 20,
    "guidance_scale": 7,
    "output_type": "mp4",
    "webhook": null,
    "track_id": null
}

Async Pattern

Since video generation takes time, use this pattern:
import requests
import time

def generate_video(prompt, api_key):
    # 1. Submit the request
    response = requests.post(
        "https://modelslab.com/api/v6/video/text2video",
        json={
            "key": api_key,
            "model_id": "cogvideox",
            "prompt": prompt,
            "num_frames": 25
        }
    )
    data = response.json()

    if data["status"] == "error":
        raise Exception(data["message"])

    request_id = data["id"]

    # 2. Poll for results
    while True:
        fetch = requests.post(
            f"https://modelslab.com/api/v6/video/fetch/{request_id}",
            json={"key": api_key}
        )
        result = fetch.json()

        if result["status"] == "success":
            return result["output"][0]
        elif result["status"] == "failed":
            raise Exception(result.get("message", "Generation failed"))

        # Still processing, wait and retry
        time.sleep(5)

# Usage
video_url = generate_video("A sunset over the ocean", "your_api_key")
print(f"Video ready: {video_url}")

Tips for Better Videos

Unlike images, videos need motion descriptions:
  • ❌ “A cat”
  • ✅ “A cat walking across a sunny room, tail swaying”

Video models work best with clear, focused prompts; avoid overly complex scenes.

Generate at 512x512, then use the upscale parameters for higher-resolution output (see the request sketch after this list).

Choose the output format based on your use case:
  • MP4: best for most uses, smaller file size
  • GIF: good for short loops, works everywhere
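
As an illustration of that generate-then-upscale workflow, here is a hypothetical request body that renders at 512x512 and asks for an upscale toward 1024x640; the prompt and values are examples only, kept within the documented parameter ranges.

json
{
    "key": "your_api_key",
    "model_id": "cogvideox",
    "prompt": "A paper boat drifting down a rain-soaked street, slow camera pan",
    "height": 512,
    "width": 512,
    "num_frames": 25,
    "num_inference_steps": 20,
    "upscale_width": 1024,
    "upscale_height": 640,
    "upscale_strength": 0.6,
    "upscale_num_inference_steps": 20,
    "output_type": "mp4"
}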

Body

application/json
key
string
required

Your API Key used for request authorization

model_id
enum<string>
required

The ID of the model to use

Available options:
cogvideox,
wanx
prompt
string
required

Text prompt describing the video content

negative_prompt
string

Items you don't want in the video

seed
integer | null

Seed for reproducible results. Same seed gives same result. Pass null for random

height
integer
default:512

Height of the video in pixels

Required range: x <= 512
width
integer
default:512

Width of the video in pixels

Required range: x <= 512
num_frames
integer
default:25

Number of frames in the video

Required range: x <= 25
num_inference_steps
integer
default:20

Number of denoising steps

Required range: x <= 50
guidance_scale
number
default:7

Scale for classifier-free guidance

Required range: 0 <= x <= 8
clip_skip
integer | null

Number of CLIP layers to skip. Skipping 2 layers often gives more aesthetic results

Required range: x <= 2
upscale_height
integer
default:640

The upscaled height for videos generated

Required range: x <= 1024
upscale_width
integer
default:1024

The upscaled width for videos generated

Required range: x <= 1024
upscale_strength
number
default:0.6

Strength of upscaling. Higher values result in more noticeable differences

Required range: 0 <= x <= 1
upscale_guidance_scale
number
default:12

Guidance scale for upscaling videos

Required range: 0 <= x <= 8
upscale_num_inference_steps
integer
default:20

Number of denoising steps for upscaling

Required range: x <= 50
use_improved_sampling
boolean
default:false

Whether to use improved sampling technique for better temporal consistency

improved_sampling_seed
integer

Seed for consistent video generation with improved sampling

fps
integer

Frames per second rate of the generated video

Required range: x <= 16
output_type
enum<string>
default:gif

Output format type

Available options:
mp4,
gif
instant_response
boolean
default:false

If true, returns future links for queued requests instantly instead of waiting

temp
boolean
default:false

If true, stores video in temporary storage (cleaned every 24 hours)

webhook
string<uri>

URL to receive a POST API call once video generation is complete

track_id
string

Unique ID used in webhook response to identify the request

Response

Video generation response

status
enum<string>

Status of the video generation

Available options:
success,
processing,
error
generationTime
number

Time taken to generate the video in seconds

id
integer

Unique identifier for the video generation

output
string<uri>[]

Array of generated video URLs

proxy_links
string<uri>[]

Array of proxy video URLs

future_links
string<uri>[]

Array of future video URLs for queued requests

meta
object

Metadata about the video generation including all parameters used

eta
integer

Estimated time for completion in seconds (processing status)

message
string

Status message or additional information

tip
string

Additional information or tips for the user

fetch_result
string<uri>

URL to fetch the result when processing