OpenAI-compatible chat completions endpoint. Works with any OpenAI SDK or compatible client.
Authentication uses a Bearer token in the Authorization header.
Set "stream": true to receive Server-Sent Events (SSE) as tokens are generated; each event carries a chat.completion.chunk object.
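Each SSE `data:` line carries one JSON chunk, with the incremental text under `choices[0].delta`. A minimal sketch of parsing a single event line with only the standard library (field layout follows the standard OpenAI chunk shape):

```python
import json

def parse_sse_line(line: str):
    """Extract the incremental text from one SSE line of a streaming
    chat completion. Returns None for non-data lines, keep-alives,
    and the terminal "data: [DONE]" sentinel."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)  # a chat.completion.chunk object
    return chunk["choices"][0]["delta"].get("content")
```

Concatenating the returned fragments in order reconstructs the full assistant message.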
Any OpenAI SDK can target this endpoint: point its base_url at this API and set api_key to your ModelsLab key.
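In the official Python SDK these are the two constructor arguments (`OpenAI(base_url=..., api_key=...)`). The same wiring sketched against the raw HTTP surface with only the standard library; the base URL below is a placeholder, not the documented endpoint:

```python
import json
import urllib.request

# Placeholders: substitute the real endpoint base URL and your API key.
BASE_URL = "https://example.com/v1"   # assumption, not the documented URL
API_KEY = "YOUR_MODELSLAB_API_KEY"

def build_chat_request(messages, model, **options):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",  # Bearer auth header
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    [{"role": "user", "content": "Hello"}],
    model="Qwen/Qwen2.5-VL-72B-Instruct-together",
)
# urllib.request.urlopen(req) would send it; omitted here.
```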
Bearer token authentication using your ModelsLab API key (Authorization: Bearer <api_key>).
Request body parameters:

messages: Array of chat messages.
model: Model ID to use for the completion (e.g. 'Qwen/Qwen2.5-VL-72B-Instruct-together').
max_tokens: Maximum number of tokens to generate. Required range: x >= 1.
temperature: Sampling temperature. Higher values make output more random. Required range: 0 <= x <= 2.
top_p: Nucleus sampling parameter. Required range: 0 <= x <= 1.
stream: Whether to stream partial results as Server-Sent Events.
presence_penalty: Penalize new tokens based on whether they appear in the text so far. Required range: -2 <= x <= 2.
frequency_penalty: Penalize new tokens based on their frequency in the text so far. Required range: -2 <= x <= 2.
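Putting the parameters above together, a representative request body (model ID from the example above; all other values are illustrative, with each field's allowed range noted inline):

```python
import json

request_body = {
    "model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about the sea."},
    ],
    "max_tokens": 512,        # >= 1
    "temperature": 0.7,       # 0 <= x <= 2
    "top_p": 0.9,             # 0 <= x <= 1
    "stream": False,          # True to receive SSE chunks
    "presence_penalty": 0.0,  # -2 <= x <= 2
    "frequency_penalty": 0.0, # -2 <= x <= 2
}

payload = json.dumps(request_body)  # serialized JSON for the POST body
```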