curl --request POST \
--url https://modelslab.com/api/v7/llm/chat/completions \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '
{
"messages": [
{
"content": "<string>"
}
],
"model": "<string>",
"max_tokens": 1000,
"temperature": 1,
"top_p": 1,
"stream": false,
"presence_penalty": 0,
"frequency_penalty": 0
}
'
{
"id": "<string>",
"created": 123,
"model": "<string>",
"choices": [
{
"index": 123,
"message": {
"content": "<string>"
}
}
],
"usage": {
"prompt_tokens": 123,
"completion_tokens": 123,
"total_tokens": 123
}
}
OpenAI-compatible chat completions endpoint. Works with any OpenAI SDK or compatible client.
POST https://modelslab.com/api/v7/llm/chat/completions
Authenticate by passing your ModelsLab API key as a Bearer token in the Authorization header.
curl -X POST https://modelslab.com/api/v7/llm/chat/completions \
-H "Authorization: Bearer $MODELSLAB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"}
],
"max_tokens": 1000,
"temperature": 0.7
}'
Request body:
{
"model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of France?"}
],
"max_tokens": 1000,
"temperature": 0.7,
"top_p": 1,
"stream": false,
"presence_penalty": 0,
"frequency_penalty": 0
}
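The same request can be assembled without an SDK. The following is a minimal sketch using only the Python standard library; the helper name `build_chat_request` is hypothetical and not part of any ModelsLab client. It builds the request object but does not send it.

```python
import json
import urllib.request

API_URL = "https://modelslab.com/api/v7/llm/chat/completions"


def build_chat_request(api_key, model, messages, **params):
    """Return an unsent urllib Request for the chat completions endpoint.

    `build_chat_request` is an illustrative helper, not an official API.
    Extra keyword arguments (max_tokens, temperature, ...) are copied
    into the JSON body unchanged.
    """
    payload = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "YOUR_MODELSLAB_API_KEY",
    "Qwen/Qwen2.5-VL-72B-Instruct-together",
    [{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=1000,
    temperature=0.7,
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would perform the actual call, so that step is left out of the sketch.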
Response:
{
"id": "chat-abc123",
"object": "chat.completion",
"created": 1712345678,
"model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 8,
"total_tokens": 33
}
}
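Once the response above is decoded into a dictionary, the reply text and token usage can be read as follows. This is a sketch only: `extract_reply` is a hypothetical helper, and it assumes the first entry in `choices` is the one of interest.

```python
def extract_reply(response):
    """Return (text, finish_reason, total_tokens) from a chat.completion dict.

    Illustrative helper: it reads only the first choice and tolerates a
    missing `usage` or `finish_reason` field by returning None for it.
    """
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return (
        choice["message"]["content"],
        choice.get("finish_reason"),
        usage.get("total_tokens"),
    )


# The example response shown above, as a Python dict.
sample_response = {
    "id": "chat-abc123",
    "object": "chat.completion",
    "created": 1712345678,
    "model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The capital of France is Paris."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33},
}

text, finish_reason, total_tokens = extract_reply(sample_response)
print(text)
```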
Set "stream": true to receive Server-Sent Events (SSE) as tokens are generated:
curl -X POST https://modelslab.com/api/v7/llm/chat/completions \
-H "Authorization: Bearer $MODELSLAB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
"messages": [{"role": "user", "content": "Write a haiku"}],
"stream": true
}'
Each event is a chat.completion.chunk object, and the stream ends with data: [DONE]:
data: {"id":"chat-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Silent"},"finish_reason":null}]}
data: {"id":"chat-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" snow"},"finish_reason":null}]}
data: [DONE]
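For clients that read the stream without an SDK, the events above can be decoded line by line. This sketch assumes each event arrives as a single `data:` line, as in the sample; `iter_stream_text` is a hypothetical helper name.

```python
import json


def iter_stream_text(lines):
    """Yield content fragments from SSE lines until the [DONE] sentinel.

    Illustrative helper: it assumes one JSON payload per `data:` line and
    skips anything else (blank separators, comments, other SSE fields).
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# The sample events shown above, with the blank lines that separate SSE events.
sample_lines = [
    'data: {"id":"chat-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Silent"},"finish_reason":null}]}',
    "",
    'data: {"id":"chat-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" snow"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_stream_text(sample_lines)))  # prints "Silent snow"
```

Any iterable of decoded text lines works as input, so the same function can wrap a streaming HTTP response body.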
To use the official OpenAI SDKs, set base_url and api_key:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MODELSLAB_API_KEY",
    base_url="https://modelslab.com/api/v7/llm",
)

# Non-streaming
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct-together",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms"},
    ],
    max_tokens=1000,
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct-together",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_MODELSLAB_API_KEY',
  baseURL: 'https://modelslab.com/api/v7/llm',
});

const response = await client.chat.completions.create({
  model: 'Qwen/Qwen2.5-VL-72B-Instruct-together',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
});
console.log(response.choices[0].message.content);
Authorization (header): Bearer token authentication using your ModelsLab API key.
messages: Array of chat messages.
model: Model ID to use for the completion (e.g. 'Qwen/Qwen2.5-VL-72B-Instruct-together').
max_tokens: Maximum number of tokens to generate. Range: x >= 1.
temperature: Sampling temperature (0-2). Higher values make output more random. Range: 0 <= x <= 2.
top_p: Nucleus sampling parameter. Range: 0 <= x <= 1.
stream: Whether to stream partial results as Server-Sent Events.
presence_penalty: Penalize new tokens based on whether they appear in the text so far. Range: -2 <= x <= 2.
frequency_penalty: Penalize new tokens based on their frequency in the text so far. Range: -2 <= x <= 2.
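The numeric ranges listed above can be checked on the client before a request is sent. The following is a minimal sketch; `find_out_of_range` and the `RANGES` table are hypothetical names, and the server's own validation and error format may differ.

```python
# Documented ranges as (minimum, maximum); None means no documented upper bound.
RANGES = {
    "max_tokens": (1, None),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "presence_penalty": (-2, 2),
    "frequency_penalty": (-2, 2),
}


def find_out_of_range(params):
    """Return one message per parameter that falls outside its documented range.

    Illustrative helper: parameters that are absent from `params` or have
    no documented range (model, messages, stream) are not checked.
    """
    errors = []
    for name, (low, high) in RANGES.items():
        if name not in params:
            continue
        value = params[name]
        if value < low or (high is not None and value > high):
            upper = "" if high is None else f" and <= {high}"
            errors.append(f"{name}={value} must be >= {low}{upper}")
    return errors


print(find_out_of_range({"max_tokens": 1000, "temperature": 0.7}))  # prints []
print(find_out_of_range({"temperature": 2.5}))
```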