Messages

Request

POST https://modelslab.com/api/v7/llm/v1/messages
Pass your API key in the x-api-key header or as a Bearer token.
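
Either authentication style can be built as a plain headers dict; a minimal Python sketch (the helper name is ours, not part of the API):

```python
# Build request headers for the Messages endpoint.
# Per the docs, the key may go in x-api-key or in Authorization: Bearer.
def build_headers(api_key: str, use_bearer: bool = False) -> dict:
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        headers["x-api-key"] = api_key
    return headers

print(build_headers("YOUR_MODELSLAB_API_KEY"))
```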
curl -X POST https://modelslab.com/api/v7/llm/v1/messages \
  -H "x-api-key: $MODELSLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Body

{
  "model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "system": "You are a helpful assistant.",
  "temperature": 0.7,
  "top_p": 1,
  "stream": false
}

Response

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 8
  }
}
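
The response body needs nothing beyond the standard library to consume; a sketch that extracts the text and token counts (the JSON literal mirrors the example response above):

```python
import json

# Example response body, copied from the Response section above.
raw = """{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "The capital of France is Paris."}],
  "model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 15, "output_tokens": 8}
}"""

msg = json.loads(raw)
# content is a list of typed blocks; concatenate the text blocks.
text = "".join(b["text"] for b in msg["content"] if b["type"] == "text")
total_tokens = msg["usage"]["input_tokens"] + msg["usage"]["output_tokens"]
print(text)          # The capital of France is Paris.
print(total_tokens)  # 23
```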

Streaming

Set "stream": true to receive Server-Sent Events:
curl -X POST https://modelslab.com/api/v7/llm/v1/messages \
  -H "x-api-key: $MODELSLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Write a haiku"}],
    "stream": true
  }'
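
Assuming the stream follows Anthropic's SSE event shapes (content arriving as content_block_delta events carrying a text_delta payload), which the SDK compatibility below suggests but this page does not spell out, the streamed text can be reassembled like so; the sample lines are hypothetical:

```python
import json

def collect_text(sse_lines):
    """Reassemble streamed text from SSE 'data:' lines.

    Assumes Anthropic-style events where text arrives in
    content_block_delta events with a text_delta payload.
    """
    parts = []
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip event:/blank lines
        payload = line[len("data:"):].strip()
        event = json.loads(payload)
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
    return "".join(parts)

# Hypothetical sample stream for illustration only:
sample = [
    'event: content_block_delta',
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Silent"}}',
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": " pond"}}',
    'data: {"type": "message_stop"}',
]
print(collect_text(sample))  # Silent pond
```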

Anthropic SDK

This endpoint is fully compatible with the Anthropic SDK. Just change the base_url and api_key:
from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_MODELSLAB_API_KEY",
    base_url="https://modelslab.com/api/v7/llm",
)

# Non-streaming
message = client.messages.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct-together",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
)
print(message.content[0].text)

# Streaming
with client.messages.stream(
    model="Qwen/Qwen2.5-VL-72B-Instruct-together",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a story"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")

Using with Claude Code

You can use ModelsLab’s LLM API as a backend for Claude Code, Anthropic’s CLI coding assistant:
ANTHROPIC_BASE_URL="https://modelslab.com/api/v7/llm" \
ANTHROPIC_AUTH_TOKEN="YOUR_MODELSLAB_API_KEY" \
claude --model "Qwen/Qwen2.5-VL-72B-Instruct-together"
This lets you use any of ModelsLab’s 200+ LLM models as the backend for Claude Code’s agentic coding capabilities.

Authorizations

- x-api-key (string, header, required): API key authentication via the x-api-key header.

Headers

- anthropic-version (string, default: 2023-06-01): Anthropic API version.

Body (application/json)

- model (string, required): Model ID to use.
- max_tokens (integer, required): Maximum number of tokens to generate. Required range: x >= 1.
- messages (object[], required): Array of input messages.
- system (string, optional): System prompt.
- temperature (number, default: 1): Sampling temperature. Required range: 0 <= x <= 1.
- top_p (number, optional): Nucleus sampling parameter. Required range: 0 <= x <= 1.
- stream (boolean, default: false): Whether to stream the response.
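
The ranges above can be checked client-side before a request is sent; a small sketch (the helper name is ours, not part of the API):

```python
def validate_body(body: dict) -> list:
    """Return a list of problems per the parameter constraints above."""
    errors = []
    if not body.get("model"):
        errors.append("model is required")
    if not isinstance(body.get("max_tokens"), int) or body["max_tokens"] < 1:
        errors.append("max_tokens must be an integer >= 1")
    if not body.get("messages"):
        errors.append("messages is required")
    temperature = body.get("temperature", 1)
    if not 0 <= temperature <= 1:
        errors.append("temperature must be in [0, 1]")
    top_p = body.get("top_p")
    if top_p is not None and not 0 <= top_p <= 1:
        errors.append("top_p must be in [0, 1]")
    return errors

print(validate_body({
    "model": "Qwen/Qwen2.5-VL-72B-Instruct-together",
    "max_tokens": 0,
    "messages": [{"role": "user", "content": "hi"}],
}))  # ['max_tokens must be an integer >= 1']
```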

Response

Message response

- id (string): Unique message ID.
- type (enum<string>): Available options: message.
- role (enum<string>): Available options: assistant.
- content (object[]): Array of content blocks (see the example response above).
- model (string): Model that produced the response.
- stop_reason (enum<string>): Available options: end_turn, max_tokens, stop_sequence.
- usage (object): Token usage counts (input_tokens, output_tokens).