Messages

Your GGUF Cloud deployment also speaks the Anthropic protocol natively via the /v1/messages endpoint. Point the Anthropic SDK or Claude Code at your deployment’s base URL — the root URL, with no /v1 suffix, since the Anthropic SDK appends /v1/messages itself.

Request

POST https://modelslab.com/api/gguf/{deployment_id}/v1/messages

Pass your ModelsLab API key in the x-api-key header (see Authentication).

curl -X POST "https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID/v1/messages" \
  -H "x-api-key: $MODELSLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "local",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
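
For clients that do not use an SDK, the same request can be sketched in Python with only the standard library. The helper name build_messages_request is hypothetical, and the deployment id and key are placeholders. The URL shape and headers are copied from the curl example above.

```python
import json
import urllib.request

BASE = "https://modelslab.com/api/gguf"  # gateway prefix from the curl example


def build_messages_request(deployment_id, api_key, body):
    """Build (but do not send) a POST to the deployment's /v1/messages endpoint.

    Hypothetical helper for illustration; not part of any ModelsLab SDK.
    """
    return urllib.request.Request(
        url=f"{BASE}/{deployment_id}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )


req = build_messages_request(
    "YOUR_DEPLOYMENT_ID",
    "YOUR_MODELSLAB_API_KEY",
    {
        "model": "local",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
    },
)
print(req.get_method(), req.full_url)

# To actually send it (requires a live deployment and a valid key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["content"][0]["text"])
```

Building the request separately from sending it makes it easy to inspect the URL and headers before any network call is made.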

Body

{
  "model": "local",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "system": "You are a helpful assistant.",
  "temperature": 0.7,
  "stream": false
}

Body Attributes

model

string

default:"local"

The model to use. A deployment serves a single model, so this can be "local" or the model id you deployed — it is always routed to your deployment’s model.

max_tokens

integer

required

Maximum number of tokens to generate. Required in the Anthropic format.

messages

array

required

Input messages. Roles are user and assistant only — the system prompt goes in the system parameter, not in messages.

system

string

System prompt, passed separately from the message list.

temperature

number

Sampling temperature. In the Anthropic format the range is 0.0–1.0.

top_p

number

Nucleus sampling threshold. Range: 0.0–1.0.

stop_sequences

array

Custom sequences where generation stops.

stream

boolean

default:"false"

When true, responses are streamed as Server-Sent Events (text/event-stream).
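
If your application already keeps a chat history that includes a system entry, the attributes above imply a small conversion step before sending. The following is a minimal sketch, not a gateway feature: build_payload is a hypothetical helper, and the client-side temperature check simply mirrors the 0.0 to 1.0 range stated above.

```python
def build_payload(chat, max_tokens, model="local", temperature=None, stream=False):
    """Turn a chat-style message list into a /v1/messages request body.

    Hypothetical helper: entries with role "system" are pulled out of the list
    and joined into the top-level "system" string, because "messages" may only
    contain "user" and "assistant" roles.
    """
    allowed = ("system", "user", "assistant")
    unknown = [m["role"] for m in chat if m["role"] not in allowed]
    if unknown:
        raise ValueError(f"unsupported roles: {unknown}")

    system_parts = [m["content"] for m in chat if m["role"] == "system"]
    messages = [m for m in chat if m["role"] != "system"]
    if not messages:
        raise ValueError("messages is required and must not be empty")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be within 0.0-1.0")

    payload = {
        "model": model,
        "max_tokens": max_tokens,  # required in the Anthropic format
        "messages": messages,
        "stream": stream,
    }
    # Optional attributes are only included when they were actually set.
    if system_parts:
        payload["system"] = "\n\n".join(system_parts)
    if temperature is not None:
        payload["temperature"] = temperature
    return payload


payload = build_payload(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=1024,
    temperature=0.7,
)
print(payload["system"])
print([m["role"] for m in payload["messages"]])
```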

Response

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "model": "local",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 8
  }
}

Response Fields

id

string

Unique identifier for the message.

type

string

The object type, message.

content

array

The generated content blocks. Text responses contain {"type": "text", "text": "..."}.

stop_reason

string

Why generation stopped, e.g. end_turn or max_tokens.

usage

object

Token accounting: input_tokens and output_tokens.
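
As an illustration of reading these fields, the sketch below parses the sample response shown above with the standard library. The helper name extract_text is hypothetical; it joins every text block rather than assuming there is exactly one.

```python
import json

# The sample response body from the Response section above.
raw = """
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "The capital of France is Paris."}
  ],
  "model": "local",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 15, "output_tokens": 8}
}
"""


def extract_text(message):
    """Concatenate the text of all text blocks, skipping any other block types."""
    return "".join(
        block["text"] for block in message["content"] if block.get("type") == "text"
    )


message = json.loads(raw)
text = extract_text(message)

# stop_reason "max_tokens" means the output was cut off by the max_tokens limit.
truncated = message["stop_reason"] == "max_tokens"
total_tokens = message["usage"]["input_tokens"] + message["usage"]["output_tokens"]

print(text)          # The capital of France is Paris.
print(truncated)     # False
print(total_tokens)  # 23
```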

Streaming

Set "stream": true to receive Server-Sent Events:

curl -X POST "https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID/v1/messages" \
  -H "x-api-key: $MODELSLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "local",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Write a haiku"}],
    "stream": true
  }'
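
If you consume the stream without an SDK, each event arrives as an event: line followed by a data: line carrying JSON. The sketch below assumes the deployment emits the event types used by Anthropic's streaming format (content_block_delta with a text_delta, and message_stop). The sample lines are hand-written for illustration, not captured output, and iter_text_deltas is a hypothetical helper.

```python
import json


def iter_text_deltas(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        event = json.loads(line[len("data:"):])
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta["text"]
        elif event.get("type") == "message_stop":
            break  # end of the message; ignore anything after it


# Hand-written sample in the assumed event format.
sample = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, '
    '"delta": {"type": "text_delta", "text": "Autumn "}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, '
    '"delta": {"type": "text_delta", "text": "moonlight"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]

haiku_start = "".join(iter_text_deltas(sample))
print(haiku_start)  # Autumn moonlight
```

With a real HTTP response the lines arrive as bytes, so decode each one as UTF-8 before passing it to the parser.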

Anthropic SDK

This endpoint is a drop-in replacement for the Anthropic API. Use the deployment root as the base_url (the SDK adds /v1/messages):

from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_MODELSLAB_API_KEY",
    base_url="https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID",
)

# Non-streaming
message = client.messages.create(
    model="local",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
print(message.content[0].text)

# Streaming
with client.messages.stream(
    model="local",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a story"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'YOUR_MODELSLAB_API_KEY',
  baseURL: 'https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID',
});

const message = await client.messages.create({
  model: 'local',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(message.content[0].text);

curl -X POST "https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID/v1/messages" \
  -H "x-api-key: $MODELSLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "local",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Using with Claude Code

Because your deployment speaks the Anthropic protocol, you can use it as a backend for Claude Code. Point Claude Code at your deployment’s root base URL and authenticate with your ModelsLab API key:

ANTHROPIC_BASE_URL="https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID" \
ANTHROPIC_AUTH_TOKEN="YOUR_MODELSLAB_API_KEY" \
claude --model "local"

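Claude Code sends the value of ANTHROPIC_AUTH_TOKEN in the Authorization header as a Bearer token. To send the key as x-api-key instead, which the GGUF Cloud gateway accepts, set ANTHROPIC_API_KEY rather than ANTHROPIC_AUTH_TOKEN. Your deployment serves a single model, so the --model value is routed to that model regardless of the name you pass.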
