Your GGUF Cloud deployment also speaks the Anthropic protocol natively via the /v1/messages endpoint. Point the Anthropic SDK or Claude Code at your deployment’s base URL — the root URL, with no /v1 suffix, since the Anthropic SDK appends /v1/messages itself.

Request

POST https://modelslab.com/api/gguf/{deployment_id}/v1/messages
Pass your ModelsLab API key in the x-api-key header (see Authentication).
curl -X POST "https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID/v1/messages" \
  -H "x-api-key: $MODELSLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "local",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
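The same request can be assembled with Python's standard library. This is a minimal sketch, not an official client: the helper name build_messages_request is hypothetical, and the function only builds the request so you can inspect it before sending. The URL, headers, and body mirror the curl example above.

```python
import json
import os
import urllib.request

# Gateway prefix taken from the endpoint shown above.
BASE = "https://modelslab.com/api/gguf"


def build_messages_request(
    deployment_id: str, api_key: str, prompt: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example.

    Hypothetical helper for illustration only.
    """
    body = {
        "model": "local",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE}/{deployment_id}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_messages_request(
        "YOUR_DEPLOYMENT_ID",
        os.environ.get("MODELSLAB_API_KEY", ""),
        "What is the capital of France?",
    )
    # Sending needs a real deployment id and API key:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["content"][0]["text"])
    print(req.get_method(), req.full_url)
```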

Body

{
  "model": "local",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "system": "You are a helpful assistant.",
  "temperature": 0.7,
  "stream": false
}

Body Attributes

model
string
default:"local"
The model to use. A deployment serves a single model, so this can be "local" or the model id you deployed — it is always routed to your deployment’s model.
max_tokens
integer
required
Maximum number of tokens to generate. Required in the Anthropic format.
messages
array
required
Input messages. Roles are user and assistant only — the system prompt goes in the system parameter, not in messages.
system
string
System prompt, passed separately from the message list.
temperature
number
Sampling temperature. In the Anthropic format the range is 0.0 to 1.0.
top_p
number
Nucleus sampling threshold. Range: 0.0 to 1.0.
stop_sequences
array
Custom sequences where generation stops.
stream
boolean
default:"false"
When true, responses are streamed as Server-Sent Events (text/event-stream).
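The constraints listed above can be checked client-side before a request is sent. The sketch below is one possible approach. The build_body helper is hypothetical, and the check that max_tokens is positive is an assumption rather than documented gateway behaviour. The role and range checks follow the attribute descriptions above.

```python
from typing import Any, Optional


def build_body(
    messages: list[dict[str, str]],
    max_tokens: int,
    *,
    model: str = "local",
    system: Optional[str] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    stop_sequences: Optional[list[str]] = None,
    stream: bool = False,
) -> dict[str, Any]:
    """Assemble a request body, rejecting values the reference rules out."""
    # Assumption: a token limit below 1 is never meaningful.
    if max_tokens < 1:
        raise ValueError("max_tokens is required and must be positive")

    # Only user and assistant roles belong in `messages`.
    for message in messages:
        if message.get("role") not in ("user", "assistant"):
            raise ValueError(
                "roles must be 'user' or 'assistant'; "
                "pass the system prompt via `system`"
            )

    # temperature and top_p share the documented 0.0 to 1.0 range.
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")

    body: dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": messages,
        "stream": stream,
    }
    optional = {
        "system": system,
        "temperature": temperature,
        "top_p": top_p,
        "stop_sequences": stop_sequences,
    }
    # Leave unset optional fields out of the body entirely.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

Calling build_body([{"role": "user", "content": "Hi"}], 1024, system="You are a helpful assistant.", temperature=0.7) yields a body shaped like the example in the Body section, with top_p and stop_sequences left out.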

Response

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "model": "local",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 8
  }
}

Response Fields

id
string
Unique identifier for the message.
type
string
The object type, message.
content
array
The generated content blocks. Text responses contain {"type": "text", "text": "..."}.
stop_reason
string
Why generation stopped, e.g. end_turn or max_tokens.
usage
object
Token accounting: input_tokens and output_tokens.
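To illustrate how these fields are typically consumed, here is a short sketch that works on the decoded JSON response. The helper names extract_text, total_tokens, and was_truncated are hypothetical. The sample dictionary is the example response shown above.

```python
from typing import Any


def extract_text(response: dict[str, Any]) -> str:
    """Concatenate the text of every text block in `content`."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def total_tokens(response: dict[str, Any]) -> int:
    """Sum input and output tokens from `usage`."""
    usage = response.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)


def was_truncated(response: dict[str, Any]) -> bool:
    """True when generation stopped because max_tokens was reached."""
    return response.get("stop_reason") == "max_tokens"


sample = {
    "id": "msg_abc123",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "The capital of France is Paris."}],
    "model": "local",
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 15, "output_tokens": 8},
}

print(extract_text(sample))   # The capital of France is Paris.
print(total_tokens(sample))   # 23
print(was_truncated(sample))  # False
```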

Streaming

Set "stream": true to receive Server-Sent Events:
curl -X POST "https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID/v1/messages" \
  -H "x-api-key: $MODELSLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "local",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Write a haiku"}],
    "stream": true
  }'
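If you read the stream without the SDK, the text has to be reassembled from individual events. The sketch below assumes the deployment emits the standard Anthropic streaming events, where content_block_delta events carry a text_delta and message_stop ends the stream. That assumption is not confirmed on this page. The function name iter_text_deltas and the sample lines are illustrative only.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from Server-Sent Events lines.

    Assumes Anthropic-style events; other event types are ignored.
    """
    for raw in lines:
        line = raw.strip()
        # Only `data:` lines carry JSON; `event:` lines and blanks are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta.get("text", "")
        elif event.get("type") == "message_stop":
            return


# Hand-written sample in the assumed event format, not captured output.
sample_lines = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, '
    '"delta": {"type": "text_delta", "text": "Autumn "}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, '
    '"delta": {"type": "text_delta", "text": "wind"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]

print("".join(iter_text_deltas(sample_lines)))  # Autumn wind
```

In practice the lines would come from the HTTP response body, read one line at a time.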

Anthropic SDK

This endpoint works as a drop-in replacement for the Anthropic Messages API. Use the deployment root as the base_url, without a /v1 suffix, because the SDK appends /v1/messages itself:
from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_MODELSLAB_API_KEY",
    base_url="https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID",
)

# Non-streaming
message = client.messages.create(
    model="local",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
print(message.content[0].text)

# Streaming
with client.messages.stream(
    model="local",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a story"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")

Using with Claude Code

Because your deployment speaks the Anthropic protocol, you can use it as a backend for Claude Code. Point Claude Code at your deployment’s root base URL and authenticate with your ModelsLab API key:
ANTHROPIC_BASE_URL="https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID" \
ANTHROPIC_API_KEY="YOUR_MODELSLAB_API_KEY" \
claude --model "local"
Claude Code sends the API key as x-api-key, which the GGUF Cloud gateway accepts. Your deployment serves a single model, so the --model value is routed to that model regardless of the name you pass.