/v1/messages endpoint. Point the Anthropic SDK or Claude Code at your deployment’s base URL — the root URL, with no /v1 suffix, since the Anthropic SDK appends /v1/messages itself.
Request
x-api-key header (see Authentication).
Body
Body Attributes
The model to use. A deployment serves a single model, so this can be
"local" or the model id you deployed — it is always routed to your deployment’s model.Maximum number of tokens to generate. Required in the Anthropic format.
Input messages. Roles are
user and assistant only — the system prompt goes in the system parameter, not in messages.System prompt, passed separately from the message list.
Sampling temperature. In the Anthropic format the range is
0.0–1.0.Nucleus sampling threshold. Range:
0.0–1.0.Custom sequences where generation stops.
When
true, responses are streamed as Server-Sent Events (text/event-stream).Response
Response Fields
Unique identifier for the message.
The object type,
message.The generated content blocks. Text responses contain
{"type": "text", "text": "..."}.Why generation stopped, e.g.
end_turn or max_tokens.Token accounting:
input_tokens and output_tokens.Streaming
Set"stream": true to receive Server-Sent Events:
Anthropic SDK
This endpoint is a drop-in replacement for the Anthropic API. Use the deployment root as thebase_url (the SDK adds /v1/messages):
Using with Claude Code
Because your deployment speaks the Anthropic protocol, you can use it as a backend for Claude Code. Point Claude Code at your deployment’s root base URL and authenticate with your ModelsLab API key:Claude Code sends the API key as
x-api-key, which the GGUF Cloud gateway accepts. Your deployment serves a single model, so the --model value is routed to that model regardless of the name you pass.
