# Messages

> Anthropic-compatible messages endpoint served by your dedicated GGUF Cloud deployment. Works with the Anthropic SDK and Claude Code.

Your GGUF Cloud deployment also speaks the **Anthropic protocol** natively via the `/v1/messages` endpoint. Point the Anthropic SDK or **Claude Code** at your deployment's base URL. Use the **root** URL, with no `/v1` suffix, since the Anthropic SDK appends `/v1/messages` itself.

## Request

```bash
POST https://modelslab.com/api/gguf/{deployment_id}/v1/messages
```

Pass your ModelsLab API key in the `x-api-key` header (see [Authentication](/gguf-cloud/authentication)).

```bash
curl -X POST "https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID/v1/messages" \
  -H "x-api-key: $MODELSLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "local",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

## Body

```json
{
  "model": "local",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "system": "You are a helpful assistant.",
  "temperature": 0.7,
  "stream": false
}
```

## Body Attributes

- `model`: The model to use. A deployment serves a **single** model, so this can be `"local"` or the model id you deployed; either way, the request is routed to your deployment's model.
- `max_tokens`: Maximum number of tokens to generate. Required in the Anthropic format.
- `messages`: Input messages. Roles are `user` and `assistant` only; the system prompt goes in the `system` parameter, not in `messages`.
- `system`: System prompt, passed separately from the message list.
- `temperature`: Sampling temperature. In the Anthropic format the range is `0.0`–`1.0`.
- `top_p`: Nucleus sampling threshold. Range: `0.0`–`1.0`.
- `stop_sequences`: Custom sequences where generation stops.
- `stream`: When `true`, responses are streamed as Server-Sent Events (`text/event-stream`).

## Response

```json
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "model": "local",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 8
  }
}
```

## Response Fields

- `id`: Unique identifier for the message.
- `type`: The object type, `message`.
- `content`: The generated content blocks. Text responses contain `{"type": "text", "text": "..."}`.
- `stop_reason`: Why generation stopped, e.g. `end_turn` or `max_tokens`.
- `usage`: Token accounting: `input_tokens` and `output_tokens`.

## Streaming

Set `"stream": true` to receive Server-Sent Events:

```bash
curl -X POST "https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID/v1/messages" \
  -H "x-api-key: $MODELSLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "local",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Write a haiku"}],
    "stream": true
  }'
```

## Anthropic SDK

This endpoint is a drop-in replacement for the Anthropic API.
Use the deployment **root** as the `base_url` (the SDK adds `/v1/messages`):

```python Python
from anthropic import Anthropic

client = Anthropic(
    api_key="YOUR_MODELSLAB_API_KEY",
    base_url="https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID",
)

# Non-streaming
message = client.messages.create(
    model="local",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
print(message.content[0].text)

# Streaming
with client.messages.stream(
    model="local",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a story"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")
```

```javascript JavaScript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'YOUR_MODELSLAB_API_KEY',
  baseURL: 'https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID',
});

const message = await client.messages.create({
  model: 'local',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(message.content[0].text);
```

```bash cURL
curl -X POST "https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID/v1/messages" \
  -H "x-api-key: $MODELSLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "local",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Using with Claude Code

Because your deployment speaks the Anthropic protocol, you can use it as a backend for [Claude Code](https://claude.ai/claude-code). Point Claude Code at your deployment's root base URL and authenticate with your ModelsLab API key:

```bash
ANTHROPIC_BASE_URL="https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID" \
ANTHROPIC_AUTH_TOKEN="YOUR_MODELSLAB_API_KEY" \
claude --model "local"
```

Claude Code sends the API key as `x-api-key`, which the GGUF Cloud gateway accepts. Your deployment serves a single model, so the `--model` value is routed to that model regardless of the name you pass.
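
A common mistake is to paste the full endpoint (ending in `/v1/messages`) where the SDK or `ANTHROPIC_BASE_URL` expects the deployment root. The following is a minimal sketch of a guard against that, using only the Python standard library. The helper names `deployment_root` and `messages_endpoint` are hypothetical and not part of any ModelsLab or Anthropic package; the sketch assumes the URL layout shown on this page.

```python
from urllib.parse import urlsplit, urlunsplit

# Suffixes that the Anthropic SDK and Claude Code append on their own.
# Longest first, so "/v1/messages" is not reduced to a dangling "/messages".
_ENDPOINT_SUFFIXES = ("/v1/messages", "/v1")


def deployment_root(url: str) -> str:
    """Return the deployment root, dropping a trailing /v1 or /v1/messages.

    Hypothetical helper: assumes URLs shaped like
    https://modelslab.com/api/gguf/{deployment_id}[/v1[/messages]].
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    for suffix in _ENDPOINT_SUFFIXES:
        if path.endswith(suffix):
            path = path[: -len(suffix)]
            break
    # Query string and fragment are discarded; a base URL should not carry them.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def messages_endpoint(url: str) -> str:
    """Return the full endpoint URL for raw HTTP clients such as curl."""
    return deployment_root(url) + "/v1/messages"


if __name__ == "__main__":
    pasted = "https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID/v1/messages"
    print(deployment_root(pasted))    # value for base_url / ANTHROPIC_BASE_URL
    print(messages_endpoint(pasted))  # value for curl
```

Pass the result of `deployment_root` as `base_url` to the SDK or export it as `ANTHROPIC_BASE_URL`; use `messages_endpoint` only when calling the endpoint directly over HTTP.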