# GGUF Cloud

> Deploy any GGUF / llama.cpp model on a dedicated single-tenant GPU. Each deployment is its own private endpoint that speaks both the OpenAI and Anthropic protocols natively.

**GGUF Cloud** lets you deploy any GGUF (llama.cpp) model on a **dedicated, single-tenant GPU**. Each deployment becomes its own private API endpoint, backed by [`llama.cpp`](https://github.com/ggml-org/llama.cpp) (`llama-server`), and speaks **both the OpenAI and Anthropic protocols natively**, so you can point the OpenAI SDK, the Anthropic SDK, **Claude Code**, or any compatible client at it with just a base URL change.

Unlike the shared [LLM API](/llm-api/overview), a GGUF Cloud deployment runs **only your model on your own GPU**. There are no neighbors, the model stays loaded, and the endpoint is reachable exclusively with your ModelsLab API key.

## How it works

1. Choose any GGUF model (from Hugging Face, your own quantization, or one of our presets) at [modelslab.com/gguf-cloud](https://modelslab.com/gguf-cloud).
2. We provision a single-tenant GPU pod running `llama-server` with your model loaded. Your deployment gets a unique `deployment_id`.
3. Point the OpenAI SDK, Anthropic SDK, or Claude Code at your deployment's base URL using your existing ModelsLab API key.

Get a deployment and find your `deployment_id` on the dashboard at [modelslab.com/gguf-cloud](https://modelslab.com/gguf-cloud).

## Base URL

Every deployment has its own base URL.
The `{deployment_id}` is shown on the deployment's dashboard page:

```
https://modelslab.com/api/gguf/{deployment_id}
```

* **OpenAI SDKs** use the base URL with `/v1` appended → `https://modelslab.com/api/gguf/{deployment_id}/v1`
* **Anthropic SDKs / Claude Code** use the base URL as-is → `https://modelslab.com/api/gguf/{deployment_id}`

See [Authentication](/gguf-cloud/authentication) for the full details on base URLs and API keys.

## Quickstart

Authenticate with your existing ModelsLab API key. Because the deployment serves a single model, the `model` field can be `"local"` (or the model id you deployed); requests are always routed to your deployment's model.

```python Python (OpenAI SDK)
from openai import OpenAI

# OpenAI SDKs use the deployment base URL with /v1 appended.
client = OpenAI(
    api_key="YOUR_MODELSLAB_API_KEY",
    base_url="https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID/v1",
)

response = client.chat.completions.create(
    model="local",  # or the model id you deployed
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

```python Python (Anthropic SDK)
from anthropic import Anthropic

# Anthropic SDKs use the deployment base URL as-is (no /v1 suffix).
client = Anthropic(
    api_key="YOUR_MODELSLAB_API_KEY",
    base_url="https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID",
)

message = client.messages.create(
    model="local",  # or the model id you deployed
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)
```

```bash cURL
curl -X POST "https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID/v1/chat/completions" \
  -H "Authorization: Bearer $MODELSLAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Explore

* Base URLs, your `deployment_id`, and the three accepted auth headers.
* OpenAI-compatible `/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`, and `/v1/models`.
* Anthropic-compatible `/v1/messages`. Works with the Anthropic SDK and Claude Code.
* Gateway error reference: 401, 404, 502, and 503, and how to handle them.
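
The base URL rules described under "Base URL" are easy to get wrong when one application uses both SDK families against the same deployment. As a minimal sketch, the two variants can be derived from a single `deployment_id`. The helper names `anthropic_base_url` and `openai_base_url` and the constant `GGUF_GATEWAY` are hypothetical and not part of any ModelsLab SDK; only the URL shapes come from this page.

```python
from urllib.parse import quote

# Gateway root, copied from the base URL pattern documented on this page.
GGUF_GATEWAY = "https://modelslab.com/api/gguf"


def anthropic_base_url(deployment_id: str) -> str:
    """Hypothetical helper: base URL for Anthropic SDKs and Claude Code (used as-is)."""
    if not deployment_id:
        raise ValueError("deployment_id must not be empty")
    # Percent-encode the id so it cannot add extra path segments.
    return f"{GGUF_GATEWAY}/{quote(deployment_id, safe='')}"


def openai_base_url(deployment_id: str) -> str:
    """Hypothetical helper: base URL for OpenAI SDKs (same URL with /v1 appended)."""
    return anthropic_base_url(deployment_id) + "/v1"
```

Passing `openai_base_url(...)` as `base_url` to the OpenAI client and `anthropic_base_url(...)` to the Anthropic client keeps the `/v1` suffix decision in one place.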