Skip to main content
Every GGUF Cloud deployment is a private endpoint with its own base URL, authenticated with your existing ModelsLab API key. No separate key is required.

Base URL

Each deployment has a unique base URL. The {deployment_id} is shown on the deployment’s dashboard page at modelslab.com/gguf-cloud:
https://modelslab.com/api/gguf/{deployment_id}

Choosing the right base_url

The two SDK families expect the base URL in slightly different forms. Use the matching one:
SDK / Clientbase_url value
OpenAI SDK (Python, Node, LangChain, …)https://modelslab.com/api/gguf/{deployment_id}/v1
Anthropic SDK / Claude Codehttps://modelslab.com/api/gguf/{deployment_id}
The OpenAI SDK appends paths like /chat/completions to the base URL, so the base URL ends in /v1. The Anthropic SDK appends /v1/messages itself, so its base URL is the root (no trailing /v1).

API key

Authenticate with your existing ModelsLab API key. You can find or create one in your API Keys Dashboard. The gateway accepts the key in any of three headers, so SDKs from both ecosystems work unchanged:
Authorization
string
Bearer token form used by OpenAI SDKs: Authorization: Bearer YOUR_MODELSLAB_API_KEY
x-api-key
string
Header used by Anthropic SDKs and Claude Code: x-api-key: YOUR_MODELSLAB_API_KEY
key
string
Plain header form: key: YOUR_MODELSLAB_API_KEY
Never share your API key publicly or commit it to version control. Treat it like a password and use environment variables in production.

Examples

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MODELSLAB_API_KEY",
    base_url="https://modelslab.com/api/gguf/YOUR_DEPLOYMENT_ID/v1",
)

Next steps

Chat Completions

Call the OpenAI-compatible endpoints on your deployment.

Messages

Call the Anthropic-compatible endpoint, or point Claude Code at your deployment.