llama-server. Gateway-level errors are returned in a consistent JSON shape.
Error Response Format
This shape applies to errors raised by the gateway (auth, routing, readiness). Errors raised by the model server itself (for example, an invalid sampling parameter) are passed through from llama-server and may use the standard OpenAI/Anthropic error shape.
HTTP Status Codes
401 - Unauthorized
404 - Not Found
Cause: There is no deployment for this deployment_id, or it does not belong to your account.
Solution: Check the deployment_id in your base URL against the one on your deployment dashboard, and confirm the API key belongs to the same account that owns the deployment.
503 - Service Unavailable
502 - Bad Gateway
Cause: The deployment is marked ready but the model pod is currently unreachable, typically because it is restarting or warming up after a config change.
Solution: This is transient. Retry shortly, ideally with exponential backoff. If it persists, check the deployment status on the dashboard.
Handling Errors
Treat 503 and 502 as retryable: the endpoint is coming up or restarting. Retry with backoff, and stop on 401/404, which require a fix on your side.
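The retry policy above can be sketched in Python roughly as follows. This is a minimal illustration, not an official client: `call_with_backoff`, the `send` callable, and the delay defaults are all hypothetical names and values chosen for the example. `send` stands in for whatever function performs your HTTP request and returns a `(status_code, body)` pair.

```python
import random
import time

# Status codes this page describes as transient (endpoint coming up or restarting).
RETRYABLE = {502, 503}


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical zero-argument callable returning (status_code, body).
    The defaults are illustrative assumptions, not documented limits.
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE:
            # Success, 401/404, or any other status: stop and let the caller decide.
            return status, body
        # Exponential backoff with jitter: wait up to base_delay * 2^(attempt-1),
        # capped at max_delay, before the next attempt.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))
        status, body = send()
    # Either a final non-retryable result or the last 502/503 after giving up.
    return status, body
```

In practice `send` would wrap a single request to your deployment's endpoint. A 401 or 404 is returned on the first attempt without any retry, so the caller can surface the configuration problem immediately.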
Related
Authentication
Base URLs and the accepted auth headers.
API Error Codes
Error reference for the rest of the ModelsLab API.

