## Chat Completions Parameters

These parameters are available on the Chat Completions endpoint (OpenAI-compatible).

### Core Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | | Model ID to use. See List Models for available models. |
| messages | array | Yes | | Array of message objects with role and content. |
| max_tokens | integer | No | 1000 | Maximum tokens to generate. Range: 1 to the model’s max context. |
| stream | boolean | No | false | Enable Server-Sent Events streaming. |
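The core parameters above can be combined into a minimal request body. This is an illustrative sketch: the model ID is a placeholder, and the body would be sent as JSON in a POST request to the Chat Completions endpoint.

```python
import json

# A minimal Chat Completions request body using the core parameters above.
# "example-model" is a placeholder; substitute an ID from List Models.
payload = {
    "model": "example-model",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 256,  # defaults to 1000 if omitted
    "stream": False,
}

# Serialize for the POST body (Content-Type: application/json).
body = json.dumps(payload)
```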

### Sampling Parameters

| Parameter | Type | Range | Default | Description |
|---|---|---|---|---|
| temperature | float | 0–2 | 1.0 | Controls randomness. Lower values (0.1–0.3) produce focused, deterministic output; higher values (0.8–1.5) increase creativity and variety. Set to 0 for greedy decoding. |
| top_p | float | 0–1 | 1.0 | Nucleus sampling: only the smallest set of tokens whose cumulative probability reaches this threshold is considered. Lower values (e.g. 0.1) make output more focused. |
| top_k | integer | 1+ | | Sample only from the top K most likely tokens. Lower values constrain output. Not all models support this. |

Avoid setting both temperature and top_p at the same time; use one or the other for best results.
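One way to enforce the either/or guidance above in client code is a small helper that rejects conflicting settings. This helper is purely illustrative, not part of the API.

```python
def sampling_params(temperature=None, top_p=None, top_k=None):
    """Build the sampling portion of a request body, enforcing the
    guidance above: set either temperature or top_p, not both."""
    if temperature is not None and top_p is not None:
        raise ValueError("Set either temperature or top_p, not both.")
    params = {}
    if temperature is not None:
        params["temperature"] = temperature
    if top_p is not None:
        params["top_p"] = top_p
    if top_k is not None:
        params["top_k"] = top_k  # note: not all models support top_k
    return params
```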

### Penalty Parameters

| Parameter | Type | Range | Default | Description |
|---|---|---|---|---|
| presence_penalty | float | -2 to 2 | 0 | Penalizes tokens that have appeared in the text so far. Positive values encourage the model to explore new topics. |
| frequency_penalty | float | -2 to 2 | 0 | Penalizes tokens based on how often they’ve appeared. Positive values reduce repetition proportionally to frequency. |
| repetition_penalty | float | 0.1–2 | 1.0 | Multiplicative penalty on repeated tokens. Values above 1 discourage repetition; values below 1 encourage it. |

### Stop Sequences

| Parameter | Type | Default | Description |
|---|---|---|---|
| stop | string or array | null | Up to 4 sequences where the API will stop generating. The stop sequence is not included in the output. |

```json
{
  "stop": ["\n\n", "END", "```"]
}
```

### Response Format

| Parameter | Type | Default | Description |
|---|---|---|---|
| response_format | object | | Force the model to output in a specific format. |
| seed | integer | | Attempt deterministic output. The same seed with the same input should produce the same output. |
| n | integer | 1 | Number of completions to generate. |

JSON mode:

```json
{
  "response_format": {"type": "json_object"}
}
```

When using JSON mode, you must also instruct the model to output JSON in your system or user message, e.g. “Respond in JSON format.”
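Putting the two requirements together, a JSON-mode request pairs response_format with an explicit instruction in the prompt, and the client still parses the returned string. A sketch, assuming the standard OpenAI-compatible response shape and a placeholder model ID:

```python
import json

# JSON-mode request: response_format plus an explicit "Respond in JSON"
# instruction, as required above. "example-model" is a placeholder.
payload = {
    "model": "example-model",
    "messages": [
        {"role": "system", "content": "Respond in JSON format."},
        {"role": "user", "content": "List three primary colors."},
    ],
    "response_format": {"type": "json_object"},
}

# The JSON document arrives as a *string* in the message content and must
# still be parsed. A sample response in the OpenAI-compatible shape:
response = {
    "choices": [
        {"message": {"content": '{"colors": ["red", "yellow", "blue"]}'}}
    ]
}
data = json.loads(response["choices"][0]["message"]["content"])
```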

### Function Calling

| Parameter | Type | Default | Description |
|---|---|---|---|
| tools | array | | List of tools/functions the model can call. See Function Calling. |
| tool_choice | string or object | "auto" | Controls tool usage: "auto", "none", "required", or a specific tool. |
| parallel_tool_calls | boolean | true | Whether the model can call multiple tools in one turn. |
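A sketch of a tools array in the OpenAI-compatible function-calling shape, wired into a request. The get_weather function is hypothetical, chosen only to illustrate the structure:

```python
# A tool definition in the OpenAI-compatible function-calling format.
# The get_weather function and its parameters are illustrative only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "example-model",  # placeholder
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call a tool
}
```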

## Messages Parameters

These parameters are available on the Messages endpoint (Anthropic-compatible).

### Core Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | | Model ID to use. |
| messages | array | Yes | | Input messages. Roles: user and assistant only (the system prompt goes in the system parameter). |
| max_tokens | integer | Yes | | Maximum tokens to generate. Required in the Anthropic format. |
| system | string | No | | System prompt. Passed separately, not as a message. |
| stream | boolean | No | false | Enable streaming via Server-Sent Events. |
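The differences from Chat Completions show up directly in the request body: max_tokens is required, and the system prompt is a top-level field rather than a message. A minimal sketch with a placeholder model ID:

```python
import json

# A minimal Messages (Anthropic-compatible) request body. Note that
# max_tokens is required and the system prompt is a separate field,
# not a message. "example-model" is a placeholder.
payload = {
    "model": "example-model",
    "system": "You are a helpful assistant.",
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 256,  # required in this format
}

body = json.dumps(payload)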

### Sampling Parameters

| Parameter | Type | Range | Default | Description |
|---|---|---|---|---|
| temperature | float | 0–1 | 1.0 | Controls randomness. Note: the Anthropic format caps this at 1.0, not 2.0. |
| top_p | float | 0–1 | | Nucleus sampling threshold. |
| top_k | integer | 1+ | | Sample only from the top K tokens. |

### Stop Sequences

| Parameter | Type | Default | Description |
|---|---|---|---|
| stop_sequences | array | | Custom stop sequences. |

### Tool Use

| Parameter | Type | Default | Description |
|---|---|---|---|
| tools | array | | Tools the model can use. Uses input_schema instead of parameters. See Function Calling. |
| tool_choice | object | {"type": "auto"} | Controls tool usage: {"type": "auto"}, {"type": "any"}, or {"type": "tool", "name": "..."}. |
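The same hypothetical get_weather tool in the Anthropic-compatible shape, for contrast with the OpenAI-compatible format: the schema lives under input_schema, and tool_choice is always an object.

```python
# A tool definition in the Anthropic-compatible format. Unlike the
# OpenAI-compatible shape, the schema lives under input_schema and
# tool_choice is an object. The get_weather tool is hypothetical.
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

payload = {
    "model": "example-model",  # placeholder
    "max_tokens": 256,  # required in this format
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": {"type": "auto"},
}
```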

## Common Patterns

### Deterministic Output

For reproducible results, use low temperature with a seed:

```json
{
  "temperature": 0,
  "seed": 42
}
```

### Creative Writing

For creative, varied output:

```json
{
  "temperature": 1.2,
  "presence_penalty": 0.6,
  "frequency_penalty": 0.3
}
```

### Structured Extraction

For extracting structured data:

```json
{
  "temperature": 0,
  "response_format": {"type": "json_object"},
  "max_tokens": 2000
}
```

### Code Generation

For code generation tasks:

```json
{
  "temperature": 0.2,
  "top_p": 0.95,
  "stop": ["\n\n\n", "```"]
}
```

### Conversational

For natural, engaging conversations:

```json
{
  "temperature": 0.8,
  "presence_penalty": 0.5,
  "max_tokens": 500
}
```
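The patterns above can be kept in one place as named presets and merged into a request body. This helper is a hypothetical client-side convenience, not an API feature:

```python
# Presets mirroring the common patterns above; illustrative only.
PRESETS = {
    "deterministic": {"temperature": 0, "seed": 42},
    "creative": {"temperature": 1.2, "presence_penalty": 0.6,
                 "frequency_penalty": 0.3},
    "code": {"temperature": 0.2, "top_p": 0.95, "stop": ["\n\n\n", "```"]},
    "chat": {"temperature": 0.8, "presence_penalty": 0.5, "max_tokens": 500},
}

def build_request(model, messages, preset):
    """Return a request body with the chosen preset's parameters merged in."""
    return {"model": model, "messages": messages, **PRESETS[preset]}
```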