Structured Output
Get JSON-schema-validated responses from any provider
Pass responseFormat to generateText() or streamText() to get
JSON-schema-validated responses. The SDK translates the unified shape to
each provider's native API — works the same whether you're on OpenAI,
Anthropic, Google, Azure, xAI, Together, Fireworks, OpenRouter, or Ollama.
import { generateText } from '@yourgpt/llm-sdk';
import { openai } from '@yourgpt/llm-sdk/openai';
const result = await generateText({
model: openai('gpt-4o'),
prompt: 'List the top 3 fastest land animals.',
responseFormat: {
type: 'json_schema',
json_schema: {
name: 'animals_response',
schema: {
type: 'object',
properties: {
animals: {
type: 'array',
items: {
type: 'object',
properties: {
name: { type: 'string' },
top_speed_kmh: { type: 'number' },
},
required: ['name', 'top_speed_kmh'],
},
},
},
required: ['animals'],
},
strict: true,
},
},
});
const data = JSON.parse(result.text);
// → { animals: [{ name: 'Cheetah', top_speed_kmh: 120 }, ...] }ResponseFormat shape
The unified type uses OpenAI's response_format shape — callers who already
write response_format for OpenAI can pass it through unchanged.
type ResponseFormat =
| { type: 'json_object' }
| {
type: 'json_schema';
json_schema: {
name: string;
schema: Record<string, unknown>; // JSON Schema
strict?: boolean; // default: true
};
};type: 'json_object'— free-form JSON, no schema enforcement. Adapters that don't have a native "JSON mode without schema" (Anthropic) inject a system-prompt suffix asking for JSON instead.type: 'json_schema'— schema-validated output. Recommended.
Per-provider translation
Each adapter translates responseFormat to its provider's native field:
| Provider | Native field |
|---|---|
| OpenAI Chat / Azure / xAI / Together / Fireworks / OpenRouter | response_format |
| OpenAI Responses API | text.format (different shape) |
| Anthropic Claude 3.5+ | output_config.format |
| Google Gemini | responseJsonSchema |
| Ollama 0.5+ | format |
You don't need to think about this — the SDK handles it. The notes below matter only if you hit edge cases.
Provider gotchas
Anthropic — schema sanitization
Anthropic's structured-output schema subset is narrower than OpenAI's. The adapter automatically strips keys Anthropic rejects so your call doesn't 400:
- Stripped:
minimum,maximum,exclusiveMinimum,exclusiveMaximum,multipleOf,minLength,maxLength,minItems,maxItems,minProperties,maxProperties,pattern,$schema - Converted:
oneOf→anyOf(Anthropic accepts the latter, not the former) - Forced:
additionalProperties: falseon every object
If you rely on numeric or length constraints for validation, do that
client-side after JSON.parse() rather than encoding it in the schema.
Anthropic's output_config.format is GA on Claude API and AWS Bedrock for
Claude 3.5 / 3.7 / 4 series. It is NOT available on Google Vertex AI. Older
Claude 3 base models (claude-3-opus-20240229 etc.) are not supported either.
Tracking: issue #96.
Google Gemini — OpenAPI subset
Gemini's responseJsonSchema accepts an OpenAPI 3.0 subset. The adapter
strips keys Gemini doesn't recognize:
- Stripped:
oneOf,anyOf,$ref,$defs,definitions,pattern,$schema,additionalProperties
Schemas with discriminated unions or shared definitions need to be inlined before passing to Gemini.
xAI — additionalProperties default
xAI inverts OpenAI's default: additionalProperties defaults to false and
must be explicitly set true if you want extra properties allowed. The
adapter passes your schema through unchanged, so be explicit.
Ollama — local only
Ollama's format field requires Ollama v0.5+ for schema-constrained output
(string "json" works on older versions for free-form JSON). Ollama Cloud
does not support structured outputs at the time of writing.
Capability gate
Each model in the registry carries a supportsJsonMode capability flag.
When you pass responseFormat to a model that doesn't support it, the SDK
logs a warning:
[llm-sdk] anthropic/claude-3-haiku-20240307 does not support structured
output (responseFormat); the request will be sent but the provider may
ignore it.This is a warning, not an error — the request still goes through. Switch
to a supported model (e.g. claude-3-5-sonnet-latest) or open an issue if
you need fallback behavior.
Reasoning models — token semantics
For OpenAI reasoning models (o1, o3, o4, gpt-5.x):
maxTokensis internally translated tomax_completion_tokenstemperatureis silently dropped (these models reject it)max_completion_tokensincludes BOTH reasoning tokens AND visible output tokens — set generously (maxTokens: 4000+) or you may see truncated responses
const result = await generateText({
model: openai('o3-mini'),
prompt: 'Solve: ...',
maxTokens: 4000, // → max_completion_tokens internally
temperature: 0.7, // → silently dropped
responseFormat: { ... },
});Fallback chains
responseFormat works through fallback chains transparently. Each provider
in the chain receives the schema in its native format:
import { createFallbackChain } from '@yourgpt/llm-sdk/fallback';
import { createOpenAI } from '@yourgpt/llm-sdk/openai';
import { createAnthropic } from '@yourgpt/llm-sdk/anthropic';
const chain = createFallbackChain({
models: [
openai.languageModel('gpt-4o'),
anthropic.languageModel('claude-3-5-sonnet-latest'),
],
strategy: 'priority',
});
// Same responseFormat works on either hop
const result = await chain.chat({
messages: [...],
config: {
responseFormat: { type: 'json_schema', json_schema: { ... } },
},
});A working end-to-end demo lives in examples/fallback-demo — see the
/chat/structured route.
MCP servers
Pass mcpServers alongside responseFormat to let the model call remote
Model Context Protocol servers as part of the same call. The SDK forwards
your servers to providers that support MCP natively (OpenAI Responses API,
Anthropic Messages with mcp-client-2025-11-20 beta) — no local execution
or tool wiring required.
const result = await runtime.response({
prompt: 'Extract FAQs from this conversation. Consult the KB first.',
mcpServers: [{
label: 'knowledge_base',
url: 'https://kb.example.com/sse',
headers: { Authorization: `Bearer ${token}` },
allowedTools: ['search_kb'],
requireApproval: 'never',
}],
responseFormat: {
type: 'json_schema',
json_schema: { name: 'faqs', schema: FAQ_SCHEMA, strict: true },
},
});McpServerConfig shape
interface McpServerConfig {
label: string; // human-readable, also the MCP server name
url: string; // HTTP/SSE endpoint
headers?: Record<string, string>; // forwarded as request headers
allowedTools?: string[]; // restrict the model to a subset
requireApproval?: 'never' | 'always'; // default 'never'
}Per-provider translation
| Provider | Native field | Beta header |
|---|---|---|
| OpenAI Responses | tools[type=mcp] | — |
| Anthropic Claude | mcp_servers + tools[type=mcp_toolset] for allowedTools | mcp-client-2025-11-20 |
| Google / xAI / Fireworks / OpenRouter / Ollama | not supported | — |
The Authorization header is hoisted to Anthropic's top-level
authorization_token with the Bearer prefix stripped — Anthropic
expects the bare token. Other headers are forwarded verbatim where the
provider accepts them.
Providers without native MCP throw a clear error today. Use them inside a
fallback chain (retryableErrors: () => true) and the chain will skip
past them to the next provider. Local-execution MCP fallback for
Google/xAI/Fireworks/OpenRouter is planned as a follow-up.
Reasoning effort
Pass reasoningEffort for one normalized knob across providers that
support extended thinking / reasoning. The SDK maps to each provider's
native shape — effort on OpenAI Responses, thinking.adaptive on
Claude 4.6/4.7, thinking.budget_tokens on older Claude.
const result = await runtime.response({
prompt: 'Plan the data migration step by step.',
reasoningEffort: 'high',
});Type
type ReasoningEffort =
| 'minimal' | 'low' | 'medium' | 'high' // normalized enum
| { budgetTokens: number } // explicit Anthropic / Gemini budget
| { raw: Record<string, unknown> }; // provider passthrough escape hatchPer-provider translation
| Provider | Native field | Notes |
|---|---|---|
| OpenAI Responses | reasoning: { effort, summary: 'auto' } | minimal/low/medium/high (plus xhigh/none if passed via { raw }) |
| Anthropic Claude 4.6 / 4.7 | thinking: { type: 'adaptive', effort } | Preferred for new models |
| Anthropic Claude 3.7 / 4.0–4.5 | thinking: { type: 'enabled', budget_tokens } | Buckets: low=4k, medium=8k, high=16k |
| Google Gemini 2.5/3.x thinking models | thinkingConfig.thinkingBudget | Buckets: 1k / 4k / 8k / 24k |
| xAI Grok | reasoning_effort | Coerced to low/high only |
| Others | silently ignored | The model still runs, just without extended reasoning |
When to use budgetTokens vs the enum
The enum is the right default. Use { budgetTokens: N } only when you
need an exact token count — typically for cost control on Anthropic
older models or Gemini. The escape hatch { raw: {...} } forwards your
object literally to the provider with no translation.
runtime.response() — one-shot bundle
For the common "prompt in, JSON out" case use the convenience method:
const result = await runtime.response({
prompt,
mcpServers: [...],
reasoningEffort: 'high',
responseFormat: { type: 'json_schema', json_schema: {...} },
});
const data = JSON.parse(result.text);It's a thin wrapper over runtime.generate() that builds the single
user-message and forwards the request config. See
runtime.response() for the full reference.
Next Steps
- generateText() — full text generation API
- streamText() — streaming variant
- runtime.response() — one-shot MCP + reasoning + schema
- Tools — function calling (orthogonal to structured output)