Structured Output

Pass responseFormat to generateText() or streamText() to get JSON-schema-validated responses. The SDK translates the unified shape to each provider's native API — works the same whether you're on OpenAI, Anthropic, Google, Azure, xAI, Together, Fireworks, OpenRouter, or Ollama.

import { generateText } from '@yourgpt/llm-sdk';
import { openai } from '@yourgpt/llm-sdk/openai';

const result = await generateText({
  model: openai('gpt-4o'),
  prompt: 'List the top 3 fastest land animals.',
  responseFormat: {
    type: 'json_schema',
    json_schema: {
      name: 'animals_response',
      schema: {
        type: 'object',
        properties: {
          animals: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                name: { type: 'string' },
                top_speed_kmh: { type: 'number' },
              },
              required: ['name', 'top_speed_kmh'],
            },
          },
        },
        required: ['animals'],
      },
      strict: true,
    },
  },
});

const data = JSON.parse(result.text);
// → { animals: [{ name: 'Cheetah', top_speed_kmh: 120 }, ...] }

ResponseFormat shape

The unified type uses OpenAI's response_format shape — callers who already write response_format for OpenAI can pass it through unchanged.

type ResponseFormat =
  | { type: 'json_object' }
  | {
      type: 'json_schema';
      json_schema: {
        name: string;
        schema: Record<string, unknown>; // JSON Schema
        strict?: boolean;                // default: true
      };
    };

type: 'json_object' — free-form JSON, no schema enforcement. Adapters that don't have a native "JSON mode without schema" (Anthropic) inject a system-prompt suffix asking for JSON instead.
type: 'json_schema' — schema-validated output. Recommended.

Per-provider translation

Each adapter translates responseFormat to its provider's native field:

Provider	Native field
OpenAI Chat / Azure / xAI / Together / Fireworks / OpenRouter	`response_format`
OpenAI Responses API	`text.format` (different shape)
Anthropic Claude 3.5+	`output_config.format`
Google Gemini	`responseJsonSchema`
Ollama 0.5+	`format`

You don't need to think about this — the SDK handles it. The notes below matter only if you hit edge cases.

Provider gotchas

Anthropic — schema sanitization

Anthropic's structured-output schema subset is narrower than OpenAI's. The adapter automatically strips keys Anthropic rejects so your call doesn't 400:

Stripped: minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf, minLength, maxLength, minItems, maxItems, minProperties, maxProperties, pattern, $schema
Converted: oneOf → anyOf (Anthropic accepts the latter, not the former)
Forced: additionalProperties: false on every object

If you rely on numeric or length constraints for validation, do that client-side after JSON.parse() rather than encoding it in the schema.

Anthropic's output_config.format is GA on Claude API and AWS Bedrock for Claude 3.5 / 3.7 / 4 series. It is NOT available on Google Vertex AI. Older Claude 3 base models (claude-3-opus-20240229 etc.) are not supported either. Tracking: issue #96.

Google Gemini — OpenAPI subset

Gemini's responseJsonSchema accepts an OpenAPI 3.0 subset. The adapter strips keys Gemini doesn't recognize:

Stripped: oneOf, anyOf, $ref, $defs, definitions, pattern, $schema, additionalProperties

Schemas with discriminated unions or shared definitions need to be inlined before passing to Gemini.

xAI — `additionalProperties` default

xAI inverts OpenAI's default: additionalProperties defaults to false and must be explicitly set true if you want extra properties allowed. The adapter passes your schema through unchanged, so be explicit.

Ollama — local only

Ollama's format field requires Ollama v0.5+ for schema-constrained output (string "json" works on older versions for free-form JSON). Ollama Cloud does not support structured outputs at the time of writing.

Capability gate

Each model in the registry carries a supportsJsonMode capability flag. When you pass responseFormat to a model that doesn't support it, the SDK logs a warning:

[llm-sdk] anthropic/claude-3-haiku-20240307 does not support structured
output (responseFormat); the request will be sent but the provider may
ignore it.

This is a warning, not an error — the request still goes through. Switch to a supported model (e.g. claude-3-5-sonnet-latest) or open an issue if you need fallback behavior.

Reasoning models — token semantics

For OpenAI reasoning models (o1, o3, o4, gpt-5.x):

maxTokens is internally translated to max_completion_tokens
temperature is silently dropped (these models reject it)
max_completion_tokens includes BOTH reasoning tokens AND visible output tokens — set generously (maxTokens: 4000+) or you may see truncated responses

const result = await generateText({
  model: openai('o3-mini'),
  prompt: 'Solve: ...',
  maxTokens: 4000,           // → max_completion_tokens internally
  temperature: 0.7,          // → silently dropped
  responseFormat: { ... },
});

Fallback chains

responseFormat works through fallback chains transparently. Each provider in the chain receives the schema in its native format:

import { createFallbackChain } from '@yourgpt/llm-sdk/fallback';
import { createOpenAI } from '@yourgpt/llm-sdk/openai';
import { createAnthropic } from '@yourgpt/llm-sdk/anthropic';

const chain = createFallbackChain({
  models: [
    openai.languageModel('gpt-4o'),
    anthropic.languageModel('claude-3-5-sonnet-latest'),
  ],
  strategy: 'priority',
});

// Same responseFormat works on either hop
const result = await chain.chat({
  messages: [...],
  config: {
    responseFormat: { type: 'json_schema', json_schema: { ... } },
  },
});

A working end-to-end demo lives in examples/fallback-demo — see the /chat/structured route.

MCP servers

Pass mcpServers alongside responseFormat to let the model call remote Model Context Protocol servers as part of the same call. The SDK forwards your servers to providers that support MCP natively (OpenAI Responses API, Anthropic Messages with mcp-client-2025-11-20 beta) — no local execution or tool wiring required.

const result = await runtime.response({
  prompt: 'Extract FAQs from this conversation. Consult the KB first.',
  mcpServers: [{
    label: 'knowledge_base',
    url: 'https://kb.example.com/sse',
    headers: { Authorization: `Bearer ${token}` },
    allowedTools: ['search_kb'],
    requireApproval: 'never',
  }],
  responseFormat: {
    type: 'json_schema',
    json_schema: { name: 'faqs', schema: FAQ_SCHEMA, strict: true },
  },
});

McpServerConfig shape

interface McpServerConfig {
  label: string;                          // human-readable, also the MCP server name
  url: string;                            // HTTP/SSE endpoint
  headers?: Record<string, string>;       // forwarded as request headers
  allowedTools?: string[];                // restrict the model to a subset
  requireApproval?: 'never' | 'always';   // default 'never'
}

Per-provider translation

Provider	Native field	Beta header
OpenAI Responses	`tools[type=mcp]`	—
Anthropic Claude	`mcp_servers` + `tools[type=mcp_toolset]` for `allowedTools`	`mcp-client-2025-11-20`
Google / xAI / Fireworks / OpenRouter / Ollama	not supported	—

The Authorization header is hoisted to Anthropic's top-level authorization_token with the Bearer prefix stripped — Anthropic expects the bare token. Other headers are forwarded verbatim where the provider accepts them.

Providers without native MCP throw a clear error today. Use them inside a fallback chain (retryableErrors: () => true) and the chain will skip past them to the next provider. Local-execution MCP fallback for Google/xAI/Fireworks/OpenRouter is planned as a follow-up.

Reasoning effort

Pass reasoningEffort for one normalized knob across providers that support extended thinking / reasoning. The SDK maps to each provider's native shape — effort on OpenAI Responses, thinking.adaptive on Claude 4.6/4.7, thinking.budget_tokens on older Claude.

const result = await runtime.response({
  prompt: 'Plan the data migration step by step.',
  reasoningEffort: 'high',
});

Type

type ReasoningEffort =
  | 'minimal' | 'low' | 'medium' | 'high'  // normalized enum
  | { budgetTokens: number }                // explicit Anthropic / Gemini budget
  | { raw: Record<string, unknown> };       // provider passthrough escape hatch

Per-provider translation

Provider	Native field	Notes
OpenAI Responses	`reasoning: { effort, summary: 'auto' }`	`minimal`/`low`/`medium`/`high` (plus `xhigh`/`none` if passed via `{ raw }`)
Anthropic Claude 4.6 / 4.7	`thinking: { type: 'adaptive', effort }`	Preferred for new models
Anthropic Claude 3.7 / 4.0–4.5	`thinking: { type: 'enabled', budget_tokens }`	Buckets: low=4k, medium=8k, high=16k
Google Gemini 2.5/3.x thinking models	`thinkingConfig.thinkingBudget`	Buckets: 1k / 4k / 8k / 24k
xAI Grok	`reasoning_effort`	Coerced to `low`/`high` only
Others	silently ignored	The model still runs, just without extended reasoning

When to use `budgetTokens` vs the enum

The enum is the right default. Use { budgetTokens: N } only when you need an exact token count — typically for cost control on Anthropic older models or Gemini. The escape hatch { raw: {...} } forwards your object literally to the provider with no translation.

runtime.response() — one-shot bundle

For the common "prompt in, JSON out" case use the convenience method:

const result = await runtime.response({
  prompt,
  mcpServers: [...],
  reasoningEffort: 'high',
  responseFormat: { type: 'json_schema', json_schema: {...} },
});
const data = JSON.parse(result.text);

It's a thin wrapper over runtime.generate() that builds the single user-message and forwards the request config. See runtime.response() for the full reference.

Next Steps

generateText() — full text generation API
streamText() — streaming variant
runtime.response() — one-shot MCP + reasoning + schema
Tools — function calling (orthogonal to structured output)

Structured Output

On this page