Structured Output

Get JSON-schema-validated responses from any provider

Pass responseFormat to generateText() or streamText() to get JSON-schema-validated responses. The SDK translates the unified shape to each provider's native API, so the same call works whether you're on OpenAI, Anthropic, Google, Azure, xAI, Together, Fireworks, OpenRouter, or Ollama.

import { generateText } from '@yourgpt/llm-sdk';
import { openai } from '@yourgpt/llm-sdk/openai';

const result = await generateText({
  model: openai('gpt-4o'),
  prompt: 'List the top 3 fastest land animals.',
  responseFormat: {
    type: 'json_schema',
    json_schema: {
      name: 'animals_response',
      schema: {
        type: 'object',
        properties: {
          animals: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                name: { type: 'string' },
                top_speed_kmh: { type: 'number' },
              },
              required: ['name', 'top_speed_kmh'],
            },
          },
        },
        required: ['animals'],
      },
      strict: true,
    },
  },
});

const data = JSON.parse(result.text);
// → { animals: [{ name: 'Cheetah', top_speed_kmh: 120 }, ...] }
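JSON.parse() returns an untyped value, so it can help to narrow the result before using it. The following is a minimal sketch, assuming the animals_response schema above; the Animal and AnimalsResponse types and both helper functions are illustrative names, not SDK exports.

```typescript
// Illustrative types mirroring the animals_response schema (not part of the SDK).
interface Animal {
  name: string;
  top_speed_kmh: number;
}

interface AnimalsResponse {
  animals: Animal[];
}

// Narrow an unknown parsed value to AnimalsResponse.
function isAnimalsResponse(value: unknown): value is AnimalsResponse {
  if (typeof value !== 'object' || value === null) return false;
  const animals = (value as { animals?: unknown }).animals;
  return (
    Array.isArray(animals) &&
    animals.every(
      (a) =>
        typeof a === 'object' &&
        a !== null &&
        typeof (a as Animal).name === 'string' &&
        typeof (a as Animal).top_speed_kmh === 'number',
    )
  );
}

// Parse model output and fail loudly if it does not match the expected shape.
function parseAnimals(text: string): AnimalsResponse {
  const parsed: unknown = JSON.parse(text);
  if (!isAnimalsResponse(parsed)) {
    throw new Error('Response did not match the animals_response shape');
  }
  return parsed;
}
```

In the example above, parseAnimals(result.text) would replace the bare JSON.parse call.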

ResponseFormat shape

The unified type uses OpenAI's response_format shape — callers who already write response_format for OpenAI can pass it through unchanged.

type ResponseFormat =
  | { type: 'json_object' }
  | {
      type: 'json_schema';
      json_schema: {
        name: string;
        schema: Record<string, unknown>; // JSON Schema
        strict?: boolean;                // default: true
      };
    };

  • type: 'json_object' gives free-form JSON with no schema enforcement. Adapters without a native schema-less JSON mode (for example Anthropic) append a system-prompt suffix asking for JSON instead.
  • type: 'json_schema' gives schema-validated output. Recommended.

Per-provider translation

Each adapter translates responseFormat to its provider's native field:

Provider | Native field
OpenAI Chat / Azure / xAI / Together / Fireworks / OpenRouter | response_format
OpenAI Responses API | text.format (different shape)
Anthropic Claude 3.5+ | output_config.format
Google Gemini | responseJsonSchema
Ollama 0.5+ | format

You don't need to think about this — the SDK handles it. The notes below matter only if you hit edge cases.


Provider gotchas

Anthropic — schema sanitization

Anthropic's structured-output schema subset is narrower than OpenAI's. The adapter automatically strips keys Anthropic rejects so your call doesn't 400:

  • Stripped: minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf, minLength, maxLength, minItems, maxItems, minProperties, maxProperties, pattern, $schema
  • Converted: oneOf → anyOf (Anthropic accepts the latter, not the former)
  • Forced: additionalProperties: false on every object
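The three rules above can be pictured as one recursive pass over the schema. This is a rough sketch of such a pass, not the adapter's actual code: sanitizeForAnthropic and JsonSchema are hypothetical names, and the sketch ignores edge cases such as a schema that already contains both oneOf and anyOf.

```typescript
type JsonSchema = Record<string, unknown>;

// Keys listed above as stripped for Anthropic.
const STRIPPED_KEYS = new Set([
  'minimum', 'maximum', 'exclusiveMinimum', 'exclusiveMaximum', 'multipleOf',
  'minLength', 'maxLength', 'minItems', 'maxItems',
  'minProperties', 'maxProperties', 'pattern', '$schema',
]);

// Return a sanitized copy of the schema; the input is not mutated.
function sanitizeForAnthropic(node: unknown): unknown {
  if (Array.isArray(node)) return node.map((item) => sanitizeForAnthropic(item));
  if (typeof node !== 'object' || node === null) return node;

  const out: JsonSchema = {};
  for (const [key, value] of Object.entries(node as JsonSchema)) {
    if (STRIPPED_KEYS.has(key)) continue;
    if (key === 'properties' && typeof value === 'object' && value !== null) {
      // Property names are data, not keywords: keep every name, sanitize each subschema.
      const props: JsonSchema = {};
      for (const [name, sub] of Object.entries(value as JsonSchema)) {
        props[name] = sanitizeForAnthropic(sub);
      }
      out['properties'] = props;
      continue;
    }
    // Rename oneOf to anyOf; every other key keeps its name.
    out[key === 'oneOf' ? 'anyOf' : key] = sanitizeForAnthropic(value);
  }
  // Force closed objects.
  if (out['type'] === 'object') out['additionalProperties'] = false;
  return out;
}
```

Handling properties separately matters: a field that happens to be called pattern or minimum is a property name and should survive the pass.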

If you rely on numeric or length constraints for validation, do that client-side after JSON.parse() rather than encoding it in the schema.
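A client-side check of this kind might look like the sketch below. It assumes the animals response from the first example; the bounds (a non-empty name, a speed between 0 and 200) and the checkAnimals helper are made up for illustration.

```typescript
interface Animal {
  name: string;
  top_speed_kmh: number;
}

// Collect human-readable violations; an empty array means every check passed.
function checkAnimals(text: string): string[] {
  const data = JSON.parse(text) as { animals: Animal[] };
  const problems: string[] = [];
  data.animals.forEach((animal, index) => {
    // Stands in for a minLength: 1 constraint on name.
    if (animal.name.length < 1) {
      problems.push(`animals[${index}].name must not be empty`);
    }
    // Stands in for minimum: 0 and maximum: 200 (illustrative bounds).
    if (animal.top_speed_kmh < 0 || animal.top_speed_kmh > 200) {
      problems.push(`animals[${index}].top_speed_kmh must be between 0 and 200`);
    }
  });
  return problems;
}
```

Returning a list of problems rather than throwing on the first one makes it easier to log everything the model got wrong in a single pass.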

Anthropic's output_config.format is GA on Claude API and AWS Bedrock for Claude 3.5 / 3.7 / 4 series. It is NOT available on Google Vertex AI. Older Claude 3 base models (claude-3-opus-20240229 etc.) are not supported either. Tracking: issue #96.

Google Gemini — OpenAPI subset

Gemini's responseJsonSchema accepts an OpenAPI 3.0 subset. The adapter strips keys Gemini doesn't recognize:

  • Stripped: oneOf, anyOf, $ref, $defs, definitions, pattern, $schema, additionalProperties

Schemas with discriminated unions or shared definitions need to be inlined before passing to Gemini.
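For shared definitions, inlining can be done mechanically before the call. The sketch below is one possible approach, with assumptions: inlineRefs is a hypothetical helper, it only resolves local references of the form #/$defs/Name or #/definitions/Name, it drops any keys that sit next to a $ref, and it refuses recursive definitions because they cannot be inlined.

```typescript
type JsonSchema = Record<string, unknown>;

// Replace local $ref pointers with a copy of the schema they point to.
function inlineRefs(root: JsonSchema): JsonSchema {
  const defs: JsonSchema = {
    ...((root['definitions'] as JsonSchema | undefined) ?? {}),
    ...((root['$defs'] as JsonSchema | undefined) ?? {}),
  };

  const visit = (node: unknown, stack: string[]): unknown => {
    if (Array.isArray(node)) return node.map((item) => visit(item, stack));
    if (typeof node !== 'object' || node === null) return node;

    const obj = node as JsonSchema;
    const ref = obj['$ref'];
    if (typeof ref === 'string') {
      const name = /^#\/(?:\$defs|definitions)\/(.+)$/.exec(ref)?.[1];
      if (name === undefined || !(name in defs)) {
        throw new Error(`Unresolvable $ref: ${ref}`);
      }
      if (stack.includes(name)) {
        throw new Error(`Recursive $ref cannot be inlined: ${ref}`);
      }
      return visit(defs[name], [...stack, name]);
    }

    const out: JsonSchema = {};
    for (const [key, value] of Object.entries(obj)) {
      out[key] = visit(value, stack);
    }
    return out;
  };

  // Drop the top-level definition containers; they are unused once inlined.
  const body: JsonSchema = {};
  for (const [key, value] of Object.entries(root)) {
    if (key !== '$defs' && key !== 'definitions') body[key] = value;
  }
  return visit(body, []) as JsonSchema;
}
```

Discriminated unions expressed with oneOf or anyOf still need to be restructured by hand, since this sketch only removes $ref indirection.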

xAI — additionalProperties default

xAI inverts OpenAI's default: additionalProperties defaults to false and must be explicitly set true if you want extra properties allowed. The adapter passes your schema through unchanged, so be explicit.

Ollama — local only

Ollama's format field requires Ollama v0.5+ for schema-constrained output (string "json" works on older versions for free-form JSON). Ollama Cloud does not support structured outputs at the time of writing.


Capability gate

Each model in the registry carries a supportsJsonMode capability flag. When you pass responseFormat to a model that doesn't support it, the SDK logs a warning:

[llm-sdk] anthropic/claude-3-haiku-20240307 does not support structured
output (responseFormat); the request will be sent but the provider may
ignore it.

This is a warning, not an error — the request still goes through. Switch to a supported model (e.g. claude-3-5-sonnet-latest) or open an issue if you need fallback behavior.
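If a silent downgrade is not acceptable in your application, one option is a pre-flight check that throws before the request is sent. The sketch below uses a hypothetical local capability table; the SDK's real registry API is not shown on this page and may look different. Only the two model entries mentioned above are included.

```typescript
// Hypothetical local capability table; the SDK's real registry API may differ.
interface ModelCapabilities {
  supportsJsonMode: boolean;
}

const capabilities: Record<string, ModelCapabilities> = {
  'anthropic/claude-3-haiku-20240307': { supportsJsonMode: false },
  'anthropic/claude-3-5-sonnet-latest': { supportsJsonMode: true },
};

// Throw instead of warning when structured output is requested from an unsupported model.
function assertStructuredOutput(modelId: string): void {
  const caps = capabilities[modelId];
  if (caps === undefined) {
    throw new Error(`Unknown model: ${modelId}`);
  }
  if (!caps.supportsJsonMode) {
    throw new Error(`${modelId} does not support structured output (responseFormat)`);
  }
}
```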


Reasoning models — token semantics

For OpenAI reasoning models (o1, o3, o4, gpt-5.x):

  • maxTokens is internally translated to max_completion_tokens
  • temperature is silently dropped (these models reject it)
  • max_completion_tokens includes BOTH reasoning tokens AND visible output tokens, so set it generously (maxTokens: 4000+) or you may see truncated responses

const result = await generateText({
  model: openai('o3-mini'),
  prompt: 'Solve: ...',
  maxTokens: 4000,           // → max_completion_tokens internally
  temperature: 0.7,          // → silently dropped
  responseFormat: { ... },
});
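The translation described above can be sketched as a small pure function. This is an illustration rather than the adapter's code: the parameter types, toOpenAIParams, and the prefix rule used to detect reasoning models are all assumptions.

```typescript
interface UnifiedParams {
  model: string;
  maxTokens?: number;
  temperature?: number;
}

interface OpenAIParams {
  model: string;
  max_tokens?: number;
  max_completion_tokens?: number;
  temperature?: number;
}

// Assumed detection rule: model ids starting with o1, o3, o4, or gpt-5.
const isReasoningModel = (model: string): boolean => /^(o1|o3|o4|gpt-5)/.test(model);

function toOpenAIParams(params: UnifiedParams): OpenAIParams {
  const out: OpenAIParams = { model: params.model };
  if (isReasoningModel(params.model)) {
    // Reasoning models: rename the token limit and drop temperature entirely.
    if (params.maxTokens !== undefined) out.max_completion_tokens = params.maxTokens;
    return out;
  }
  // Other models: pass both settings through under their usual names.
  if (params.maxTokens !== undefined) out.max_tokens = params.maxTokens;
  if (params.temperature !== undefined) out.temperature = params.temperature;
  return out;
}
```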

Fallback chains

responseFormat works through fallback chains transparently. Each provider in the chain receives the schema in its native format:

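import { createFallbackChain } from '@yourgpt/llm-sdk/fallback';
import { createOpenAI } from '@yourgpt/llm-sdk/openai';
import { createAnthropic } from '@yourgpt/llm-sdk/anthropic';

// Create the provider instances used below (the apiKey option is assumed here; adjust to your setup)
const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });
const anthropic = createAnthropic({ apiKey: process.env.ANTHROPIC_API_KEY });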

const chain = createFallbackChain({
  models: [
    openai.languageModel('gpt-4o'),
    anthropic.languageModel('claude-3-5-sonnet-latest'),
  ],
  strategy: 'priority',
});

// Same responseFormat works on either hop
const result = await chain.chat({
  messages: [...],
  config: {
    responseFormat: { type: 'json_schema', json_schema: { ... } },
  },
});

A working end-to-end demo lives in examples/fallback-demo — see the /chat/structured route.


MCP servers

Pass mcpServers alongside responseFormat to let the model call remote Model Context Protocol servers as part of the same call. The SDK forwards your servers to providers that support MCP natively (OpenAI Responses API, Anthropic Messages with mcp-client-2025-11-20 beta) — no local execution or tool wiring required.

const result = await runtime.response({
  prompt: 'Extract FAQs from this conversation. Consult the KB first.',
  mcpServers: [{
    label: 'knowledge_base',
    url: 'https://kb.example.com/sse',
    headers: { Authorization: `Bearer ${token}` },
    allowedTools: ['search_kb'],
    requireApproval: 'never',
  }],
  responseFormat: {
    type: 'json_schema',
    json_schema: { name: 'faqs', schema: FAQ_SCHEMA, strict: true },
  },
});

McpServerConfig shape

interface McpServerConfig {
  label: string;                          // human-readable, also the MCP server name
  url: string;                            // HTTP/SSE endpoint
  headers?: Record<string, string>;       // forwarded as request headers
  allowedTools?: string[];                // restrict the model to a subset
  requireApproval?: 'never' | 'always';   // default 'never'
}

Per-provider translation

Provider | Native field | Beta header
OpenAI Responses | tools[type=mcp] | -
Anthropic Claude | mcp_servers + tools[type=mcp_toolset] for allowedTools | mcp-client-2025-11-20
Google / xAI / Fireworks / OpenRouter / Ollama | not supported | -

The Authorization header is hoisted to Anthropic's top-level authorization_token with the Bearer prefix stripped — Anthropic expects the bare token. Other headers are forwarded verbatim where the provider accepts them.
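The hoisting step can be pictured as splitting the headers map in two. The sketch below is illustrative only: hoistAuthorization and SplitHeaders are hypothetical names, and matching the header name and the Bearer prefix case-insensitively is an assumption about how the adapter behaves.

```typescript
interface SplitHeaders {
  authorizationToken?: string;
  headers: Record<string, string>;
}

// Pull the Authorization header out of the map and strip its Bearer prefix.
function hoistAuthorization(headers: Record<string, string> = {}): SplitHeaders {
  const rest: Record<string, string> = {};
  let authorizationToken: string | undefined;
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === 'authorization') {
      authorizationToken = value.replace(/^Bearer\s+/i, '');
    } else {
      rest[name] = value;
    }
  }
  return authorizationToken === undefined
    ? { headers: rest }
    : { authorizationToken, headers: rest };
}
```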

Providers without native MCP throw a clear error today. Use them inside a fallback chain (retryableErrors: () => true) and the chain will skip past them to the next provider. Local-execution MCP fallback for Google/xAI/Fireworks/OpenRouter is planned as a follow-up.
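The skip-on-error behavior is the usual priority-fallback pattern. The sketch below shows that pattern in isolation, under the assumption that the chain consults retryableErrors after each failure; tryInPriorityOrder and Attempt are hypothetical names and this is not the implementation of createFallbackChain.

```typescript
type Attempt<T> = () => Promise<T>;

// Try each attempt in order; move on when retryableErrors returns true, otherwise rethrow.
async function tryInPriorityOrder<T>(
  attempts: Attempt<T>[],
  retryableErrors: (error: unknown) => boolean = () => true,
): Promise<T> {
  let lastError: unknown = new Error('No providers configured');
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (error) {
      if (!retryableErrors(error)) throw error;
      lastError = error;
    }
  }
  // Every attempt failed with a retryable error; surface the last one.
  throw lastError;
}
```

With retryableErrors: () => true, a provider that throws because it lacks native MCP is treated like any other failed hop and the next provider gets the request.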


Reasoning effort

Pass reasoningEffort for one normalized knob across providers that support extended thinking / reasoning. The SDK maps it to each provider's native shape: reasoning.effort on OpenAI Responses, adaptive thinking on Claude 4.6/4.7, and thinking.budget_tokens on older Claude.

const result = await runtime.response({
  prompt: 'Plan the data migration step by step.',
  reasoningEffort: 'high',
});

Type

type ReasoningEffort =
  | 'minimal' | 'low' | 'medium' | 'high'  // normalized enum
  | { budgetTokens: number }                // explicit Anthropic / Gemini budget
  | { raw: Record<string, unknown> };       // provider passthrough escape hatch

Per-provider translation

Provider | Native field | Notes
OpenAI Responses | reasoning: { effort, summary: 'auto' } | minimal/low/medium/high (plus xhigh/none if passed via { raw })
Anthropic Claude 4.6 / 4.7 | thinking: { type: 'adaptive', effort } | Preferred for new models
Anthropic Claude 3.7 / 4.0–4.5 | thinking: { type: 'enabled', budget_tokens } | Buckets: low=4k, medium=8k, high=16k
Google Gemini 2.5/3.x thinking models | thinkingConfig.thinkingBudget | Buckets: 1k / 4k / 8k / 24k
xAI Grok | reasoning_effort | Coerced to low/high only
Others | silently ignored | The model still runs, just without extended reasoning

When to use budgetTokens vs the enum

The enum is the right default. Use { budgetTokens: N } only when you need an exact token count, typically for cost control on older Anthropic models or Gemini. The escape hatch { raw: {...} } forwards your object literally to the provider with no translation.
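To make the three branches concrete, here is a small sketch of how a budget could be resolved for a budget-based provider. It uses the bucket values listed for Claude 3.7 / 4.0 to 4.5 and reads "4k" as 4000, which is an assumption (the SDK may use different exact numbers). resolveBudget and CLAUDE_BUDGETS are hypothetical names.

```typescript
type EffortLevel = 'minimal' | 'low' | 'medium' | 'high';

type ReasoningEffort =
  | EffortLevel
  | { budgetTokens: number }
  | { raw: Record<string, unknown> };

// Buckets from the translation table; 'minimal' is not listed there, so it is left out.
// Reading "k" as 1000 is an assumption.
const CLAUDE_BUDGETS: Partial<Record<EffortLevel, number>> = {
  low: 4000,
  medium: 8000,
  high: 16000,
};

// Resolve a token budget, or undefined when no budget applies.
function resolveBudget(
  effort: ReasoningEffort,
  buckets: Partial<Record<EffortLevel, number>>,
): number | undefined {
  if (typeof effort === 'string') return buckets[effort]; // enum: look up the bucket
  if ('budgetTokens' in effort) return effort.budgetTokens; // explicit count wins
  return undefined; // { raw } bypasses translation entirely
}
```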


runtime.response() — one-shot bundle

For the common "prompt in, JSON out" case use the convenience method:

const result = await runtime.response({
  prompt,
  mcpServers: [...],
  reasoningEffort: 'high',
  responseFormat: { type: 'json_schema', json_schema: {...} },
});
const data = JSON.parse(result.text);

It's a thin wrapper over runtime.generate() that builds a single user message and forwards the rest of the request config. See runtime.response() for the full reference.
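Conceptually, the wrapper does something like the sketch below. The request shape expected by runtime.generate() is not shown on this page, so the { messages, config } layout here is an assumption borrowed from the chain.chat() example, and toGenerateRequest is a hypothetical name.

```typescript
interface ResponseOptions {
  prompt: string;
  [key: string]: unknown;
}

// Assumed request shape, modeled on the chain.chat() example.
interface GenerateRequest {
  messages: { role: 'user'; content: string }[];
  config: Record<string, unknown>;
}

// Wrap the prompt in a single user message and forward everything else as config.
function toGenerateRequest({ prompt, ...config }: ResponseOptions): GenerateRequest {
  return {
    messages: [{ role: 'user', content: prompt }],
    config,
  };
}
```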

