runtime.response()
One-shot prompt-in / JSON-out call with MCP servers, reasoning effort, and structured output
runtime.response() is an ergonomic wrapper for the common "give me one
answer back" case — a single prompt, optional MCP servers, optional
reasoning effort, optional JSON schema, one response. No streaming, no
multi-step agent loop.
It's a thin convenience over runtime.generate(): same adapter chain,
same per-provider translation, same fallback behavior. Use it when you
don't need streaming or tool-call introspection.
import { createRuntime, createFallbackChain } from '@yourgpt/llm-sdk';
import { createOpenAI } from '@yourgpt/llm-sdk/openai';
import { createAnthropic } from '@yourgpt/llm-sdk/anthropic';
const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY });
const anthropic = createAnthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const runtime = createRuntime({
adapter: createFallbackChain({
models: [
openai.languageModel('gpt-4o'),
anthropic.languageModel('claude-opus-4-7'),
],
strategy: 'priority',
}),
});
const result = await runtime.response({
prompt: 'Extract the top 3 fastest land animals as JSON.',
reasoningEffort: 'medium',
responseFormat: {
type: 'json_schema',
json_schema: {
name: 'animals',
schema: {
type: 'object',
properties: {
animals: {
type: 'array',
items: {
type: 'object',
properties: {
name: { type: 'string' },
top_speed_kmh: { type: 'number' },
},
required: ['name', 'top_speed_kmh'],
},
},
},
required: ['animals'],
},
strict: true,
},
},
});
const data = JSON.parse(result.text);
// → { animals: [{ name: 'Cheetah', top_speed_kmh: 120 }, ...] }Parameters
| Param | Type | Description |
|---|---|---|
prompt | string | The user message. Sent as a single role: 'user' turn. |
systemPrompt | string? | Optional system prompt (overrides any default on the runtime). |
mcpServers | McpServerConfig[]? | Remote MCP servers the model can call. See Structured Output → MCP servers. |
reasoningEffort | ReasoningEffort? | 'minimal'/'low'/'medium'/'high' or { budgetTokens } or { raw }. See Structured Output → Reasoning effort. |
responseFormat | ResponseFormat? | JSON-schema constraint. See Structured Output. |
maxTokens | number? | Maximum tokens for the completion. |
temperature | number? | Sampling temperature (0–2). Silently dropped on OpenAI reasoning models. |
Result
{
text: string; // model output (parse yourself if responseFormat used)
toolCalls: Array<{ // function-tool calls; MCP tool calls are resolved provider-side
id: string;
name: string;
args: Record<string, unknown>;
}>;
}The method throws if the underlying call errors out. Catch at the call site or rely on the fallback chain to route past failures.
When to use response() vs generate() / chat()
| Use case | Method |
|---|---|
| One-shot prompt, want JSON back | runtime.response() |
| Multi-turn conversation, full message history | runtime.generate() |
| Streaming output to the client | runtime.stream() / streamText() |
| Custom tool definitions with handlers, agent loop | runtime.chat() with tools |
If your call only needs prompt + responseFormat + maxTokens,
response() reads cleaner. The other methods give you more knobs for
free.
Provider routing under the hood
response() forwards through the standard adapter pipeline:
- OpenAI — when
mcpServersorreasoningEffortis set, routes through/v1/responses(Chat Completions doesn't accept these fields).responseFormatbecomestext.format. - Anthropic — uses
client.beta.messages.*with themcp-client-2025-11-20beta header when MCP servers are present.reasoningEffortmaps to adaptive thinking on Claude 4.6/4.7 andbudget_tokenson older models. - Google / xAI / OpenRouter — these share the OpenAI adapter via an
OpenAI-compatible endpoint. They throw a clear error when
mcpServersorreasoningEffortis set, so fallback chains skip past them.
The store: false flag is always sent on /v1/responses calls —
response() is one-shot semantics by design, no conversation
persistence on OpenAI's side. If you need persistence, use
runtime.chat() instead.
Self-learning example
The canonical use case: extract FAQs from a chat transcript, consulting a knowledge-base MCP server first to avoid duplicates.
const FAQ_SCHEMA = {
type: 'object',
properties: {
response: {
type: 'array',
items: {
type: 'object',
properties: {
question: { type: 'string' },
answer: { type: 'string' },
intent: { type: 'string' },
},
required: ['question', 'answer', 'intent'],
additionalProperties: false,
},
},
},
required: ['response'],
additionalProperties: false,
} as const;
const result = await runtime.response({
prompt,
systemPrompt: 'Extract FAQs. Consult the KB before creating entries.',
mcpServers: [{
label: 'knowledge_base',
url: 'https://kb.example.com/sse',
headers: { Authorization: `Bearer ${env.KB_TOKEN}` },
allowedTools: ['search_kb'],
requireApproval: 'never',
}],
reasoningEffort: 'high',
responseFormat: {
type: 'json_schema',
json_schema: { name: 'faqs', schema: FAQ_SCHEMA, strict: true },
},
});
const parsed = JSON.parse(result.text).response;A working version lives in examples/fallback-demo — see the
POST /response route.
Next Steps
- Structured Output — full reference for
responseFormat,mcpServers,reasoningEffort - generateText() — multi-turn / tool-loop API
- Tools — function-calling primitives