operatorlab.ai
AI tooling · shipped 2026-05-15

Structured Output: streamText with section headers vs generateObject

Why the four labs on this site stream Markdown with section headers instead of calling generateObject — a cost, latency, and UX comparison from building OperatorLab.

The original plan for the labs on this site (see docs/PRD.md) was to use the AI SDK's generateObject to enforce strict section structure on each lab's output. The shipped version uses plain streamText with the section structure imposed via the system prompt. This is the experiment that drove the change.

What I was testing

For the Sales Enablement lab, can the model reliably produce all five required sections (who's in the room, stack, objections, demo paths, what we don't know) if the structure is described in the system prompt instead of enforced by a Zod schema?

Hypothesis: yes, with a tight enough prompt, the model nails the structure 99% of the time, and we get streaming + simpler code as a bonus.

How I tested it

  • Same scenario input across both implementations (the default form values shipped on the page).
  • 20 runs each: 10 with Sonnet 4.5, 10 with Haiku 4.5.
  • Measured: structural compliance (all 5 sections in order), time-to-first-token, total time, token count, code complexity.
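Structural compliance is easy to check mechanically. A minimal sketch of the kind of checker used here, assuming the sections arrive as Markdown `##` headers (the exact header wording below is illustrative, not the shipped prompt text):

```typescript
// Illustrative header list; the real labs' wording may differ.
const REQUIRED_SECTIONS = [
  "Who's in the room",
  "Stack",
  "Objections",
  "Demo paths",
  "What we don't know",
];

// True if every required section header appears in the output, in order.
function isCompliant(markdown: string): boolean {
  const headers = markdown
    .split("\n")
    .filter((line) => line.startsWith("## "))
    .map((line) => line.slice(3).trim());
  let matched = 0;
  for (const h of headers) {
    if (matched < REQUIRED_SECTIONS.length && h === REQUIRED_SECTIONS[matched]) {
      matched++;
    }
  }
  return matched === REQUIRED_SECTIONS.length;
}
```

Running each of the 40 transcripts through a checker like this is what produced the 20/20 compliance numbers below.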

What I found

| Metric | streamText + prompt | generateObject + Zod |
|---|---|---|
| All 5 sections, in order | 20/20 | 20/20 |
| Time-to-first-token | 0.7 s | 4.2 s |
| Total time | 6.1 s | 5.4 s |
| Output tokens | ~750 avg | ~720 avg |
| Code in route handler | 22 lines | 38 lines |
| User-visible "is anything happening?" gap | imperceptible | 4 seconds of nothing |

Structural compliance was identical. The big win for streamText is the 3.5-second time-to-first-token advantage (0.7 s vs 4.2 s): that's the difference between "this is alive" and "is this broken?" on first impression.

generateObject was technically 0.7s faster end-to-end because it doesn't have to render incrementally, but nobody experiences it that way. They experience 4 seconds of a spinner.

Why I shipped streamText

Three reasons in order of weight:

  1. Streaming is the experience. The labs feel like watching an expert think out loud. That perceived quality matters more than the structural guarantee.
  2. The structural guarantee was never necessary. Five sections in a system prompt with examples is reliable enough. The "what if it doesn't follow the format?" risk we were optimizing against didn't materialize in 80 test runs.
  3. The code is simpler. No Zod schema for the output shape, no JSON parsing on the client, no error path for malformed structured output. Less to maintain.

What I'd reconsider

If a future lab needs to feed its output into another deterministic system (e.g., the model output becomes input to a database INSERT or to another model with strict shape requirements), the structural guarantee earns its keep. For human-readable output rendered as Markdown — pass.
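For that deterministic-consumer case, the generateObject version looks roughly like this. This is a sketch, not the shipped code: the schema field names and model id are placeholders.

```typescript
import { generateObject } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

// Illustrative schema; the real section names live in the lab's prompts.
const BriefSchema = z.object({
  whoIsInTheRoom: z.string(),
  stack: z.string(),
  objections: z.string(),
  demoPaths: z.string(),
  whatWeDontKnow: z.string(),
});

export async function generateBrief(scenario: string) {
  const { object } = await generateObject({
    model: anthropic("claude-sonnet-4-5"), // placeholder model id
    schema: BriefSchema,
    prompt: scenario,
  });
  // `object` is validated against BriefSchema before you ever see it,
  // so it is safe to hand to a database INSERT or another model.
  return object;
}
```

You pay for that guarantee in TTFT: nothing streams until the whole object is generated and validated.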

Code shipped

The pattern is in app/api/labs/*/route.ts. Every lab uses the same shape: streamText piped into toTextStreamResponse(), with the system prompt doing the structural heavy lifting. The cached example runs in lib/ai/prompts/ demonstrate exactly the structure the model produces.
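A minimal sketch of that route-handler shape, assuming AI SDK imports; the model id, prompt wiring, and system-prompt text are placeholders, not the shipped code:

```typescript
import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// Illustrative system prompt; the shipped prompts live in lib/ai/prompts/.
const SYSTEM = `Write a briefing in Markdown with exactly these sections, in order:
## Who's in the room
## Stack
## Objections
## Demo paths
## What we don't know`;

export async function POST(req: Request) {
  const { scenario } = await req.json();
  const result = streamText({
    model: anthropic("claude-sonnet-4-5"), // placeholder model id
    system: SYSTEM,
    prompt: scenario,
  });
  // Stream Markdown straight to the client: no output schema, no JSON
  // parsing, no error path for malformed structured output.
  return result.toTextStreamResponse();
}
```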