Structured feedback for LLMs

Large language models work better on feedback that is structured (headings, bullet points, code blocks, embedded media) because structure resolves ambiguity that prose alone leaves open.

Structured feedback for LLMs is product feedback shaped so a language model can parse, attribute, and act on it without guessing. This page is part of the guides hub, which collects the concepts and workflows behind agent-readable feedback. The short version: headings carve the document, bullets parallelize the items inside it, code blocks pin down literal content, image references give the model something to look at, and source URLs tie each finding to where it came from. Prose alone leaves too much for the model to infer.

What you'll learn

  • Why structured input outperforms a wall of prose for LLM tasks
  • The five structural primitives that earn their tokens
  • A concrete shape for product feedback that agents can act on
  • How the CobaltCapture capture flow produces that shape automatically
  • When structure stops mattering (spoiler: it does not)

Why LLMs benefit from structure

Every LLM has a context window, and every token in that window costs something. Some tokens carry signal; many carry filler. The job of structure is to push the ratio toward signal. Headings, bullets, and code blocks are not cosmetic. They are cheap, high-information tokens that tell the model how to read the rest.

The first benefit is context-window utilization. A 400-word prose blob and a 400-word structured document hold the same raw content, but the structured version is easier to navigate. When the model attends across the window, it can use positional and structural cues to find what matters. A heading like "## Finding 3: email validation" is a beacon. A new paragraph that starts with "Also, another thing" is not. The model spends less effort inferring what the current topic is, which leaves more capacity for reasoning about it.

The second benefit is tokens-per-thought efficiency. Bulleted lists let you express five parallel items in five short lines instead of five run-on sentences glued together with "and" and "also." That compression is real: the same idea fits in fewer tokens, which means more room for the next idea. Code blocks compress even harder, because they remove the need for the model to disambiguate what is meant literally versus what is meant descriptively.
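The compression claim can be checked roughly on a toy example. The sketch below compares a run-on sentence with the same five items as bullets, using a whitespace word count as a crude stand-in for tokens. Both sample strings and the `rough_token_count` helper are made up for illustration, and a real tokenizer will give different absolute numbers.

```python
def rough_token_count(text: str) -> int:
    """Whitespace word count: a crude proxy, not a real tokenizer."""
    return len(text.split())


# Hypothetical feedback, written once as a run-on sentence...
PROSE = (
    "The submit button overflows on mobile and also the email field "
    "validates too early and also the footer links are broken and the "
    "page title is wrong and also the spinner never stops."
)

# ...and once as five parallel bullets carrying the same items.
BULLETS = "\n".join(
    [
        "- submit button overflows on mobile",
        "- email field validates too early",
        "- footer links are broken",
        "- page title is wrong",
        "- spinner never stops",
    ]
)

if __name__ == "__main__":
    print("prose:", rough_token_count(PROSE))
    print("bullets:", rough_token_count(BULLETS))
```

The saving here comes entirely from dropping the connective words; how much carries over to a given model depends on its tokenizer.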

The third benefit is reduced ambiguity. Prose forces the model to infer scope. Take "The button is broken and the form crashes": is "the form" the same form the button sits in, or a different one? A structured document answers that with a heading or a source URL per finding. Less guessing means fewer hallucinated repairs.

The fourth benefit is attribution. Reasoning models, in particular, cite sources better when the sources are tagged inline. A "Source: https://staging.example.com/checkout" line above a screenshot tells the model exactly which page it is looking at. That metadata can travel with the finding into the patch, the commit message, and the test plan. Without structure, the model has to guess which claim belongs to which source.

The structural primitives that matter

Five structural elements do almost all the work. Master these and you have a format LLMs handle well.

Headings. A heading orients the model. The convention "## Finding 1: <thing>" followed by "## Finding 2: <thing>" makes the structure of the document legible at a glance. The model can plan its response around those boundaries: one fix per finding, in order. Without headings, the model often collapses related issues into one response or splits unrelated ones apart.

Bulleted lists. Bullets are for parallel items. If five things share a structure (five bugs, five steps, five components), bullet them. Each bullet should be self-contained and roughly the same length. Mixed-length bullets signal that the items are not actually parallel, which can confuse the model. Keep bullets short; if a bullet needs three sentences, promote it to its own heading.

Code blocks. Fenced code blocks preserve formatting and tell the model the contents should be treated literally. Use them for selectors, file paths, function names, error messages, JSON payloads, and exact strings to find or replace. Inline backticks work for single tokens; triple-backtick blocks work for multi-line content. The fence is doing real work: it tells the model not to "improve" the spelling or grammar of what is inside.

Image references. Inline image syntax (![alt](url)) embeds visual evidence in a way modern multimodal models can actually consume. The URL must be public and stable. The alt text matters too: it gives non-multimodal contexts (terminal-based agents, search snippets, screen readers) a textual fallback. A screenshot with no alt text is half-invisible.

Source URLs. A bare URL on its own line, or a "Source: <url>" label, ties a finding to the page or commit it came from. This is the single most underused structural element. It costs almost nothing in tokens and lets the model verify, cite, and navigate. Agents like Cursor and Claude Code can follow these URLs when they need more context.
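To make the five primitives concrete, here is a minimal sketch that renders one finding as markdown using all of them. The `Finding` class, its field names, and `render_finding` are hypothetical names chosen for illustration; they are not part of any tool described on this page.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One review item. Every field name here is an assumption."""

    title: str
    source_url: str
    image_url: str
    alt_text: str
    body: str
    steps: tuple[str, ...] = ()     # parallel items, rendered as bullets
    literals: tuple[str, ...] = ()  # exact strings, rendered in a code fence


def render_finding(index: int, finding: Finding) -> str:
    """Render a finding as heading, source URL, image, prose, bullets, fence."""
    parts = [
        f"## Finding {index}: {finding.title}",
        f"Source: {finding.source_url}",
        f"![{finding.alt_text}]({finding.image_url})",
        finding.body,
    ]
    if finding.steps:
        parts.append("\n".join(f"- {step}" for step in finding.steps))
    if finding.literals:
        fence = "`" * 3  # built at runtime to avoid nesting fences here
        parts.append("\n".join([fence, *finding.literals, fence]))
    # Blank lines between blocks keep each primitive visually separate.
    return "\n\n".join(parts)
```

Calling `render_finding(1, finding)` for each item and joining the results with blank lines yields a document in the "## Finding N" convention described above.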

A practical structure for product feedback

Here is the shape that works for the most common case: a reviewer walking through a staging build and recording what is broken.

# Staging review, checkout flow

## Finding 1: submit button overflows on mobile

Source: https://staging.example.com/checkout

![Submit button overflowing its container on a 375px viewport](https://cobaltcapture.com/r/abc123/img/1.png)

The submit button overflows its container at viewport widths under
380px. Tested on iOS Safari 17 and Chrome Android. The container has
a fixed pixel padding instead of percentage padding, which is the
likely root cause.

## Finding 2: email validation fires too early

Source: https://staging.example.com/checkout

![Validation error showing on the email field mid-typing](https://cobaltcapture.com/r/abc123/img/2.png)

Email validation fires on every keystroke instead of waiting for blur
or a debounced pause. Users see an "invalid email" message before
they finish typing, which is jarring. Suggested fix: 300ms debounce
or run validation on blur.

Compare that to the prose alternative: "Hey, I looked at staging and the submit button is broken on mobile, it overflows the container, looks like a padding issue. Also the email field validates while you type, which is annoying. Both on the checkout page. Can you fix?"

Both versions contain the same facts. Only the structured version tells the agent which screenshot belongs to which finding, where each one lives, and what order to address them in. The prose version forces the model to reconstruct that mapping every time, which is where errors creep in: the patch lands on the wrong component, or the second finding silently gets dropped.
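One way to see why the structured version is easier to act on: a few lines of code can recover the finding-to-screenshot-to-source mapping mechanically. The sketch below assumes the exact conventions of the example above ("## Finding N: title" headings, a "Source:" line, one image line per finding). `ParsedFinding` and `parse_review` are hypothetical names, and the parser ignores code fences and lists for brevity.

```python
import re
from dataclasses import dataclass

FINDING_HEADING = re.compile(r"^## Finding (\d+): (.+)$")
SOURCE_LINE = re.compile(r"^Source: (\S+)$")
IMAGE_LINE = re.compile(r"^!\[([^\]]*)\]\((\S+)\)$")


@dataclass
class ParsedFinding:
    number: int
    title: str
    source: str = ""
    image_url: str = ""
    alt_text: str = ""
    body: str = ""


def parse_review(markdown: str) -> list[ParsedFinding]:
    """Split a review into findings, attaching source, image, and commentary."""
    findings: list[ParsedFinding] = []
    body_lines: list[str] = []

    def close_current() -> None:
        # Join the hard-wrapped commentary lines of the finding just ended.
        if findings:
            findings[-1].body = " ".join(body_lines)
        body_lines.clear()

    for raw in markdown.splitlines():
        line = raw.strip()
        if m := FINDING_HEADING.match(line):
            close_current()
            findings.append(ParsedFinding(int(m.group(1)), m.group(2)))
        elif not findings or not line:
            continue  # skip the document title and blank lines
        elif m := SOURCE_LINE.match(line):
            findings[-1].source = m.group(1)
        elif m := IMAGE_LINE.match(line):
            findings[-1].alt_text = m.group(1)
            findings[-1].image_url = m.group(2)
        else:
            body_lines.append(line)
    close_current()
    return findings
```

No comparable parser exists for the prose version; the mapping there lives only in the reader's interpretation.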

How CobaltCapture fits in

The structure above is exactly what CobaltCapture produces. The capture flow is opinionated about format on purpose: each item gets its own heading, its own screenshot, its own source URL, and a dictated commentary paragraph. You do not have to think about the shape; the shape is the product.

You capture a screen, crop the part that matters, talk through the problem out loud, and publish. The output is a public review URL and a one-click markdown export. Paste either into an agent prompt and the agent gets a structured document, not a wall of prose. For the canonical workflow with a specific agent, see feedback for Cursor: the same structure, tailored to how Cursor's composer reads markdown.

The point is not that markdown is magic. The point is that LLMs reward documents whose structure matches their expectations, and CobaltCapture is built to produce exactly those documents without making you reformat anything.

Frequently asked questions

Why does structure matter for LLMs?

Structure tells the model where one idea ends and the next begins. Headings group related claims, bullets parallelize comparable items, and code blocks fence off content that should be treated literally. Without that signal, the model has to infer boundaries from punctuation alone, which is noisier and burns context.

What structural elements help most?

Headings, bulleted lists, fenced code blocks, image references with descriptive alt text, and explicit source URLs. Together they let the model orient quickly: what is the topic, what are the discrete findings, what is the literal content, what is the visual evidence, and where did it come from.

Is markdown better than plain text for LLM input?

Yes, for any feedback longer than a sentence or two. Markdown is the lingua franca that modern coding agents and chat models train on heavily, so they parse its conventions reliably. Plain text works for short prompts but loses fidelity as findings accumulate.

How long should structured feedback be?

Long enough to cover each finding once, short enough that the model can hold it all in working context. For a typical product review, three to ten findings of two to four sentences each, plus a screenshot per finding, lands in the right zone. Trim ruthlessly before adding more.
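The length guideline above can be turned into a quick pre-publish check. This is a sketch under stated assumptions: `count_sentences` uses a naive punctuation split that will miscount abbreviations, and `lint_review` with its inputs (one body string and one screenshot flag per finding) is a hypothetical helper, not a feature of any tool named here.

```python
import re


def count_sentences(text: str) -> int:
    """Naive count: split on ., !, or ? followed by whitespace or end of text."""
    pieces = re.split(r"[.!?]+(?:\s+|$)", text)
    return len([p for p in pieces if p.strip()])


def lint_review(bodies: list[str], has_screenshot: list[bool]) -> list[str]:
    """Warn when a review strays from 3 to 10 findings of 2 to 4 sentences each."""
    warnings: list[str] = []
    if not 3 <= len(bodies) <= 10:
        warnings.append(f"{len(bodies)} findings; the guideline is 3 to 10")
    for i, (body, shot) in enumerate(zip(bodies, has_screenshot), start=1):
        n = count_sentences(body)
        if not 2 <= n <= 4:
            warnings.append(f"Finding {i}: {n} sentences; the guideline is 2 to 4")
        if not shot:
            warnings.append(f"Finding {i}: no screenshot")
    return warnings
```

An empty list means the review sits inside the suggested range; the warnings are prompts to trim or split, not hard rules.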

Does structure matter as much for newer reasoning models?

Yes. Reasoning models still pay token costs for ambiguity and still cite sources better when sources are explicit. Structure does not become optional just because the model can think longer; it becomes a cheaper way to get the same answer.

Capture your first review.

About a minute from open tab to a shareable URL your agent can ingest.