Turning a Screenshot Into a Prompt Your Agent Can Use

June 04, 2026 · 5 min read

An agent cannot see your screen. It reads text. So the job of turning a screenshot into a usable prompt is really the job of translating what you see into specific sentences, tied to specific pixels, in a format the agent will accept as input. Here is the procedure that works, start to finish, in about three minutes.

Step 1: Capture the exact frame, not the whole screen

Open the page or app you want to comment on. Get it into the state you want the agent to fix: the error visible, the modal open, the dropdown expanded. If the bug only appears after three clicks, do those three clicks first, then capture.

Open a new review in another tab and click Capture screen. The browser asks which window or screen to share. Pick the tab that holds the page. Cobalt Capture draws the current frame to a canvas. Drag a rectangle around the part that matters and drop everything else. A tight crop is worth more than a full-page screenshot because the agent works from your written comments, not from the pixels, and a smaller area leaves less ambiguity about what "the button" refers to.

If you need to compare two states, capture two items. One for the broken state, one for the expected state. Do not try to squeeze both into one screenshot.

Step 2: Drop a numbered pin on the thing you are talking about

Add a pin to the screenshot at the exact spot you are referencing. The pin gets a number. Now your comment can say "at pin 1, the submit button is disabled" instead of "the button on the right, you know, the one near the top." The agent reads the pin number in the markdown export and matches it to a coordinate description. This is the single biggest accuracy upgrade you can make to a visual bug report.

If there are three problems in one screenshot, drop three pins and write three separate sentences. Do not combine them.
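To make the one-pin-one-sentence idea concrete, here is a minimal sketch of pins as data. The Pin fields, the percentage coordinates, and the line layout are assumptions made for illustration; the article does not show the actual export format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    """Hypothetical pin record; field names are illustrative only."""
    number: int     # the number drawn on the screenshot
    x_pct: float    # horizontal position, 0-100, relative to the crop
    y_pct: float    # vertical position, 0-100, relative to the crop
    comment: str    # exactly one instruction per pin


def render_pins(pins: list[Pin]) -> str:
    """Render one line per pin, ordered by pin number."""
    ordered = sorted(pins, key=lambda p: p.number)
    return "\n".join(
        f"- Pin {p.number} (x={p.x_pct:.0f}%, y={p.y_pct:.0f}%): {p.comment}"
        for p in ordered
    )
```

Keeping one comment per pin means each problem stays a separate line in the text the agent reads, which is what lets you refer to "pin 2" later without ambiguity.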

Step 3: Write the comment as instructions, not observations

This is where most screenshots fail as agent input. "The header looks weird" is an observation. An agent cannot act on it. Rewrite it as an instruction with a target state:

At pin 1, the header logo is 64px tall and overlaps the nav links. Reduce it to 32px and add 16px of right margin so the nav links sit clear of it.

Three things to include in every comment: what you see (with a pin reference), what should happen instead, and any constraint the agent needs to respect. If you do not know the exact pixel value, say what good looks like: "small enough that the nav fits on one line at 1280px wide."
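If it helps to treat those three parts as required fields, the sketch below assembles a comment from them and refuses to produce one without an observation and a target state. The function name and the "Constraint:" label are hypothetical, not something the tool provides.

```python
def build_comment(pin: int, observed: str, target: str, constraint: str = "") -> str:
    """Join what you see, what should happen, and an optional constraint."""
    if not observed.strip() or not target.strip():
        raise ValueError("a comment needs both an observation and a target state")

    def sentence(text: str) -> str:
        # Normalise trailing whitespace and punctuation to a single full stop.
        return text.strip().rstrip(".") + "."

    parts = [f"At pin {pin}, {sentence(observed)}", sentence(target)]
    if constraint.strip():
        parts.append(f"Constraint: {sentence(constraint)}")
    return " ".join(parts)
```

Called with the header example above, it returns the same two sentences, and it raises an error for a bare "the header looks weird" that has no target state.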

Dictation helps here. Click the mic on the comment field and talk through the fix the way you would explain it to a teammate. Chrome and Edge use the browser's built-in speech recognition. Firefox does not support it, so type there instead. Spoken comments tend to be more specific than typed ones because you naturally include context you would skip when typing.

Step 4: Add the free-floating context the screenshots do not show

Some information is not on screen: the browser, viewport size, logged-in user role, the feature flag that was on, the URL pattern. Add these as free-floating comments with no screenshot attached. Put them at the top of the review so the agent reads them before the per-item instructions.

A good template:

Environment: staging, Chrome 131, viewport 1440x900, logged in as an admin user.
Route: /dashboard/projects/:id/settings
Repro: open settings, click Save without changing anything.

If you want the deeper version of why this matters, the breakdown in "what an agent-readable bug report actually contains" lays out the fields agents reliably use.
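The ordering rule for this step (context first, per-item instructions after) can be sketched as follows. Both function names are hypothetical, and the three labels simply mirror the template above.

```python
from __future__ import annotations


def render_context(environment: str, route: str, repro: str) -> str:
    """Render the free-floating context block, one labelled line per field."""
    return "\n".join(
        [
            f"Environment: {environment}",
            f"Route: {route}",
            f"Repro: {repro}",
        ]
    )


def assemble_review(context: str, item_comments: list[str]) -> str:
    """Place the context block ahead of the per-item comments."""
    return context + "\n\n" + "\n".join(item_comments)
```

Whatever the tool's real layout looks like, the point is the same: the agent should hit the environment and repro lines before it reads any pinned instruction.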

Step 5: Publish and grab the markdown URL

Click Publish. The review gets a short public URL like /r/abc123. Append /markdown to get the plain-text version. That markdown file is your prompt. It includes the pin numbers, the comments in order, the environment block, and references to each screenshot. Going from capture to clean prompt is the same workflow described on the "screenshot to prompt" page, and the underlying mechanics are covered in "markdown screenshots".
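If you script this step, deriving the markdown URL is a small path operation. This is a sketch only: the host example.com is a placeholder, and the only rule taken from the article is "append /markdown to the review URL".

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(review_url: str) -> str:
    """Return the review URL with /markdown appended to its path."""
    parts = urlsplit(review_url)
    path = parts.path.rstrip("/")
    if not path.endswith("/markdown"):  # do not append it twice
        path += "/markdown"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

For example, markdown_url("https://example.com/r/abc123/") gives "https://example.com/r/abc123/markdown", and passing that result back in leaves it unchanged.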

You do not need an account to publish. An anonymous review is kept for 30 days. If you sign in later with Google or a magic link, the review is claimed permanently.

Step 6: Hand the markdown to your agent

How you deliver the markdown depends on the agent. In Cursor, paste the markdown into the chat or attach the URL. In Claude Code, fetch the /markdown URL directly. In Cline or Windsurf, same pattern: paste or fetch. The agent reads the structured comments, sees the pin references, and starts making changes against the repo it has open.

Lead with one sentence of intent before the pasted markdown. Something like: "Fix the issues described below. Treat each pinned comment as a separate task. Ask before changing files outside src/components/header." That keeps the agent scoped.
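For agents you drive from a script instead of a chat box, the same hand-off can be sketched with the standard library: fetch the markdown, then put the intent sentence in front of it. The function names are hypothetical, and the sketch assumes the /markdown URL returns plain text without authentication, which the article implies but does not spell out.

```python
from urllib.request import urlopen


def fetch_markdown(url: str, timeout: float = 10.0) -> str:
    """Download the markdown export as text (assumes no auth is required)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def build_prompt(intent: str, markdown: str) -> str:
    """Put the intent sentence first, then a blank line, then the export."""
    return f"{intent.strip()}\n\n{markdown.strip()}\n"
```

Keeping build_prompt separate from the fetch means the same wrapper works whether you fetched the export or pasted it in by hand.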

What to do when the agent gets it wrong

If the agent misreads pin 2 or fixes the wrong element, do not re-explain in chat. Capture the new state, drop a new pin, add a comment that says what it got wrong and what to do instead, and republish. The whole point is that the loop stays in the same format: capture, pin, instruct, export. The tighter you keep that feedback loop, the less drift you get between what you meant and what the agent built.

Three minutes of careful capture beats thirty minutes of back-and-forth in chat. Start with one pinned screenshot and one specific instruction, and build from there.