Screen capture to markdown for AI coding agents
Capture a webpage. Talk through what's wrong. Hand your agent a markdown document with embedded screenshots and commentary — the only format coding agents like Claude Code, Cursor, and Codex can actually read.
Why “screen capture to markdown” is even a category
Most screen-capture tools are built for human viewers. You record video, narrate, share a link, and the person on the other end watches. That model breaks the moment the recipient is an AI coding agent instead of a human.
Agents read text. They process images when those images arrive as URLs in a structured prompt. They can't watch video. They can't follow a voiceover. A 90-second Loom or Scribe walkthrough is wasted on them.
The format that works is straightforward: markdown text with embedded image URLs and one block of commentary per screenshot. That's what every code-aware LLM is trained to ingest. Until now there hasn't been a fast way to produce one.
What “markdown for AI agents” actually means
The export from CobaltCapture is plain markdown. No proprietary format, no JSON wrapper, no SaaS lock-in. A typical capture looks like this:
```markdown
# Header drift on the marketing page

The page heading is dropping below the hero image on mobile widths. Expected it to stay anchored to the top. Looks like the order is reversed in the flex container.

Same issue on the pricing page. Probably the same component.
```
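Because the export is plain markdown, it's trivial to consume programmatically too. A minimal sketch of splitting a capture into screenshot/commentary pairs, assuming the screenshots appear as standard markdown images (`![alt](url)`) each followed by its commentary block — the function name and the exact image syntax are assumptions, not part of CobaltCapture's documented API:

```python
import re

def parse_capture(md: str) -> list[tuple[str, str]]:
    """Split a capture export into (image_url, commentary) pairs.

    Assumes each screenshot is a standard markdown image line,
    ![alt](url), followed by the commentary paragraphs for that shot.
    """
    pairs = []
    url = None
    comment_lines: list[str] = []
    for line in md.splitlines():
        m = re.match(r"!\[[^\]]*\]\(([^)]+)\)", line.strip())
        if m:
            # A new image line closes out the previous screenshot's comment.
            if url is not None:
                pairs.append((url, "\n".join(comment_lines).strip()))
            url = m.group(1)
            comment_lines = []
        elif url is not None:
            comment_lines.append(line)
    if url is not None:
        pairs.append((url, "\n".join(comment_lines).strip()))
    return pairs
```

The same one-image-one-comment pairing is what lets an agent associate the right remark with the right screenshot without guesswork.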
Paste that into Claude Code, drop it into Cursor's agent task box, send it to Codex CLI, or include it in an LLM chat. The agent fetches the screenshots, parses the text, and proposes a patch. No transcription, no copy-pasting one image at a time, no losing context to a video format the model can't see.
How CobaltCapture produces it
- Pop out a floating capture window. CobaltCapture sits on top of the product you're reviewing — no app switching, no losing your spot.
- One click per screenshot. The full-resolution frame is captured. Crop it down to the part that matters; skip cropping if the whole frame is the point.
- Dictate or type the commentary. Each screenshot gets its own comment block. Speak the problem out loud and the browser's native dictation turns it into editable text.
- Hit Finish. You get a public review URL and a one-click markdown export. Hand either to your agent.
Where the markdown lands
- Claude Code — paste the markdown into the chat or a task description; the agent fetches the screenshots and proposes a patch.
- Cursor — same workflow, dropped into the agent task box.
- Codex CLI / Codex Web — the markdown is part of the prompt; the model picks up the images from their URLs.
- A Linear or GitHub issue — markdown renders inline, screenshots embed, the engineer or the agent picks it up from the issue.
- An LLM chat — for general-purpose models like ChatGPT or Gemini, just paste and ask for the change.
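In every destination above, the hand-off is the same operation: the markdown travels as plain text inside a prompt. A minimal sketch of wrapping an export in a single user message — the sample text and prompt wording are hypothetical, and the payload shape mirrors common chat-completion APIs rather than any specific SDK:

```python
import json

# Hypothetical capture export; in practice this is the markdown the tool produces.
review_md = """# Header drift on the marketing page

The page heading is dropping below the hero image on mobile widths."""

# The whole capture travels as one user message; image URLs stay inline
# in the markdown and the agent fetches them itself.
payload = {
    "role": "user",
    "content": "Fix the issue described in this review:\n\n" + review_md,
}

messages = json.dumps([payload])  # ready to drop into a chat-style request body
```

No transformation step is needed between the export and the prompt, which is the point of using markdown in the first place.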
Common questions
Why not just attach a screenshot to a Slack thread?
Screenshots in a Slack thread are fine for humans. For an agent, you have to copy-paste each image into the prompt one at a time, retype the comment for each, and hope the agent associates the right comment with the right image. Markdown solves all three problems at once.
Why not a Loom or Scribe video?
Agents can't watch video. They can transcribe a voiceover, but the transcript loses everything you pointed at on screen. See the Loom alternative page for the longer version.
Does the markdown work in non-code LLMs too?
Yes. The format is generic markdown. ChatGPT, Gemini, Claude (any surface), and Perplexity all parse it. It's specifically tuned for coding agents because that's where most of the workflow value is today, but nothing about the format is code-specific.
Are the screenshots private?
Every review gets an unguessable URL, but the page itself is publicly readable by anyone with the link. Don't capture anything you wouldn't want a recipient to see. Private (auth-gated) reviews are on the roadmap.
Try it
Sign in with Google. Free.