Agent-First HTTP v0.5.0: When the Page Needs a Browser

by Agent-First Kit Contributors

v0.5.0 turns afhttp from an HTTP client into a full URL-acquisition tool. A single `afhttp fetch` covers the whole range — a plain HTTP request when that works, a real browser when it doesn't — and returns the page plus structured artifacts (rendered HTML, a DOM observation, a screenshot, network and console logs) an agent can branch on. It adds a browser-host / agent-driver split, a raw CDP escape hatch, deep network capture, an ops panel with optional KasmVNC display takeover for human login/captcha/2FA, and persistent profiles. The public contract converged in the process: flat `*_file` artifact paths, one profile per host, and no legacy aliases.

Until now, afhttp was an HTTP client built from the agent’s side: structured responses, previewable requests, typed transport failures. That is the right tool until a URL doesn’t turn into a usable page from a plain HTTP request, because it needs JavaScript, a cookie, a session, or a browser fingerprint the site recognizes. A human hits that wall and opens a browser. An agent needs the same escalation, as data.

Agent-First HTTP v0.5.0 is that escalation. The hard part for an agent was never that fetching bytes is slow; it’s that many useful pages don’t exist until a browser builds them. So afhttp now covers the whole acquisition range behind one structured contract.

One command, the whole range

afhttp fetch decides how hard to try. --render picks the strategy:

afhttp fetch https://example.com --render none      # HTTP fast path, no browser
afhttp fetch https://app.example.com --render auto  # HTTP first, escalate on failure
afhttp fetch https://app.example.com --render always # straight to the browser

auto is the point of the release: it runs the plain HTTP path, and when that comes back unusable (a connection failure, a 5xx, an empty shell that needed JavaScript) it escalates to a real browser instead of handing the agent a dead end. With no --endpoint, that escalation spins up a sandboxed inline browser in-process and tears it down after the fetch: zero setup for a one-shot. (For sessions and isolation you point at a long-lived host instead; see below.) The result envelope says which path ran and why, so the decision is never hidden:

{
  "code": "fetch",
  "status": 200,
  "final_url": "https://app.example.com/",
  "body_file": "/work/afhttp-out/req/body.html",
  "rendered_html_file": "/work/afhttp-out/req/rendered.html",
  "network_file": "/work/afhttp-out/req/network.json",
  "trace": {"render_decision": "browser", "render_used": true, "duration_ms": 820}
}
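
As a minimal sketch of how an agent might act on that envelope, assuming it arrives as JSON with the fields shown above: the helper name pick_page_file is hypothetical and not part of afhttp, and it relies only on trace.render_used, rendered_html_file, and body_file.

```python
import json
from typing import Optional


def pick_page_file(envelope: dict) -> Optional[str]:
    """Return the file an agent should read as 'the page'.

    Hypothetical helper: prefers the browser-rendered HTML when the
    trace says a browser actually ran, else falls back to the raw body.
    """
    trace = envelope.get("trace", {})
    if trace.get("render_used"):
        return envelope.get("rendered_html_file")
    return envelope.get("body_file")


# The envelope from the example above, as the agent would receive it.
envelope = json.loads("""
{
  "code": "fetch",
  "status": 200,
  "final_url": "https://app.example.com/",
  "body_file": "/work/afhttp-out/req/body.html",
  "rendered_html_file": "/work/afhttp-out/req/rendered.html",
  "network_file": "/work/afhttp-out/req/network.json",
  "trace": {"render_decision": "browser", "render_used": true, "duration_ms": 820}
}
""")

print(envelope["trace"]["render_decision"], pick_page_file(envelope))
```

The agent reads the escalation decision from the trace, so it does not have to infer it from the page content.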

One browser isn’t enough: meet each site with the engine it demands

A real browser isn’t one thing. How hard a site fights back decides which engine actually reaches it, so v0.5.0 drives a whole spectrum of engines behind the same CDP contract, chosen with --browser.

The same fetch contract and the same artifacts come back whichever engine ran, so escalating from a plain GET to a fingerprint-stealth browser is a flag change, not a rewrite.

Artifacts an agent can branch on

A browser-backed fetch doesn’t just return HTML. It captures what a human would look at if they were debugging the page by hand, each as a file referenced from the envelope: the raw body, the rendered_html after scripts run, a plain text projection, a screenshot, the network timeline, the console log, and an observation — an agent-readable snapshot of the interactive elements on the page. (storage is available opt-in.) Pick a subset with --want, or take the default set. The agent never has to scrape a screenshot for text or guess why a page looked empty; the evidence is structured.
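
Because artifact paths are flat *_file keys on the envelope, an agent can discover what was captured without a hard-coded list. The sketch below assumes the envelope has already been parsed into a dict and that every artifact path is a top-level string value whose key ends in _file. The helper name artifact_files is hypothetical.

```python
from pathlib import Path
from typing import Dict


def artifact_files(envelope: dict) -> Dict[str, Path]:
    """Map artifact name -> path for every top-level '*_file' key.

    Hypothetical helper; it assumes artifact paths are flat string
    fields on the envelope, as the '*_file' convention suggests.
    """
    suffix = "_file"
    return {
        key[: -len(suffix)]: Path(value)
        for key, value in envelope.items()
        if key.endswith(suffix) and isinstance(value, str)
    }


envelope = {
    "code": "fetch",
    "status": 200,
    "body_file": "/work/afhttp-out/req/body.html",
    "rendered_html_file": "/work/afhttp-out/req/rendered.html",
    "network_file": "/work/afhttp-out/req/network.json",
}

artifacts = artifact_files(envelope)

# Branch on what was actually captured instead of assuming every artifact exists.
if "screenshot" not in artifacts:
    print("no screenshot captured; available:", sorted(artifacts))
```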

Two roles: host where the browser must be, driver where the agent runs

v0.5.0 splits afhttp into a long-lived browser-host and short-lived agent-driver clients:

# same machine: a Unix socket, no network exposure at all
afhttp host  --listen unix:/run/afhttp.sock --profile work
afhttp fetch https://app.example.com --endpoint unix:/run/afhttp.sock

# cross-host: a token, reached over your private network as wss:// via the mesh
afhttp host  --listen tcp:0.0.0.0:9222 --token "$AFHTTP_TOKEN" --profile work
afhttp fetch https://app.example.com --endpoint wss://host.internal:9222 --token "$AFHTTP_TOKEN"

The host holds one Chromium-compatible browser bound to one on-disk profile and exposes a CDP endpoint plus the ops panel. Drivers connect, do work, and write artifacts locally. Because the two are independently locatable, you run the host where the browser needs to be — a residential IP, a GUI machine, a datacenter — and the driver wherever the agent runs.

Run the host in a container — that’s where the browser and all the backend complexity live. Chromium’s OS sandbox is on by default; the host image disables it (AFHTTP_NO_SANDBOX) so that the container itself is the isolation boundary for the untrusted content it loads — while a host or inline fetch run natively keeps the sandbox enabled. v0.5.0 ships a host image (container/docker/): chromium by default, other backends opt-in via build args, and a bearer token generated by default. The driver stays a thin client and runs wherever the agent is — now including native Windows, not just Linux/macOS.

That endpoint is full control of the browser and its profile — cookies, live sessions, downloads — so treat it that way. afhttp speaks plain CDP over WebSocket and does not terminate TLS itself: on one machine, prefer a unix: socket or tcp:127.0.0.1 and skip the network entirely; across hosts, set a --token (sent as Authorization: Bearer) and reach it as wss:// over your private network or mesh. Never put a tokenless endpoint on a public interface — without --token the listener accepts every caller. Connectivity and TLS across hosts are your mesh’s problem, not afhttp’s.
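
An agent-side wrapper can enforce that rule before it ever spawns the driver. This is a sketch of one possible guard, not something afhttp does for you: the function names are hypothetical, the loopback list is an assumption, and only the --endpoint and --token flags shown above are used.

```python
from typing import List, Optional
from urllib.parse import urlsplit

# Assumption: these are the only hosts we treat as "same machine".
LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def is_local_endpoint(endpoint: str) -> bool:
    """True for unix: sockets and loopback tcp:/ws:/wss: endpoints."""
    if endpoint.startswith("unix:"):
        return True
    if endpoint.startswith("tcp:"):
        # tcp:HOST or tcp:HOST:PORT; drop the port and any IPv6 brackets.
        host = endpoint[len("tcp:"):].rsplit(":", 1)[0].strip("[]")
        return host in LOOPBACK_HOSTS
    return urlsplit(endpoint).hostname in LOOPBACK_HOSTS


def fetch_argv(url: str, endpoint: str, token: Optional[str] = None) -> List[str]:
    """Build an 'afhttp fetch' argv, refusing tokenless remote endpoints."""
    if not token and not is_local_endpoint(endpoint):
        raise ValueError(f"refusing tokenless remote endpoint: {endpoint}")
    argv = ["afhttp", "fetch", url, "--endpoint", endpoint]
    if token:
        argv += ["--token", token]
    return argv


print(fetch_argv("https://app.example.com", "unix:/run/afhttp.sock"))
```

The check is deliberately conservative: anything that is not a Unix socket or a loopback address needs a token, even on a private network.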

Deep network capture and a raw CDP escape hatch

When the useful data arrives over XHR/fetch/GraphQL instead of the initial document, --network-bodies xhr|all captures response bodies (with a per-body cap), and --capture-ws / --capture-sse record WebSocket and SSE frames. Sensitive values in network.json are redacted by default; --network-redact off is available for trusted local debugging.

When fetch isn’t enough, afhttp cdp sends one raw Chrome DevTools Protocol method to a target tab — DOM inspection, form submission, custom waits — with no “click/type” abstraction layer in the way. afhttp upload injects a local file into an <input type=file> through the privileged DOM.setFileInputFiles primitive. The agent gets full browser control without afhttp pretending to understand the page.
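
For orientation, here is roughly what one raw CDP call looks like at the protocol level. The message shape (id, method, params) and the DOM.setFileInputFiles parameters come from the Chrome DevTools Protocol itself. How afhttp cdp or afhttp upload accept them on the command line is not shown here, and the helper names and sample values are hypothetical.

```python
import json
from typing import List


def cdp_message(msg_id: int, method: str, params: dict) -> str:
    """Serialise one CDP command as it travels over the WebSocket."""
    return json.dumps({"id": msg_id, "method": method, "params": params})


def set_file_input_files(msg_id: int, backend_node_id: int, files: List[str]) -> str:
    """Build the CDP command that attaches local files to a file input.

    backend_node_id identifies the <input type=file> element; the value
    used below is illustrative only.
    """
    return cdp_message(
        msg_id,
        "DOM.setFileInputFiles",
        {"files": files, "backendNodeId": backend_node_id},
    )


print(set_file_input_files(1, 42, ["/tmp/report.pdf"]))
```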

When a human has to step in

Some sites need a person: a manual login, a captcha, 2FA. The ops panel lets a human take over the same browser the agent is using, then hand it back — state intact. The default panel needs no VNC or X server. For hard sites, v0.5.0 adds an optional KasmVNC display-takeover mode:

afhttp host --listen tcp:0.0.0.0:9222 --profile work --takeover kasmvnc
afhttp ui --endpoint ws://host:9222   # prints the panel + display-takeover URLs

--display-quality 0-100 trades clarity for bandwidth and is adjustable live in the panel. The agent emits an out-of-band “I’m stuck on this endpoint/tab” signal; a human opens the panel, does their part, closes it; the agent’s next fetch or cdp continues from the new browser state.

Persistent profiles, cookies, and captured downloads

A host binds exactly one profile, persisted under $XDG_DATA_HOME/afhttp/profiles, so sessions survive across fetches. The cookie jar is profile-internal — never the system browser’s, never shared across hosts or profiles. Local admin commands inspect and maintain profiles without touching a browser:

afhttp profile list
afhttp profile cookies work        # non-expired cookies, values redacted
afhttp profile downloads work      # files the browser captured, read-only
afhttp profile prune --older-than 30d

Breaking: the contract converged

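v0.5.0 is pre-1.0 and took the chance to make the public surface honest. These are breaking changes with no compatibility shims: artifact paths are now flat *_file fields, each host binds exactly one profile, and legacy aliases are gone.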

The browser host also scrubs ambient environment (HTTP_PROXY, XDG_*, BROWSER, locale) before launching, so a browsing session can never silently honor configuration the agent didn’t request — proxies go through --proxy, nothing else.
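
To make the idea concrete, this is one way an environment scrub can be written. It is an illustration of the technique, not afhttp's implementation: the exact variables afhttp removes are only the ones named above, and the matching rules here (any *_PROXY name, LANG, LANGUAGE, LC_*) are assumptions.

```python
import os
from typing import Dict, Mapping


def scrubbed_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env without ambient browser-affecting variables."""

    def is_ambient(name: str) -> bool:
        upper = name.upper()
        return (
            upper.endswith("_PROXY")        # HTTP_PROXY, https_proxy, NO_PROXY
            or upper.startswith("XDG_")     # XDG_CONFIG_HOME, XDG_DATA_HOME, ...
            or upper == "BROWSER"
            or upper in {"LANG", "LANGUAGE"}
            or upper.startswith("LC_")      # locale categories
        )

    return {name: value for name, value in env.items() if not is_ambient(name)}


clean = scrubbed_env(os.environ)
print(sorted(set(os.environ) - set(clean)))  # names that were dropped
```

A launcher would then pass the result explicitly, for example subprocess.Popen(argv, env=clean), so the child process sees only what was requested.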

Help that’s generated, not hand-maintained

Every flag is documented in --help, and the CLI reference is generated from the binary itself via afhttp --help-markdown — so it can’t drift from the code. Every command still prints exactly one line of structured JSON; every failure carries a stable error_code. The tool never decides what a page means or what to do next. The agent does.
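
A thin agent-side parser can lean on that contract. The sketch below assumes error_code is a top-level key that is present only on failure; that placement is an assumption, and the class name, function name, and the sample code value "example_error" are hypothetical.

```python
import json


class AfhttpError(Exception):
    """Raised when the result envelope carries an error_code."""

    def __init__(self, error_code: str, envelope: dict):
        super().__init__(error_code)
        self.error_code = error_code
        self.envelope = envelope


def parse_result(stdout: str) -> dict:
    """Parse the single JSON line a command printed to stdout."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected exactly one JSON line, got {len(lines)}")
    envelope = json.loads(lines[0])
    # Assumption: failures are marked by a top-level error_code field.
    if "error_code" in envelope:
        raise AfhttpError(envelope["error_code"], envelope)
    return envelope


ok = parse_result('{"code": "fetch", "status": 200}\n')
print(ok["status"])

try:
    parse_result('{"code": "error", "error_code": "example_error"}\n')
except AfhttpError as err:
    # Branch on the stable code, not on a human-readable message.
    print("failed with", err.error_code)
```

Typical use would feed it the stdout of subprocess.run(argv, capture_output=True, text=True).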

Adoption

brew install agentfirstkit/tap/afhttp        # macOS / Linux
scoop bucket add agentfirstkit https://github.com/agentfirstkit/scoop-bucket
scoop install afhttp                          # Windows
cargo install agent-first-http                # any platform