Agent-First HTTP v0.5.0: When the Page Needs a Browser
v0.5.0 turns afhttp from an HTTP client into a full URL-acquisition tool. A single `afhttp fetch` covers the whole range — a plain HTTP request when that works, a real browser when it doesn't — and returns the page plus structured artifacts (rendered HTML, a DOM observation, a screenshot, network and console logs) an agent can branch on. It adds a browser-host / agent-driver split, a raw CDP escape hatch, deep network capture, an ops panel with optional KasmVNC display takeover for human login/captcha/2FA, and persistent profiles. The public contract converged in the process: flat `*_file` artifact paths, one profile per host, and no legacy aliases.
Until now, afhttp was an HTTP client built from the agent’s side: structured responses, previewable requests, typed transport failures. That is the right tool right up until the URL doesn’t turn into a usable page from a shell request — because it needs JavaScript, a cookie, a session, or a browser fingerprint the site recognizes. A human hits that wall and opens a browser. An agent needs the same escalation, as data.
Agent-First HTTP v0.5.0 is that escalation. The hard part for an agent was never that fetching bytes is slow; it’s that many useful pages don’t exist until a browser builds them. So afhttp now covers the whole acquisition range behind one structured contract.
One command, the whole range
afhttp fetch decides how hard to try. --render picks the strategy:
afhttp fetch https://example.com --render none # HTTP fast path, no browser
afhttp fetch https://app.example.com --render auto # HTTP first, escalate on failure
afhttp fetch https://app.example.com --render always # straight to the browser
auto is the point of the release: it runs the plain HTTP path, and when that
comes back unusable — a connection failure, a 5xx, an empty shell that needed
JavaScript — it escalates to a real browser instead of handing the agent a dead
end. With no --endpoint, that escalation spins up a sandboxed inline
browser in-process and tears it down after the fetch — zero setup for a
one-shot. (For sessions and isolation you point at a long-lived host instead;
see below.) The result envelope says which path ran and why, so the decision is
never hidden:
{
  "code": "fetch",
  "status": 200,
  "final_url": "https://app.example.com/",
  "body_file": "/work/afhttp-out/req/body.html",
  "rendered_html_file": "/work/afhttp-out/req/rendered.html",
  "network_file": "/work/afhttp-out/req/network.json",
  "trace": {"render_decision": "browser", "render_used": true, "duration_ms": 820}
}
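To make "branch on the envelope" concrete, here is a minimal agent-side sketch in Python. It reads only the fields shown in the sample above (`trace.render_decision`, `trace.render_used`, `body_file`, `rendered_html_file`). The function name `describe_fetch` and the fallback rules are this sketch's own assumptions, not part of afhttp.

```python
import json


def describe_fetch(envelope_line: str) -> dict:
    """Summarize which acquisition path produced a fetch result."""
    env = json.loads(envelope_line)
    trace = env.get("trace", {})
    used_browser = bool(trace.get("render_used", False))
    if used_browser:
        # Prefer the post-script DOM; fall back to the raw body if it is absent.
        html_path = env.get("rendered_html_file") or env.get("body_file")
    else:
        html_path = env.get("body_file")
    return {
        "status": env.get("status"),
        "final_url": env.get("final_url"),
        "path": trace.get("render_decision", "unknown"),
        "used_browser": used_browser,
        "html_path": html_path,
    }


# The sample envelope from above, collapsed to one line.
SAMPLE = (
    '{"code": "fetch", "status": 200, '
    '"final_url": "https://app.example.com/", '
    '"body_file": "/work/afhttp-out/req/body.html", '
    '"rendered_html_file": "/work/afhttp-out/req/rendered.html", '
    '"network_file": "/work/afhttp-out/req/network.json", '
    '"trace": {"render_decision": "browser", "render_used": true, "duration_ms": 820}}'
)

if __name__ == "__main__":
    print(describe_fetch(SAMPLE))
```

An agent would typically log `path` and `used_browser` alongside its own decision, so a later reader can see why the rendered HTML was chosen over the raw body.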
One browser isn’t enough: meet each site with the engine it demands
A real browser isn’t one thing. How hard a site fights back decides which engine
actually reaches it, so v0.5.0 drives a whole spectrum behind the same CDP
contract — chosen with --browser:
- chromium / chrome — the default: full rendering, screenshots, downloads.
- chrome-headless-shell — a lean headless build for fast, low-overhead fetches.
- fingerprint-chromium — Chromium that randomizes its fingerprint per profile, for bot-walled sites.
- camoufox — a Firefox stealth fork (driven through foxbridge) for sites that fingerprint Chromium.
- lightpanda — an ultralight engine covering a rendering subset without a full browser.
The same fetch contract and the same artifacts come back whichever engine ran, so escalating from a plain GET to a fingerprint-stealth browser is a flag change, not a rewrite.
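Because escalation is a flag change, an agent can express it as data. The sketch below builds the argv for each rung of a hypothetical escalation ladder using only the `--render` and `--browser` values listed above. The ordering of the rungs, and the assumption that `--render always` and `--browser` can be combined on one command line, are illustrative and not confirmed by this post.

```python
from typing import Iterator, List, Optional, Tuple

# (render mode, browser engine) rungs, cheapest first.
# The ordering is this sketch's assumption, not something afhttp prescribes.
LADDER: List[Tuple[str, Optional[str]]] = [
    ("none", None),
    ("always", "chrome-headless-shell"),
    ("always", "chromium"),
    ("always", "fingerprint-chromium"),
    ("always", "camoufox"),
]


def fetch_argv(url: str, render: str, browser: Optional[str] = None) -> List[str]:
    """Build one `afhttp fetch` command line as an argv list."""
    argv = ["afhttp", "fetch", url, "--render", render]
    if browser is not None:
        argv += ["--browser", browser]
    return argv


def escalation_plan(url: str) -> Iterator[List[str]]:
    """Yield command lines from the plain HTTP path up to the stealth engines."""
    for render, browser in LADDER:
        yield fetch_argv(url, render, browser)


if __name__ == "__main__":
    for argv in escalation_plan("https://app.example.com"):
        print(" ".join(argv))
```

The agent would run each rung in turn and stop at the first result it considers usable. That judgment stays with the agent.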
Artifacts an agent can branch on
A browser-backed fetch doesn’t just return HTML. It captures what a human would
look at if they were debugging the page by hand, each as a file referenced from
the envelope: the raw body, the rendered_html after scripts run, a plain
text projection, a screenshot, the network timeline, the console log,
and an observation — an agent-readable snapshot of the interactive elements on
the page. (storage is available opt-in.) Pick a subset with --want, or take
the default set. The agent never has to scrape a screenshot for text or guess
why a page looked empty; the evidence is structured.
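Since artifacts arrive as flat `*_file` paths on the envelope, an agent can discover them without hard-coding every name. The following is a rough sketch: `artifact_files` and `missing_on_disk` are hypothetical helpers. They rely only on the `_file` suffix convention and assume nothing about which artifacts a given fetch produced.

```python
import os
from typing import Dict, List


def artifact_files(envelope: dict) -> Dict[str, str]:
    """Map artifact name -> path for every top-level key ending in `_file`."""
    suffix = "_file"
    return {
        key[: -len(suffix)]: value
        for key, value in envelope.items()
        if key.endswith(suffix) and isinstance(value, str)
    }


def missing_on_disk(files: Dict[str, str]) -> List[str]:
    """Return the names of artifacts whose files are not present locally."""
    return sorted(name for name, path in files.items() if not os.path.exists(path))


if __name__ == "__main__":
    envelope = {
        "code": "fetch",
        "status": 200,
        "body_file": "/work/afhttp-out/req/body.html",
        "network_file": "/work/afhttp-out/req/network.json",
    }
    files = artifact_files(envelope)
    print(files)
    print("missing:", missing_on_disk(files))
```

Checking for missing files before reading them lets the agent distinguish "the artifact was not requested" (no key) from "the artifact was promised but is not on disk" (key present, file absent).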
Two roles: host where the browser must be, driver where the agent runs
v0.5.0 splits afhttp into a long-lived browser-host and short-lived agent-driver clients:
# same machine: a Unix socket, no network exposure at all
afhttp host --listen unix:/run/afhttp.sock --profile work
afhttp fetch https://app.example.com --endpoint unix:/run/afhttp.sock
# cross-host: a token, reached over your private network as wss:// via the mesh
afhttp host --listen tcp:0.0.0.0:9222 --token "$AFHTTP_TOKEN" --profile work
afhttp fetch https://app.example.com --endpoint wss://host.internal:9222 --token "$AFHTTP_TOKEN"
The host holds one Chromium-compatible browser bound to one on-disk profile and exposes a CDP endpoint plus the ops panel. Drivers connect, do work, and write artifacts locally. Because the two are independently locatable, you run the host where the browser needs to be — a residential IP, a GUI machine, a datacenter — and the driver wherever the agent runs.
Run the host in a container — that’s where the browser and all the backend
complexity live. Chromium’s OS sandbox is on by default. The host image
disables it (AFHTTP_NO_SANDBOX) so that the container itself is the isolation
boundary for the untrusted content it loads, whereas a host or inline fetch that
runs natively keeps the sandbox enabled. v0.5.0 ships a host image
(container/docker/): chromium by default, other backends opt-in via build args,
and a bearer token generated by default. The driver stays a thin client and
runs wherever the agent is — now including native Windows, not just Linux/macOS.
That endpoint is full control of the browser and its profile — cookies, live
sessions, downloads — so treat it that way. afhttp speaks plain CDP over
WebSocket and does not terminate TLS itself: on one machine, prefer a unix:
socket or tcp:127.0.0.1 and skip the network entirely; across hosts, set a
--token (sent as Authorization: Bearer) and reach it as wss:// over your
private network or mesh. Never put a tokenless endpoint on a public interface —
without --token the listener accepts every caller. Connectivity and TLS across
hosts are your mesh’s problem, not afhttp’s.
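The rule "never a tokenless endpoint on a public interface" is easy to enforce before launching a host. Below is a minimal pre-flight check an operator might write. It is not afhttp behavior. It parses the `unix:` and `tcp:` listen forms used in the examples above, and it assumes IPv4 addresses or hostnames only.

```python
import ipaddress
from typing import Optional


def listen_is_safe(listen: str, token: Optional[str] = None) -> bool:
    """Return False for a tokenless TCP listener that is not loopback-only."""
    scheme, _, rest = listen.partition(":")
    if scheme == "unix":
        return True  # a Unix socket is never exposed on the network
    if scheme != "tcp":
        raise ValueError(f"unrecognized listen spec: {listen!r}")
    if token:
        return True  # authenticated; network reachability is the mesh's job
    host, sep, _port = rest.rpartition(":")
    if not sep:
        host = rest  # no port given, e.g. "tcp:127.0.0.1"
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a non-IP hostname: assume it may be reachable


if __name__ == "__main__":
    for spec in ("unix:/run/afhttp.sock", "tcp:127.0.0.1:9222", "tcp:0.0.0.0:9222"):
        print(spec, listen_is_safe(spec))
```

A token only authenticates callers. It does not encrypt traffic, so the check above treats "token present" as sufficient solely on the assumption that the endpoint is reached over a private network or mesh, as described.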
Deep network capture and a raw CDP escape hatch
When the useful data arrives over XHR/fetch/GraphQL instead of the initial
document, --network-bodies xhr|all captures response bodies (with a per-body
cap), and --capture-ws / --capture-sse record WebSocket and SSE frames.
Sensitive values in network.json are redacted by default; --network-redact off is available for trusted local debugging.
When fetch isn’t enough, afhttp cdp sends one raw Chrome DevTools Protocol
method to a target tab — DOM inspection, form submission, custom waits — with no
“click/type” abstraction layer in the way. afhttp upload injects a local file
into an <input type=file> through the privileged DOM.setFileInputFiles
primitive. The agent gets full browser control without afhttp pretending to
understand the page.
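For readers unfamiliar with raw CDP, this sketch shows the wire shape of a single protocol message, using `DOM.setFileInputFiles` as the example. It illustrates the Chrome DevTools Protocol JSON format only. How `afhttp cdp` or `afhttp upload` accept the method and parameters on the command line is not shown in this post, and the helper names here are hypothetical.

```python
import json
from itertools import count
from typing import List, Optional

_ids = count(1)  # CDP requests carry a caller-chosen integer id


def cdp_message(method: str, params: Optional[dict] = None) -> str:
    """Serialize one CDP request: an id, a method name, and a params object."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


def set_file_input(node_id: int, paths: List[str]) -> str:
    """Build the request that attaches local files to an <input type=file> node."""
    return cdp_message(
        "DOM.setFileInputFiles",
        {"files": list(paths), "nodeId": node_id},
    )


if __name__ == "__main__":
    print(set_file_input(42, ["/tmp/report.pdf"]))
```

The `nodeId` would come from an earlier DOM query against the same tab. The value 42 here is a placeholder.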
When a human has to step in
Some sites need a person: a manual login, a captcha, 2FA. The ops panel lets a human take over the same browser the agent is using, then hand it back — state intact. The default panel needs no VNC or X server. For hard sites, v0.5.0 adds an optional KasmVNC display-takeover mode:
afhttp host --listen tcp:0.0.0.0:9222 --profile work --takeover kasmvnc
afhttp ui --endpoint ws://host:9222 # prints the panel + display-takeover URLs
--display-quality 0-100 trades clarity for bandwidth and is adjustable live in
the panel. The agent emits an out-of-band “I’m stuck on this endpoint/tab”
signal; a human opens the panel, does their part, closes it; the agent’s next
fetch or cdp continues from the new browser state.
Persistent profiles, cookies, and captured downloads
A host binds exactly one profile, persisted under
$XDG_DATA_HOME/afhttp/profiles, so sessions survive across fetches. The cookie
jar is profile-internal — never the system browser’s, never shared across hosts
or profiles. Local admin commands inspect and maintain profiles without touching
a browser:
afhttp profile list
afhttp profile cookies work # non-expired cookies, values redacted
afhttp profile downloads work # files the browser captured, read-only
afhttp profile prune --older-than 30d
Breaking: the contract converged
v0.5.0 is pre-1.0 and took the chance to make the public surface honest. These are breaking changes with no compatibility shims:
- Flat artifact paths. Fetch results expose body_file, rendered_html_file, network_file, … at the top level instead of a nested artifacts map.
- One profile per host. The --profile-name flag and ?profile= query parameter are gone; multi-identity means multiple hosts. Downloads are captured into the profile sandbox and listed read-only, not pulled through a standalone download command.
- No legacy aliases. The old curl-flag-compat shim is removed. afhttp is a URL-acquisition tool, not a curl replacement; a clap-tree test now enforces that no legacy surface creeps back.
The browser host also scrubs ambient environment (HTTP_PROXY, XDG_*,
BROWSER, locale) before launching, so a browsing session can never silently
honor configuration the agent didn’t request — proxies go through --proxy,
nothing else.
Help that’s generated, not hand-maintained
Every flag is documented in --help, and the CLI reference is generated from
the binary itself via afhttp --help-markdown — so it can’t drift from the
code. Every command still prints exactly one line of structured
JSON; every failure carries a stable error_code. The tool never decides what a
page means or what to do next. The agent does.
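The one-line-JSON and stable `error_code` guarantees make the agent's parsing step short. Here is a minimal sketch, assuming a failure places `error_code` at the top level of the JSON object. That placement, the `AfhttpFailure` class, and the sample code value `example_code` are illustrative assumptions rather than documented afhttp behavior.

```python
import json


class AfhttpFailure(Exception):
    """Raised when the tool's output carries an error_code."""

    def __init__(self, error_code: str, payload: dict):
        super().__init__(error_code)
        self.error_code = error_code
        self.payload = payload


def parse_result(stdout: str) -> dict:
    """Parse a command's stdout, enforcing the one-JSON-line contract."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected exactly one JSON line, got {len(lines)}")
    payload = json.loads(lines[0])
    if "error_code" in payload:
        raise AfhttpFailure(payload["error_code"], payload)
    return payload


if __name__ == "__main__":
    print(parse_result('{"code": "fetch", "status": 200}\n'))
    try:
        parse_result('{"error_code": "example_code"}\n')
    except AfhttpFailure as exc:
        print("failed with", exc.error_code)
```

Branching on `exc.error_code` rather than on message text is what a stable code is for: the agent's retry or escalation logic does not break when wording changes.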
Adoption
brew install agentfirstkit/tap/afhttp # macOS / Linux
scoop bucket add agentfirstkit https://github.com/agentfirstkit/scoop-bucket
scoop install afhttp # Windows
cargo install agent-first-http # any platform