the Host Adaptation Plan

中文版

Status: planning only, non-normative. Nothing here changes spec/protocol.md or the safety boundary. This document records the strategy for closing the quality gap between AFUI-rendered surfaces and hand-built frontends — across providers as different as mail triage, a CMS admin panel, a server monitor, and a game — without giving up the data-only model or multi-provider composition.

The problem

One trusted host must render arbitrary providers, and today it renders all of them the same way: every unit becomes a tile in one grid, every record list becomes the same inline master/detail, every capability result becomes a status line. The result is safe and generic — and generically mediocre. A mail surface wants focus and reading flow; an admin panel wants dense forms and bulk edits; a monitor wants glanceable live boards; a game wants a table and turns. A single uniform arrangement cannot feel right for all of them, no matter how much its CSS is polished.

The gap is not the protocol’s. Providers already publish the semantics a good surface needs — salience, layout role / relation, summary, actor-state selection — and the reference host consumes almost none of it. The question is where the remaining design judgment should live.

Where can design judgment live?

A hand-built frontend feels good because a designer made hundreds of small decisions: what is one glance, what deserves a page, what sits beside what, what happens on Enter. Every one of those decisions has to live somewhere. There are exactly four candidate homes:

The provider, through richer protocol hints. Fast and well-informed — the provider knows its domain. But concrete presentation in the wire format breaks the three properties the protocol exists for: two providers’ pixel plans cannot compose into one surface, a web-shaped hint is meaningless to a terminal or voice viewer, and presentation authority crossing the trust boundary is exactly the smuggling the spec forbids. Detailed semantic hints are fine; detailed visual hints are poison.
The host, as code. Safe and composable, but a single generic renderer plateaus at “acceptable dashboard”. Host code scales across providers only if it encodes patterns, not per-app layouts.
A third artifact — viewer-owned, declarative, per (provider, host). A file of concrete presentation decisions that the host applies under the user’s policy. It can be authored by anyone — the provider’s team, an agent, the user — because applying it is always the viewer’s choice. The protocol already sketches the mechanism (user overrides live in the user’s own section; the host has an init-assets escape hatch); it has never been made a first-class artifact.
An agent at runtime. Maximally adaptive, but the protocol’s own force #3 stands: a good default must not require AI. Runtime AI layout is expensive, unreproducible, and unavailable to most users today.

The plan: patterns in the host (2), concrete decisions in a declarative artifact (3), agents as one author of that artifact (4), and only small, strictly semantic additions to the hints (1). No single home wins; each takes the decisions it is structurally suited to hold.

The platform analogy

The proof that this shape works is every native platform. iOS never shipped a layout engine that reads an app’s pixel plan; it shipped a small set of excellent interaction patterns — navigation stacks, split views, lists, sheets — plus human interface guidelines, and thousands of apps assembled from them feel native and good. The apps declare intent; the platform supplies the interaction pattern; the user gets consistency for free.

The AFUI host should be a platform in exactly this sense, not a website generator. Its value grows by nailing a handful of interaction archetypes with real polish, and letting every provider that matches an archetype inherit that polish — including archetypes’ keyboard models, focus behavior, empty states, and responsive collapse, which no per-app hint could ever carry across media.

Four adaptation tiers

Every provider lands on the lowest tier that serves it; higher tiers refine, never replace, the lower ones. A space always renders on Tier 0 — that is the safety floor and nothing above it may break it.

Tier	What it is	Who authors it	Covers
0	Canonical kinds + default shell	host code	everything, safely
1	Genre shells — archetype-specific arrangements	host code	~90% of operational surfaces
2	Arrangement packs — declarative per-provider tuning	provider team, agent, or user	the last mile of polish
3	Trusted renderer modules — user-installed viewer code	whoever the user trusts	the long tail (real-time canvases)

Tier 0 — the floor (exists today, needs the density fix)

The current reference host, upgraded to actually consume the semantics it receives: salience maps to density (a subtle facts unit is a badge strip, not a half-screen tile), master_detail maps to a focus stack with a full-height reader, overlay maps to summoned drawers, request results get a generic report surface. These fixes are prerequisites for everything above and are worth doing regardless of the rest of this plan.

A useful discipline for the floor: every unit must be renderable at three zoom levels — glance (a badge or one line, from summary), compact (a row or small card), and full (the unit’s complete surface). Shells and packs then only ever pick zoom levels and positions; they never invent renderings.

Tier 1 — genre shells

A genre is an interaction archetype the host implements as a complete, opinionated shell: slot structure, navigation model, keyboard grammar, live behavior, empty states. The initial vocabulary, chosen to cover the surfaces we actually see:

Genre	Interaction pattern	Typical provider
`triage`	queue → focused reader → verb-per-item, keyboard-first	mail, tickets, review queues
`collection_admin`	tree/filter nav → table → record editor, bulk actions	CMS, inventory, store back-office
`monitor`	glance board of live tiles → drill into detail + history	server status, pipelines, fleet health
`reader`	document-first, table of contents, wide-measure body	docs, reports, long notes
`conversation`	timeline + composer, newest-at-bottom	chat, comment threads, agent sessions
`board`	spatial grid of facts + turn-gated move capabilities	go, cards, tile-based management games
`console`	terminal-first with supporting status around it	ops boxes, REPLs, agent terminals

A provider declares its genre as one advisory fact in its section, e.g. "surface": { "genre": "triage" }. The declaration is a hint like any other: a host that lacks the shell ignores it and Tier 0 applies; a wrong genre makes the surface less convenient, never unsafe. When no genre is declared the host may infer one from what is already published (role / relation / kinds — a section that is mostly log + facts + attention summaries reads as monitor), so existing providers improve without changes.

Genre shells are also where the multi-provider shell lives: each owner is an app with its own workspace rendered by its genre shell; a dock switches apps; one global attention rail aggregates every provider’s summary.attention_binding badges and jumps to the owning unit. Mixing providers is the direction’s core promise, and it is precisely what per-provider visual hints could never do — two hand-made layouts don’t merge, but two genre shells side by side, under one attention rail and one interaction grammar, do.

Tier 2 — arrangement packs

An arrangement pack is a small declarative file of concrete presentation decisions for one provider on one host family — the last-mile judgment that generic shells cannot know:

{
  "afui_pack": "0.1",
  "match": { "name": "afmail" },          // matched by advisory section name/genre
  "genre": "triage",                       // confirm or override inference
  "slots": {
    "queue":   { "unit": "inbox_triage" },
    "reader":  { "unit": "message_reader", "measure": "reading" },
    "glance":  { "units": ["mailbox_status"], "zoom": "glance" },
    "drawer":  { "units": ["archived_cases", "notifications"] }
  },
  "zoom": { "push_queue": "compact" },
  "order": ["inbox_triage", "active_cases"],
  "accent": "host_token:indigo"            // names a host theme token, never a value
}

Rules that keep it inside the trust model:

Declarative only, presentation only. A pack names units, slots, zoom levels, ordering, and host-defined theme tokens. It carries no code, no URLs, no capability grants, no risk changes; unknown fields are inert. A pack can make a surface uglier, never unsafe.
Applied under user policy. Installing a pack is the user’s act, like choosing a theme — even when the provider ships one in its spore as a recommendation. The host layers: owner hints < genre shell < pack < the user’s own overrides. The user always wins.
Host-family scoped, protocol-invisible. Packs are a host artifact; the wire format never carries them. A terminal host may read the same pack’s media-neutral parts (zoom, order) and ignore the rest.

The pack is the keystone of the roadmap because every authorship era produces the same artifact. Today, a human writes afmail’s pack by hand. Tomorrow, a local agent joins a new space, reads the facts and hints, drafts a pack, screenshots the result headlessly, iterates, and asks the user to keep it. Later still, the user tells a runtime copilot “make the log bigger” and it edits the same pack (or emits the equivalent override facts into the user’s own section — the protocol-native runtime channel that already exists). Nothing built for the hand-authored era is discarded when agents arrive; agent capability only changes who writes the file and how often.

Tier 3 — trusted renderer modules

Some surfaces will never be data-hint-renderable with native quality: a real-time game canvas, a map, a node graph editor. For these the protocol already names the answer — user-installed customization of the trusted viewer. Make it concrete: a renderer module is viewer-side code the user installs (never selected or delivered by observed data), registered for a kind or a section, rendering from the same facts and acting only through the same capability/request path with the same risk gates. A board-genre go game needs no module; a 60 fps action game does, and that is honest — AFUI carries its state and moves as data, and the module is simply part of the viewer the user chose to trust.

What about “very detailed layout hints from the provider”?

The interim idea — let providers publish rich layout detail so a dumb host can render well today — is half right. Right: providers should be able to ship detailed presentation judgment, and short-term quality should not wait for agents. Wrong only about the channel: pushed through the protocol as hints, that detail becomes web-shaped, non-composable, and trust-crossing, and it would have to be unwound later.

The pack is that interim layer, routed around the wire format instead of through it: the provider ships exactly the detailed opinions it wanted to ship — beside its space, as a recommendation the viewer applies under user policy — and the protocol stays clean for the terminal viewer, the voice viewer, and the composed multi-provider surface. Same detail, same authorship, same short-term payoff; none of the long-term damage, and nothing to migrate away from when agents mature.

The only protocol-adjacent additions this plan needs are small and strictly semantic, staying inside the View Hint Profile’s character: the advisory surface.genre declaration, and (if practice demands it) a per-unit default zoom hint. No panels, no pixels, no columns.

Agents: two roles, neither load-bearing

Design time (near term): the agent is a pack author. It reads a space, drafts a pack, verifies with headless screenshots, iterates, and the user keeps the result. This is batch work — it needs one capable agent run per provider, not an agent on every user’s machine, and its output serves every user of that host afterward. Agent scarcity is therefore not a blocker: the ecosystem can hand-author packs for flagship providers now and let agents take over authorship as they spread.
Runtime (later): the agent is a copilot, adjusting the surface by editing the user’s pack or emitting presentation-override facts into the user’s own section — both channels the host already honors from humans. It proposes; the deterministic host renders.

In both roles the agent produces the same declarative artifacts a human would. The host never requires AI to render, so the protocol’s “a good default must not require AI” survives intact.

What this plan does not chase

Pixel-parity with bespoke brand frontends and marketing sites. The target is native-platform quality for operational surfaces — the standard of a good internal tool or a well-made native app, reached through shared interaction patterns.
Runtime layout synthesis by AI, or any rendering path that varies per run.
Provider-delivered code or styling authority, in any tier, ever.

Roadmap

Phase 0 — the floor. Salience→density mapping, three zoom levels, focus-stack reader, summoned overlays, a generic request-result surface, incremental emit-driven re-render. (Independently justified; fixes the worst of today’s afmail experience.)
Phase 1 — genres. surface.genre in the View Hint Profile as an advisory field; triage and monitor shells first (afmail and a server monitor make honest, opposite test subjects); genre inference for undeclared spaces; the owner-as-app workspace shell with dock and global attention rail.
Phase 2 — packs. Pack format 0.1, layering and consent rules, a hand-authored afmail reference pack, pack shipping convention for spores, collection_admin shell driven by a real CMS-shaped provider.
Phase 3 — agent authorship. A pack-drafting agent workflow (space → draft → headless screenshot → iterate → user approval); runtime copilot channel via user-section override facts.
Phase 4 — the long tail. Renderer-module contract (registration, isolation, capability-path-only effects), board shell, conversation shell.

Each phase is verified the same way: pick a benchmark task flow per genre (triage ten messages and queue a reply; find and edit a CMS record; spot and drill into a failing host; play ten turns), run it headlessly, and compare step count and readability against a hand-built baseline. “Users find it good” is task parity plus the two things hand-built apps cannot offer: provider mixing under one attention rail, and human-agent copresence on the same live surface.