The Host Adaptation Plan

Status: planning only, non-normative. Nothing here changes spec/protocol.md or the safety boundary. This document records the strategy for closing the quality gap between AFUI-rendered surfaces and hand-built frontends — across providers as different as mail triage, a CMS admin panel, a server monitor, and a game — without giving up the data-only model or multi-provider composition.

The problem

One trusted host must render arbitrary providers, and today it renders all of them the same way: every unit becomes a tile in one grid, every record list becomes the same inline master/detail, every capability result becomes a status line. The result is safe and generic, and generically mediocre. A mail surface wants focus and reading flow; an admin panel wants dense forms and bulk edits; a monitor wants glanceable live boards; a game wants a table and turns. No single uniform arrangement can feel right for all of them, however polished its CSS.

The gap is not the protocol’s. Providers already publish the semantics a good surface needs — salience, layout role / relation, summary, actor-state selection — and the reference host consumes almost none of it. The question is where the remaining design judgment should live.

Where can design judgment live?

A hand-built frontend feels good because a designer made hundreds of small decisions: what is one glance, what deserves a page, what sits beside what, what happens on Enter. Every one of those decisions has to live somewhere. There are exactly four candidate homes:

  1. The provider, through richer protocol hints. Fast and well-informed — the provider knows its domain. But concrete presentation in the wire format breaks the three properties the protocol exists for: two providers’ pixel plans cannot compose into one surface, a web-shaped hint is meaningless to a terminal or voice viewer, and presentation authority crossing the trust boundary is exactly the smuggling the spec forbids. Detailed semantic hints are fine; detailed visual hints are poison.
  2. The host, as code. Safe and composable, but a single generic renderer plateaus at “acceptable dashboard”. Host code scales across providers only if it encodes patterns, not per-app layouts.
  3. A third artifact — viewer-owned, declarative, per (provider, host). A file of concrete presentation decisions that the host applies under the user’s policy. It can be authored by anyone — the provider’s team, an agent, the user — because applying it is always the viewer’s choice. The protocol already sketches the mechanism (user overrides live in the user’s own section; the host has an init-assets escape hatch); it has never been made a first-class artifact.
  4. An agent at runtime. Maximally adaptive, but the protocol’s own force #3 stands: a good default must not require AI. Runtime AI layout is expensive, unreproducible, and unavailable to most users today.

The plan: patterns in the host (2), concrete decisions in a declarative artifact (3), agents as one author of that artifact (4), and only small, strictly semantic additions to the hints (1). No single home wins; each takes the decisions it is structurally suited to hold.

The platform analogy

The proof that this shape works is every native platform. iOS never shipped a layout engine that reads an app’s pixel plan; it shipped a small set of excellent interaction patterns — navigation stacks, split views, lists, sheets — plus human interface guidelines, and thousands of apps assembled from them feel native and good. The apps declare intent; the platform supplies the interaction pattern; the user gets consistency for free.

The AFUI host should be a platform in exactly this sense, not a website generator. Its value grows by nailing a handful of interaction archetypes with real polish, and letting every provider that matches an archetype inherit that polish — including archetypes’ keyboard models, focus behavior, empty states, and responsive collapse, which no per-app hint could ever carry across media.

Four adaptation tiers

Every provider lands on the lowest tier that serves it; higher tiers refine, never replace, the lower ones. A space always renders on Tier 0 — that is the safety floor and nothing above it may break it.

| Tier | What it is | Who authors it | Covers |
|---|---|---|---|
| 0 | Canonical kinds + default shell | host code | everything, safely |
| 1 | Genre shells: archetype-specific arrangements | host code | ~90% of operational surfaces |
| 2 | Arrangement packs: declarative per-provider tuning | provider team, agent, or user | the last mile of polish |
| 3 | Trusted renderer modules: user-installed viewer code | whoever the user trusts | the long tail (real-time canvases) |
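Read as a lookup, the fallback rule is short. The TypeScript below is a minimal sketch under stated assumptions, not the reference host: `Space`, `Viewer`, and `resolveTiers` are hypothetical names, and keying packs by section name and renderer modules by kind is only one possible choice. What it shows is that every lookup is optional and the resulting stack always starts at Tier 0.

```typescript
// Hypothetical shapes for illustration; none of these are protocol types.
type Tier = 0 | 1 | 2 | 3;

interface Space {
  name: string;    // advisory section name, e.g. "afmail"
  genre?: string;  // declared or inferred surface.genre
  kinds: string[]; // canonical kinds of the section's units
}

interface Viewer {
  genreShells: string[];   // Tier 1: shells built into the host
  acceptedPacks: string[]; // Tier 2: section names whose pack the user kept
  modules: string[];       // Tier 3: kinds with a user-installed renderer module
}

// Returns the layers applied to a space, lowest first. Tier 0 is always
// present, so a space never fails to render; higher tiers only refine it.
function resolveTiers(space: Space, viewer: Viewer): Tier[] {
  const tiers: Tier[] = [0];
  if (space.genre !== undefined && viewer.genreShells.indexOf(space.genre) >= 0) {
    tiers.push(1);
  }
  if (viewer.acceptedPacks.indexOf(space.name) >= 0) {
    tiers.push(2);
  }
  if (space.kinds.some((k) => viewer.modules.indexOf(k) >= 0)) {
    tiers.push(3);
  }
  return tiers;
}

const viewer: Viewer = {
  genreShells: ["triage", "monitor"],
  acceptedPacks: ["afmail"],
  modules: [],
};
const mailTiers = resolveTiers({ name: "afmail", genre: "triage", kinds: ["facts"] }, viewer); // [0, 1, 2]
const goTiers = resolveTiers({ name: "go", genre: "board", kinds: ["facts"] }, viewer);        // [0]
```

A host without a board shell still renders the go space; it simply stays on the floor until a shell or pack exists.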

Tier 0 — the floor (exists today, needs the density fix)

The current reference host, upgraded to actually consume the semantics it receives: salience maps to density (a subtle facts unit is a badge strip, not a half-screen tile), master_detail maps to a focus stack with a full-height reader, overlay maps to summoned drawers, request results get a generic report surface. These fixes are prerequisites for everything above and are worth doing regardless of the rest of this plan.
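The mapping from published semantics to a Tier 0 treatment can be sketched as a small pure function. Only the `subtle` salience value and the `master_detail` and `overlay` relations come from this plan; the other values, the placement names, and all type and function names are assumptions made for illustration.

```typescript
// Assumed vocabularies: "subtle", "master_detail", and "overlay" appear in the
// plan; "normal", "primary", and "none" are placeholders for this sketch.
type Salience = "subtle" | "normal" | "primary";
type Relation = "master_detail" | "overlay" | "none";
type Zoom = "glance" | "compact" | "full";
type Placement = "badge_strip" | "grid" | "focus_stack" | "drawer";

interface UnitHints {
  id: string;
  salience: Salience;
  relation: Relation;
}

interface Treatment {
  zoom: Zoom;
  placement: Placement;
}

// Salience decides how much room a unit gets when no relation applies.
const BY_SALIENCE: Record<Salience, Treatment> = {
  subtle: { zoom: "glance", placement: "badge_strip" }, // a badge, not a half-screen tile
  normal: { zoom: "compact", placement: "grid" },
  primary: { zoom: "full", placement: "grid" },
};

// Tier 0 treatment for one unit, derived only from what the provider published.
function treatmentFor(u: UnitHints): Treatment {
  if (u.relation === "overlay") return { zoom: "full", placement: "drawer" };            // summoned on demand
  if (u.relation === "master_detail") return { zoom: "full", placement: "focus_stack" }; // full-height reader
  return BY_SALIENCE[u.salience];
}

const mailboxBadge = treatmentFor({ id: "mailbox_status", salience: "subtle", relation: "none" });
// { zoom: "glance", placement: "badge_strip" }
```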

A useful discipline for the floor is that every unit must be renderable at three zoom levels: glance (a badge or one line, from summary), compact (a row or small card), and full (the unit’s complete surface). Shells and packs then only ever pick zoom levels and positions; they never invent renderings.
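One way to enforce the discipline is to make the three renderings part of every unit’s contract, so that a shell can only choose among them. This sketch assumes a unit carries a one-line `summary` and a flat map of text facts; the `Unit` shape, the text output, and showing the first two facts at compact zoom are illustrative assumptions, not protocol rules.

```typescript
type Zoom = "glance" | "compact" | "full";

// Assumed unit shape for this sketch: a summary line plus flat text facts.
interface Unit {
  id: string;
  summary: string;
  facts: { [key: string]: string };
}

// Every unit can be rendered at every zoom level. Shells and packs pick the
// level; they never supply a rendering of their own.
function renderAt(u: Unit, zoom: Zoom): string {
  if (zoom === "glance") return u.summary; // a badge or one line
  const lines = Object.keys(u.facts).map((k) => `${k}: ${u.facts[k]}`);
  if (zoom === "compact") return [u.summary].concat(lines.slice(0, 2)).join(" | "); // one row
  return [u.summary].concat(lines).join("\n"); // the complete surface
}

const mailbox: Unit = {
  id: "mailbox_status",
  summary: "3 unread",
  facts: { unread: "3", flagged: "1", quota: "42%" },
};
const row = renderAt(mailbox, "compact"); // "3 unread | unread: 3 | flagged: 1"
```

Text output keeps the sketch medium-neutral; a web host would return elements and a voice viewer an utterance, but the three-level contract is the same.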

Tier 1 — genre shells

A genre is an interaction archetype the host implements as a complete, opinionated shell: slot structure, navigation model, keyboard grammar, live behavior, empty states. The initial vocabulary, chosen to cover the surfaces we actually see:

| Genre | Interaction pattern | Typical provider |
|---|---|---|
| triage | queue → focused reader → verb-per-item, keyboard-first | mail, tickets, review queues |
| collection_admin | tree/filter nav → table → record editor, bulk actions | CMS, inventory, store back-office |
| monitor | glance board of live tiles → drill into detail + history | server status, pipelines, fleet health |
| reader | document-first, table of contents, wide-measure body | docs, reports, long notes |
| conversation | timeline + composer, newest-at-bottom | chat, comment threads, agent sessions |
| board | spatial grid of facts + turn-gated move capabilities | go, cards, tile-based management games |
| console | terminal-first with supporting status around it | ops boxes, REPLs, agent terminals |

A provider declares its genre as one advisory fact in its section, e.g. "surface": { "genre": "triage" }. The declaration is a hint like any other: a host that lacks the shell ignores it and Tier 0 applies; a wrong genre makes the surface less convenient, never unsafe. When no genre is declared the host may infer one from what is already published (role / relation / kinds — a section that is mostly log + facts + attention summaries reads as monitor), so existing providers improve without changes.
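Genre inference needs nothing beyond what providers already publish. The sketch below encodes the one rule this plan states (mostly log + facts + attention summaries reads as monitor) plus one clearly hypothetical extra rule; the `UnitInfo` shape, the more-than-half threshold, and the rule order are assumptions. Returning `undefined` keeps an ambiguous space on Tier 0 rather than forcing a guess.

```typescript
type Genre = "triage" | "collection_admin" | "monitor" | "reader" | "conversation" | "board" | "console";

// Assumed per-unit view of what a section already publishes.
interface UnitInfo {
  kind: string;          // canonical kind, e.g. "log" or "facts"
  relation?: string;     // layout relation, e.g. "master_detail"
  hasAttention: boolean; // publishes a summary.attention_binding
}

// Returns a genre, or undefined for "no confident guess" (stay on Tier 0).
function inferGenre(units: UnitInfo[]): Genre | undefined {
  if (units.length === 0) return undefined;
  const monitorLike = units.filter(
    (u) => u.kind === "log" || u.kind === "facts" || u.hasAttention,
  ).length;
  // From the plan: mostly log + facts + attention summaries reads as monitor.
  // Treating "mostly" as more than half is an assumption.
  if (monitorLike / units.length > 0.5) return "monitor";
  // Illustrative guess, not from the plan: a master/detail list suggests triage.
  if (units.some((u) => u.relation === "master_detail")) return "triage";
  return undefined;
}

const guessed = inferGenre([
  { kind: "log", hasAttention: false },
  { kind: "facts", hasAttention: true },
  { kind: "record_list", relation: "master_detail", hasAttention: false },
]); // "monitor"
```

A declared surface.genre would simply bypass this function; inference only fills in for providers that declare nothing.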

Genre shells are also where the multi-provider shell lives: each owner is an app with its own workspace rendered by its genre shell; a dock switches apps; one global attention rail aggregates every provider’s summary.attention_binding badges and jumps to the owning unit. Mixing providers is the direction’s core promise, and it is precisely what per-provider visual hints could never do — two hand-made layouts don’t merge, but two genre shells side by side, under one attention rail and one interaction grammar, do.
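The global attention rail is essentially a fold over every provider’s badges. A minimal sketch, assuming each `summary.attention_binding` has already been resolved into an owner, a unit to jump to, and a pending count; the `AttentionBadge` shape and the busiest-unit jump rule are hypothetical.

```typescript
// Assumed shape of one resolved summary.attention_binding badge.
interface AttentionBadge {
  owner: string;  // the provider/app that owns the unit
  unitId: string; // where activating the badge jumps to
  count: number;  // pending items, e.g. unread messages or failing hosts
}

interface RailEntry {
  owner: string;
  total: number;
  jumpTo: string;
}

// One rail for every provider: group badges by owner, sum their counts, and
// point each entry at that owner's busiest unit. Busiest owners come first;
// ties keep first-seen order because Array.prototype.sort is stable (ES2019+).
function buildAttentionRail(badges: AttentionBadge[]): RailEntry[] {
  const acc: { [owner: string]: { total: number; jumpTo: string; best: number } | undefined } = {};
  const owners: string[] = [];
  for (const b of badges) {
    if (b.count <= 0) continue; // nothing pending, nothing on the rail
    const e = acc[b.owner];
    if (e === undefined) {
      acc[b.owner] = { total: b.count, jumpTo: b.unitId, best: b.count };
      owners.push(b.owner);
    } else {
      e.total += b.count;
      if (b.count > e.best) {
        e.best = b.count;
        e.jumpTo = b.unitId;
      }
    }
  }
  const rail: RailEntry[] = [];
  for (const owner of owners) {
    const e = acc[owner];
    if (e !== undefined) rail.push({ owner, total: e.total, jumpTo: e.jumpTo });
  }
  return rail.sort((x, y) => y.total - x.total);
}

const rail = buildAttentionRail([
  { owner: "afmail", unitId: "inbox_triage", count: 3 },
  { owner: "monitor", unitId: "hosts", count: 5 },
  { owner: "afmail", unitId: "active_cases", count: 1 },
  { owner: "monitor", unitId: "logs", count: 0 },
]);
// [{ owner: "monitor", total: 5, jumpTo: "hosts" },
//  { owner: "afmail", total: 4, jumpTo: "inbox_triage" }]
```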

Tier 2 — arrangement packs

An arrangement pack is a small declarative file of concrete presentation decisions for one provider on one host family — the last-mile judgment that generic shells cannot know:

```jsonc
{
  "afui_pack": "0.1",
  "match": { "name": "afmail" },          // matched by advisory section name/genre
  "genre": "triage",                       // confirm or override inference
  "slots": {
    "queue":   { "unit": "inbox_triage" },
    "reader":  { "unit": "message_reader", "measure": "reading" },
    "glance":  { "units": ["mailbox_status"], "zoom": "glance" },
    "drawer":  { "units": ["archived_cases", "notifications"] }
  },
  "zoom": { "push_queue": "compact" },
  "order": ["inbox_triage", "active_cases"],
  "accent": "host_token:indigo"            // names a host theme token, never a value
}
```
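Some of a pack’s constraints can be checked mechanically before the host offers it to the user. This TypeScript sketch checks only what the example above shows: the format version, that `accent` names a host token rather than carrying a value, and that zoom levels come from the three-level vocabulary. It assumes the host has already stripped the comments and parsed the file; treating references to unknown units as problems is a further assumption (a host might equally ignore them), and `Pack`, `PackSlot`, and `checkPack` are hypothetical names.

```typescript
// Assumed in-memory shape of a parsed pack (comments already stripped).
interface PackSlot {
  unit?: string;
  units?: string[];
  zoom?: string;
  measure?: string;
}

interface Pack {
  afui_pack: string;
  match: { name?: string; genre?: string };
  genre?: string;
  slots?: { [slot: string]: PackSlot };
  zoom?: { [unitId: string]: string };
  order?: string[];
  accent?: string;
}

const ZOOM_LEVELS = ["glance", "compact", "full"];

// Returns a list of problems; an empty list means the host may offer the pack.
function checkPack(pack: Pack, knownUnits: string[]): string[] {
  const problems: string[] = [];
  const known = (id: string) => knownUnits.indexOf(id) >= 0;
  if (pack.afui_pack !== "0.1") problems.push(`unsupported pack version ${pack.afui_pack}`);
  // A pack may name a host theme token, never carry a raw value such as "#4b0082".
  if (pack.accent !== undefined && pack.accent.indexOf("host_token:") !== 0) {
    problems.push(`accent is not a host_token reference: ${pack.accent}`);
  }
  const slots: { [slot: string]: PackSlot } = pack.slots || {};
  for (const slot of Object.keys(slots)) {
    const s = slots[slot];
    const ids: string[] = [];
    if (s.unit !== undefined) ids.push(s.unit);
    if (s.units !== undefined) ids.push(...s.units);
    for (const id of ids) {
      if (!known(id)) problems.push(`slot ${slot}: unknown unit ${id}`);
    }
    if (s.zoom !== undefined && ZOOM_LEVELS.indexOf(s.zoom) < 0) {
      problems.push(`slot ${slot}: unknown zoom level ${s.zoom}`);
    }
  }
  const zoom: { [unitId: string]: string } = pack.zoom || {};
  for (const id of Object.keys(zoom)) {
    if (!known(id)) problems.push(`zoom: unknown unit ${id}`);
    if (ZOOM_LEVELS.indexOf(zoom[id]) < 0) problems.push(`zoom: unknown level ${zoom[id]} for ${id}`);
  }
  const order: string[] = pack.order || [];
  for (const id of order) {
    if (!known(id)) problems.push(`order: unknown unit ${id}`);
  }
  return problems;
}

const afmailPack: Pack = {
  afui_pack: "0.1",
  match: { name: "afmail" },
  genre: "triage",
  slots: {
    queue: { unit: "inbox_triage" },
    reader: { unit: "message_reader", measure: "reading" },
    glance: { units: ["mailbox_status"], zoom: "glance" },
    drawer: { units: ["archived_cases", "notifications"] },
  },
  zoom: { push_queue: "compact" },
  order: ["inbox_triage", "active_cases"],
  accent: "host_token:indigo",
};
const afmailUnits = ["inbox_triage", "message_reader", "mailbox_status",
  "archived_cases", "notifications", "push_queue", "active_cases"];
const clean = checkPack(afmailPack, afmailUnits);                                // []
const rawColor = checkPack({ ...afmailPack, accent: "#4b0082" }, afmailUnits);  // ["accent is not a host_token reference: #4b0082"]
```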

Rules that keep it inside the trust model:

The pack is the keystone of the roadmap because every authorship era produces the same artifact. Today, a human writes afmail’s pack by hand. Tomorrow, a local agent joins a new space, reads the facts and hints, drafts a pack, screenshots the result headlessly, iterates, and asks the user to keep it. Later still, the user tells a runtime copilot “make the log bigger” and it edits the same pack (or emits the equivalent override facts into the user’s own section — the protocol-native runtime channel that already exists). Nothing built for the hand-authored era is discarded when agents arrive; agent capability only changes who writes the file and how often.
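Because the human-written pack, the agent-drafted pack, and the copilot’s override facts are all data, a host can layer them with one merge. A sketch, assuming (this plan does not fix the order) that the user’s own override facts take precedence over the provider’s pack, which in turn refines the shell defaults, and that a pack counts only once the user has kept it. `Decision`, `Layout`, and `effectiveLayout` are hypothetical names.

```typescript
// Assumed per-unit decision; every field is optional so a layer states only what it changes.
interface Decision {
  zoom?: "glance" | "compact" | "full";
  slot?: string;
}
type Layout = { [unitId: string]: Decision | undefined };

// Shell defaults, then the provider's pack (only if the user kept it), then the
// user's own override facts. Later layers refine earlier ones field by field.
function effectiveLayout(shell: Layout, pack: Layout | null, packKept: boolean, userOverrides: Layout): Layout {
  const stack: Layout[] = [shell];
  if (pack !== null && packKept) stack.push(pack);
  stack.push(userOverrides);
  const out: Layout = {};
  for (const layer of stack) {
    for (const id of Object.keys(layer)) {
      out[id] = { ...out[id], ...layer[id] };
    }
  }
  return out;
}

const shell: Layout = { log: { zoom: "compact", slot: "main" } };
const pack: Layout = { log: { slot: "side" } };
const user: Layout = { log: { zoom: "full" } }; // "make the log bigger"
const withPack = effectiveLayout(shell, pack, true, user).log;     // { zoom: "full", slot: "side" }
const withoutPack = effectiveLayout(shell, pack, false, user).log; // { zoom: "full", slot: "main" }
```

Field-by-field merging is what lets the copilot’s one-line override coexist with a hand-written pack instead of replacing it.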

Tier 3 — trusted renderer modules

Some surfaces will never be data-hint-renderable with native quality: a real-time game canvas, a map, a node graph editor. For these the protocol already names the answer — user-installed customization of the trusted viewer. Make it concrete: a renderer module is viewer-side code the user installs (never selected or delivered by observed data), registered for a kind or a section, rendering from the same facts and acting only through the same capability/request path with the same risk gates. A board-genre go game needs no module; a 60 fps action game does, and that is honest — AFUI carries its state and moves as data, and the module is simply part of the viewer the user chose to trust.
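The module contract can be made concrete in a few types. This TypeScript sketch is hypothetical throughout (`RendererModule`, `RequestPath`, `ModuleRegistry`, and the provenance tag are illustrative names, not AFUI APIs). It shows the three properties named above: registration only from a user install, lookup by section or kind, and effects only through the request path.

```typescript
// Hypothetical module contract: facts in, requests out.
type Facts = { [key: string]: unknown };

interface RequestPath {
  // Stands in for the host's existing capability/request path, risk gates included.
  request(capability: string, args: Facts): void;
}

interface RendererModule {
  id: string;
  render(facts: Facts): string;                    // draw from the same facts (a text frame here)
  onInput(input: string, path: RequestPath): void; // act only through the request path
}

// Where a registration came from; the host is assumed to track this.
type Provenance = "user_install" | "observed_data";

class ModuleRegistry {
  private modules: { [key: string]: RendererModule | undefined } = {};

  register(key: string, mod: RendererModule, from: Provenance): void {
    // Modules are part of the viewer the user chose to trust; observed data
    // can never select or deliver one.
    if (from !== "user_install") {
      throw new Error(`refusing renderer module ${mod.id} from ${from}`);
    }
    this.modules[key] = mod;
  }

  // A section-specific module wins over a kind-wide one; undefined means the
  // space falls back to genre shells and the Tier 0 floor.
  lookup(section: string, kind: string): RendererModule | undefined {
    return this.modules[`section:${section}`] || this.modules[`kind:${kind}`];
  }
}

const registry = new ModuleRegistry();
registry.register("kind:game_canvas", {
  id: "arcade",
  render: (facts) => `frame ${String(facts["tick"])}`,
  onInput: (input, path) => path.request("move", { direction: input }),
}, "user_install");
const frame = registry.lookup("arcade_space", "game_canvas")?.render({ tick: 7 }); // "frame 7"
```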

What about “very detailed layout hints from the provider”?

The interim idea — let providers publish rich layout detail so a dumb host can render well today — is half right. Right: providers should be able to ship detailed presentation judgment, and short-term quality should not wait for agents. Wrong only about the channel: pushed through the protocol as hints, that detail becomes web-shaped, non-composable, and trust-crossing, and it would have to be unwound later.

The pack is that interim layer, routed around the wire format instead of through it: the provider ships exactly the detailed opinions it wanted to ship — beside its space, as a recommendation the viewer applies under user policy — and the protocol stays clean for the terminal viewer, the voice viewer, and the composed multi-provider surface. Same detail, same authorship, same short-term payoff; none of the long-term damage, and nothing to migrate away from when agents mature.

The only protocol-adjacent additions this plan needs are small and strictly semantic, staying inside the View Hint Profile’s character: the advisory surface.genre declaration, and (if practice demands it) a per-unit default zoom hint. No panels, no pixels, no columns.

Agents: two roles, neither load-bearing

In both roles the agent produces the same declarative artifacts a human would. The host never requires AI to render, so the protocol’s “a good default must not require AI” survives intact.

What this plan does not chase

Roadmap

  1. Phase 0 — the floor. Salience→density mapping, three zoom levels, focus-stack reader, summoned overlays, a generic request-result surface, incremental emit-driven re-render. (Independently justified; fixes the worst of today’s afmail experience.)
  2. Phase 1 — genres. surface.genre in the View Hint Profile as an advisory field; triage and monitor shells first (afmail and a server monitor make honest, opposite test subjects); genre inference for undeclared spaces; the owner-as-app workspace shell with dock and global attention rail.
  3. Phase 2 — packs. Pack format 0.1, layering and consent rules, a hand-authored afmail reference pack, pack shipping convention for spores, collection_admin shell driven by a real CMS-shaped provider.
  4. Phase 3 — agent authorship. A pack-drafting agent workflow (space → draft → headless screenshot → iterate → user approval); runtime copilot channel via user-section override facts.
  5. Phase 4 — the long tail. Renderer-module contract (registration, isolation, capability-path-only effects), board shell, conversation shell.
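The Phase 3 workflow is an ordinary loop around the pack format. A sketch, with every step injected as a hypothetical callback (none is an existing AFUI or agent API); it keeps the best-scoring draft and returns nothing unless the user approves it.

```typescript
// Hypothetical steps supplied by the agent harness; none is an existing API.
interface DraftSteps<P> {
  draft(space: string): P;           // read facts and hints, propose a pack
  screenshotScore(pack: P): number;  // headless render plus critique; higher is better
  revise(pack: P, score: number): P; // adjust the pack after seeing the result
  userApproves(pack: P): boolean;    // the viewer's choice, always last
}

// space -> draft -> headless screenshot -> iterate -> user approval.
// Keeps the best-scoring draft; returns null if the user declines it.
function draftPack<P>(space: string, steps: DraftSteps<P>, maxRounds: number, goodEnough: number): P | null {
  let current = steps.draft(space);
  let currentScore = steps.screenshotScore(current);
  let best = current;
  let bestScore = currentScore;
  for (let round = 0; round < maxRounds && bestScore < goodEnough; round++) {
    current = steps.revise(current, currentScore);
    currentScore = steps.screenshotScore(current);
    if (currentScore > bestScore) {
      best = current;
      bestScore = currentScore;
    }
  }
  return steps.userApproves(best) ? best : null;
}

// Stub usage: the "pack" is just a number that is also its own score.
const approved = draftPack<number>("afmail", {
  draft: () => 1,
  screenshotScore: (p) => p,
  revise: (p) => p + 1,
  userApproves: () => true,
}, 10, 4); // 4
```

Nothing here runs at render time: whatever the loop returns is an ordinary pack file, so the host behaves identically without an agent.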

Each phase is verified the same way: pick a benchmark task flow per genre (triage ten messages and queue a reply; find and edit a CMS record; spot and drill into a failing host; play ten turns), run it headlessly, and compare step count and readability against a hand-built baseline. “Users find it good” is task parity plus the two things hand-built apps cannot offer: provider mixing under one attention rail, and human-agent copresence on the same live surface.
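The step-count half of that comparison reduces to simple arithmetic once each headless run records its steps. A sketch, assuming a hypothetical `FlowRun` record per run and an arbitrary 20% tolerance for what counts as step parity; readability is left to a separate judgment.

```typescript
// Hypothetical record of one headless run of a benchmark task flow.
interface FlowRun {
  task: string;       // e.g. "find and edit a CMS record"
  steps: number;      // user actions needed to finish
  completed: boolean;
}

interface ParityResult {
  ratio: number;   // AFUI steps divided by hand-built steps
  parity: boolean;
}

// Compares step counts only. `slack` is an assumed tolerance, not a figure from the plan.
function compareToBaseline(afui: FlowRun, baseline: FlowRun, slack = 1.2): ParityResult {
  if (afui.task !== baseline.task) throw new Error("runs must cover the same task flow");
  if (!baseline.completed || baseline.steps <= 0) throw new Error("baseline run is not usable");
  const ratio = afui.steps / baseline.steps;
  return { ratio, parity: afui.completed && ratio <= slack };
}

const task = "find and edit a CMS record";
const cms = compareToBaseline({ task, steps: 11, completed: true }, { task, steps: 10, completed: true });
// { ratio: 1.1, parity: true }
```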