Agent-First Data: A Structured-Data Layer Designed from the Agent Side
What a structured-data layer should look like when the primary reader is an agent: type-bearing field names, format-aware output, policy-driven redaction, and protocol events on stdout.
A human reading a config or a log line can guess. timeout: 5000 is probably
milliseconds — or maybe seconds if it’s an old daemon. api_key: sk-1234… is
obviously a secret to redact. size: 5242880 is some number of bytes, give or
take. A person fills the gaps.
An agent cannot guess and stay correct. It sees 5000 and has to choose a
unit; a wrong choice surfaces an hour later, far from its cause. It sees sk-1234…
and either knows it is a secret or it does not; if it does not, the value
lands in a transcript that someone will paste somewhere. It sees 5242880
and reports “5.2 million units of size.”
Agent-First Data starts from that asymmetry:
What should a structured-data layer look like when the primary reader is an autonomous coding agent?
The premise: data should be self-describing
A schema solves the meaning problem, but it solves it offline. The agent needs to know what a field means at the point of reading it, with no second hop to a schema service, no JSON Schema sidecar, no field-by-field documentation it has to fetch.
So the meaning has to live in the data the agent already has in hand. The cheapest place to put it is the field name.
{
  "timeout_ms": 5000,
  "created_at_epoch_ms": 1738886400000,
  "file_size_bytes": 5242880,
  "api_key_secret": "sk-1234567890abcdef"
}
_ms is milliseconds. _epoch_ms is a Unix timestamp in milliseconds. _bytes is
a byte count. _secret is a redaction policy. None of that needs to be looked up.
An agent that reads this object knows the units, the time domain, and what
not to log — from the keys alone.
This is the first afdata design rule: every field name should answer “what is this value” before the agent has to ask anywhere else.
The suffix rule: meaning lives in the name
The convention is a closed set of suffixes that bind to types and policies.
Some name the unit (_ms, _s, _bytes, _count). Some name the domain
(_url, _path, _email, _uuid). Some name a temporal anchor (_at
for ISO 8601, _epoch_ms for milliseconds since the Unix epoch). And one, _secret,
names a policy: never render this value in plain text, ever.
The agent does not need the whole vocabulary in advance. It needs to know
that whenever it sees a *_ms field, the integer is milliseconds; whenever
it sees *_secret, the value must be redacted before logging. The suffix
is the contract.
This works in JSON, YAML, TOML, environment variables, database columns, and protobuf field names. The substrate does not matter; the naming is portable.
The rule, stated cleanly: the agent should never have to consult an external schema to learn the type or policy of a field it just received.
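The suffix contract can be sketched as a small lookup on the reader's side. This is a minimal illustration that covers only the suffixes named in this post; `FieldKind` and `classify` are hypothetical names, not part of the afdata API.

```rust
/// What a field-name suffix tells the reader about the value.
/// Hypothetical type for illustration; not part of the afdata API.
#[derive(Debug, PartialEq, Clone, Copy)]
enum FieldKind {
    Milliseconds,
    Seconds,
    Bytes,
    Count,
    EpochMillis,
    Iso8601,
    Url,
    Path,
    Email,
    Uuid,
    Secret,
    Unknown,
}

/// Classify a key by its suffix. Longer suffixes come first so that
/// `_epoch_ms` is never mistaken for plain `_ms`.
fn classify(key: &str) -> FieldKind {
    const TABLE: &[(&str, FieldKind)] = &[
        ("_epoch_ms", FieldKind::EpochMillis),
        ("_secret", FieldKind::Secret),
        ("_bytes", FieldKind::Bytes),
        ("_count", FieldKind::Count),
        ("_email", FieldKind::Email),
        ("_uuid", FieldKind::Uuid),
        ("_path", FieldKind::Path),
        ("_url", FieldKind::Url),
        ("_ms", FieldKind::Milliseconds),
        ("_at", FieldKind::Iso8601),
        ("_s", FieldKind::Seconds),
    ];
    TABLE
        .iter()
        .find(|(suffix, _)| key.ends_with(*suffix))
        .map(|(_, kind)| *kind)
        .unwrap_or(FieldKind::Unknown)
}
```

A key with no known suffix falls through to `Unknown`, and a linter or a cautious agent could flag exactly that case.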
The format rule: same data, three audiences
A structured event has three plausible readers. The agent reads JSON because it is protocol. A human on call reads YAML because it scans well on a small screen. A terminal scrollback reads compact logfmt because it interleaves with other lines.
afdata treats the structured object as the source of truth and chooses a formatter at output time:
use agent_first_data::*;
use serde_json::json; // assumed: json! is serde_json's macro, unless the crate re-exports it

let status = json!({
    "uptime_s": 86400,
    "memory_bytes": 1048576,
    "db_password_secret": "super-secret"
});

println!("{}", output_yaml(&status));
// ---
// db_password: "***"
// memory: "1.0MB"
// uptime: "86400s"
The same status value renders as a single JSONL line in output_json,
a tabular logfmt line in output_plain, or the YAML above. Suffix-bearing
keys are stripped of their suffix in the human formats (because the
formatted value carries the unit), and _secret is honored everywhere.
The rule: the structured shape is canonical. Formatters serve different readers, but none of them are allowed to reinterpret the data or lose its meaning.
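To make the human-format behavior concrete, here is a rough sketch of a suffix-aware renderer for a single field. It is not afdata's implementation: `Scalar`, `render_human`, and `human_bytes` are hypothetical, the 1024-based byte scaling is an assumption read off the 1048576 to "1.0MB" example above, and timestamps are deliberately left unformatted.

```rust
/// A scalar value in this sketch; real payloads would be full JSON values.
enum Scalar {
    Int(u64),
    Text(String),
}

/// Render one field for a human reader as (display_key, display_value).
/// The suffix is stripped from the key because the value carries the unit.
fn render_human(key: &str, value: &Scalar) -> (String, String) {
    // Policy first: a secret is masked whatever its type.
    if let Some(name) = key.strip_suffix("_secret") {
        return (name.to_string(), "***".to_string());
    }
    match value {
        Scalar::Int(n) => {
            if key.ends_with("_epoch_ms") {
                // Timestamps are passed through untouched in this sketch.
                (key.to_string(), n.to_string())
            } else if let Some(name) = key.strip_suffix("_bytes") {
                (name.to_string(), human_bytes(*n))
            } else if let Some(name) = key.strip_suffix("_ms") {
                (name.to_string(), format!("{}ms", n))
            } else if let Some(name) = key.strip_suffix("_s") {
                (name.to_string(), format!("{}s", n))
            } else {
                (key.to_string(), n.to_string())
            }
        }
        Scalar::Text(s) => (key.to_string(), s.clone()),
    }
}

/// Byte formatting with 1024-based steps (an assumption of this sketch).
fn human_bytes(n: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KB", "MB", "GB", "TB"];
    let mut value = n as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{}B", n)
    } else {
        format!("{:.1}{}", value, UNITS[unit])
    }
}
```

The structured object itself is never modified. The renderer only derives a display form, which is why the JSON output can stay canonical while the human formats shorten keys and attach units.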
The redaction rule: redaction is policy, not formatting
A formatter that decides what to redact will get it wrong. The decision belongs to the data shape, not the renderer.
The default rule is the _secret suffix: any field whose name ends in
_secret is replaced with "***" in every output mode. That covers the
case where the tool author controls the field names.
Real systems also include payloads where the field cannot be renamed. A
third-party API returns {"password": "..."}. A legacy table has a column
called token. The v0.8 line added explicit policy for those:
let policy = RedactionPolicy::default()
    .with_secret_names(["password", "token"]);

println!("{}", output_yaml_with(&payload, &policy));
The policy lives outside the formatter call. If a tool defines a redaction
policy at startup, every output call honors it. There is no “format this
with redaction off” path — once a value is marked secret, the only safe
operation is to render it as "***".
For the policy mechanism in detail, see the v0.8 redaction-policy post.
The rule: an agent should never accidentally log a secret because the formatter forgot.
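The idea that redaction is decided by the data, not the renderer, can be sketched as a masking pass that runs before any formatter sees the values. `SecretNames` below is a hypothetical stand-in written for this illustration; afdata's own `RedactionPolicy` may be shaped quite differently, and the sketch handles only flat string fields.

```rust
use std::collections::HashSet;

/// Hypothetical policy type for illustration only.
struct SecretNames {
    names: HashSet<String>,
}

impl SecretNames {
    fn new(names: &[&str]) -> Self {
        SecretNames {
            names: names.iter().map(|n| n.to_string()).collect(),
        }
    }

    /// A field is secret if it carries the `_secret` suffix or if its exact
    /// name was listed in the policy.
    fn is_secret(&self, key: &str) -> bool {
        key.ends_with("_secret") || self.names.contains(key)
    }

    /// Return a copy of the fields with every secret value masked.
    /// No flag turns the masking off.
    fn redact(&self, fields: &[(String, String)]) -> Vec<(String, String)> {
        fields
            .iter()
            .map(|(key, value)| {
                let shown = if self.is_secret(key) {
                    "***".to_string()
                } else {
                    value.clone()
                };
                (key.clone(), shown)
            })
            .collect()
    }
}
```

Because the formatter only ever receives the output of `redact`, it has no opportunity to forget: the plain value never reaches it.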
The protocol rule: stdout is a structured channel
afdata is more than naming. It also defines a small protocol template: a
JSONL event with a required code field and an optional trace.
{"code":"ok","result":{"hash":"abc123","size_bytes":456789},"trace":{"duration_ms":1280}}
{"code":"log","event":"startup","config":{"timeout_s":30}}
{"code":"error","error_code":"timeout","message":"upstream did not respond","trace":{"duration_ms":30001}}
code is the discriminator. The agent reads one line, branches on code,
and handles the body without parsing English.
The discipline at the stream level: stdout carries only structured events. Stderr carries free human prose — startup banners, debug spew, panics. They never mix. An agent reading stdout never has to handle a sentence; a human reading stderr never has to handle JSON they did not ask for.
Logs were brought under this contract in v0.5.
Logs are not a separate channel — they are events with code: "log", span
fields, and the same suffix rules. A request_id set on a span travels with
every event from that scope.
The rule: stdout is for the agent. Prose is for the human. They do not share a stream.
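On the producing side, the stream discipline can be sketched with two small helpers that take any writer, so a tool can pass stdout for events and stderr for prose. The function names are hypothetical and the event is assumed to be serialized already; the sketch only enforces "one event, one line".

```rust
use std::io::{self, Write};

/// Send one structured event to the agent-facing channel.
/// Rejects anything that would break the one-event-per-line framing.
fn emit_event<W: Write>(events: &mut W, json_object: &str) -> io::Result<()> {
    if json_object.contains('\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "event must be a single line",
        ));
    }
    writeln!(events, "{}", json_object)
}

/// Send free-form prose to the human-facing channel.
fn emit_prose<W: Write>(prose: &mut W, text: &str) -> io::Result<()> {
    writeln!(prose, "{}", text)
}
```

In a real tool the calls would look like `emit_event(&mut io::stdout(), line)` and `emit_prose(&mut io::stderr(), "starting up")`. Keeping the two helpers separate makes it hard to print a banner onto the event stream by accident.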
The error rule: failures carry stable codes
When a tool fails, it produces an event:
{"code":"error","error_code":"timeout","message":"upstream did not respond","retryable":true,"trace":{"duration_ms":30001}}
error_code is the stable handle: timeout, dns_failed, permission_denied,
limit_exceeded. The agent branches on the code. message is the English
version for humans reading logs; the agent does not depend on it. retryable
is a hint the tool gives the agent about whether trying again is plausible.
A tool that builds errors with build_json_error("limit_exceeded", ...)
carries this contract into its own protocol automatically. The error event
is shaped like every other event — same code field, same trace, same
redaction rules.
The rule: failure is data, not prose. An agent never has to grep an error message to decide what to do next.
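From the agent's side, branching on the stable code might look like the sketch below. The `ErrorEvent` and `NextStep` types and the mapping itself are illustrative assumptions, not something afdata prescribes; the point is that `message` never enters the decision.

```rust
/// The fields of an error event that an agent acts on. `message` is left
/// out on purpose: the decision never depends on English text.
struct ErrorEvent<'a> {
    error_code: &'a str,
    retryable: bool,
}

/// Hypothetical set of follow-up actions.
#[derive(Debug, PartialEq)]
enum NextStep {
    RetryWithBackoff,
    FixCredentials,
    ReduceRequest,
    GiveUp,
}

/// Decide what to do next from the stable code and the retry hint.
/// Known codes are handled first; unknown codes fall back to `retryable`.
fn next_step(event: &ErrorEvent) -> NextStep {
    match (event.error_code, event.retryable) {
        ("permission_denied", _) => NextStep::FixCredentials,
        ("limit_exceeded", _) => NextStep::ReduceRequest,
        (_, true) => NextStep::RetryWithBackoff,
        (_, false) => NextStep::GiveUp,
    }
}
```

An unfamiliar `error_code` still yields a sensible action because the `retryable` hint covers the fallback, so a tool can add new codes without breaking older agents.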
The shape of this release: afdata encodes the contract
The current afdata line carries each rule into a concrete primitive:
- Suffix-aware formatters. output_json, output_yaml, and output_plain know the suffix vocabulary and render accordingly across Rust, Python, TypeScript, and Go.
- Self-labeling secrets. Any *_secret field is redacted in every output mode.
- Explicit redaction policy. Tools that handle legacy payloads can name redacted fields explicitly without changing the formatter call.
- Protocol template. build_json_ok, build_json_error, and build_json produce events with code, optional trace, and the same naming rules applied to the body.
- Structured logging. Each language ships a logger that emits code: "log" events with span context, suffix-aware formatting, and the same redaction rules as data output.
- Channel discipline. stdout is reserved for events. Stderr is for human prose. No tool needs to negotiate the boundary at runtime.
The change to internalize is not any one helper. It is the posture: structured data is the source of truth, and every reader — agent, human, formatter, logger — works from it.
The next direction: more meaning per name, less ad-hoc decoration
A naming convention is a long-term design space. Some next steps are clear:
- Wider suffix vocabulary. More units (_pct, _rate), more domains (_e164, _iso4217), more anchors (_dur_ms distinct from _at), with conservative additions so the contract stays small enough to memorize.
- Cross-language linting. A linter that flags fields named timeout or expiry and proposes the suffixed alternative, runnable in CI and in the editor.
- Schema bridges. Generators that turn a suffix-bearing struct into a JSON Schema, an OpenAPI shape, or a protobuf definition, so the convention can survive a hand-off to systems that still expect a schema.
- Policy beyond secrets. Field-level policies for retention, sampling, and audit, expressed in the same place redaction lives.
- A shared error_code vocabulary. A short list of error codes used across the kit, so an agent that learns connect_timeout from afhttp recognizes the same shape from afpsql or afpay.
The direction is not “make data prettier.” It is to push as much meaning as possible into the smallest place an agent already has to look — the field name — so the rest of its work can be reasoning rather than parsing.