Agent-First Data: A Structured-Data Layer Designed from the Agent Side

by Agent-First Kit Contributors

What a structured-data layer should look like when the primary reader is an agent: type-bearing field names, format-aware output, policy-driven redaction, and protocol events on stdout.

A human reading a config or a log line can guess. timeout: 5000 is probably milliseconds — or maybe seconds if it’s an old daemon. api_key: sk-1234… is obviously a secret to redact. size: 5242880 is some number of bytes, give or take. A person fills the gaps.

An agent cannot guess and stay correct. It sees 5000 and has to choose a unit; a wrong choice ends an hour later in the wrong place. It sees sk-1234… and either knows it is a secret or it does not; if it does not, the value lands in a transcript that someone will paste somewhere. It sees 5242880 and reports “5.2 million units of size.”

Agent-First Data starts from that asymmetry:

What should a structured-data layer look like when the primary reader is an autonomous coding agent?

The premise: data should be self-describing

A schema solves the meaning problem, but it solves it offline. The agent needs to know what a field means at the point of reading it, with no second hop to a schema service, no JSON Schema sidecar, no field-by-field documentation it has to fetch.

So the meaning has to live in the data the agent already has in hand. The cheapest place to put it is the field name.

{
  "timeout_ms": 5000,
  "created_at_epoch_ms": 1738886400000,
  "file_size_bytes": 5242880,
  "api_key_secret": "sk-1234567890abcdef"
}

_ms is milliseconds. _epoch_ms is a Unix timestamp in milliseconds. _bytes is a byte count. _secret is a redaction policy. None of that needs to be looked up. An agent that reads this object knows the units, the time domain, and what not to log, from the keys alone.

This is the first afdata design rule: every field name should answer “what is this value” before the agent has to ask anywhere else.

The suffix rule: meaning lives in the name

The convention is a closed set of suffixes that bind to types and policies. Some name the unit (_ms, _s, _bytes, _count). Some name the domain (_url, _path, _email, _uuid). Some name a temporal anchor (_at for ISO 8601, _epoch_ms for milliseconds since 1970). And one — _secret — names a policy: never render this value in plain text, ever.

The agent does not need the whole vocabulary in advance. It needs to know that whenever it sees a *_ms field, the integer is milliseconds; whenever it sees *_secret, the value must be redacted before logging. The suffix is the contract.
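As a rough illustration of what "the suffix is the contract" buys an agent, here is a std-only Rust sketch of a reader that interprets integer fields purely from their names. The Meaning enum, the interpret and is_secret functions, and the particular subset of suffixes are illustrative assumptions, not afdata's API. The one detail worth copying is the ordering: longer suffixes are checked first, because _epoch_ms also ends in _ms.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// What a field's suffix says about its integer value
/// (a hypothetical subset of the suffix vocabulary).
#[derive(Debug, PartialEq)]
enum Meaning {
    Duration(Duration),
    Timestamp(SystemTime),
    Bytes(u64),
    Count(u64),
    Unknown(u64),
}

/// Interpret an integer field from its name alone.
/// `_epoch_ms` must be tested before `_ms`, since it ends in `_ms` too.
fn interpret(name: &str, value: u64) -> Meaning {
    if name.ends_with("_epoch_ms") {
        Meaning::Timestamp(UNIX_EPOCH + Duration::from_millis(value))
    } else if name.ends_with("_ms") {
        Meaning::Duration(Duration::from_millis(value))
    } else if name.ends_with("_s") {
        Meaning::Duration(Duration::from_secs(value))
    } else if name.ends_with("_bytes") {
        Meaning::Bytes(value)
    } else if name.ends_with("_count") {
        Meaning::Count(value)
    } else {
        Meaning::Unknown(value)
    }
}

/// `_secret` names a policy rather than a unit, so it applies to any value type.
fn is_secret(name: &str) -> bool {
    name.ends_with("_secret")
}
```

With this in place, 5000 under timeout_ms becomes a five-second Duration without a lookup anywhere else.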

This works in JSON, YAML, TOML, environment variables, database columns, and protobuf field names. The substrate does not matter; the naming is portable.

The rule, stated cleanly: the agent should never have to consult an external schema to learn the type or policy of a field it just received.

The format rule: same data, three audiences

A structured event has three plausible readers. The agent reads JSON because it is protocol. A human on call reads YAML because it scans well on a small screen. A terminal scrollback reads compact logfmt because it interleaves with other lines.

afdata treats the structured object as the source of truth and chooses a formatter at output time:

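use agent_first_data::*;
use serde_json::json; // the json! macro comes from serde_json

fn main() {
    // Suffixes carry the meaning: seconds, bytes, and a secret.
    let status = json!({
        "uptime_s": 86400,
        "memory_bytes": 1048576,
        "db_password_secret": "super-secret"
    });

    // Human-facing YAML: suffixes stripped, units rendered, secret redacted.
    println!("{}", output_yaml(&status));
    // ---
    // db_password: "***"
    // memory: "1.0MB"
    // uptime: "86400s"
}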

The same status value renders as a single JSONL line in output_json, a compact logfmt line in output_plain, or the YAML above. Suffix-bearing keys are stripped of their suffix in the human formats (because the formatted value now carries the unit), and _secret is honored everywhere.
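A std-only sketch of that human-format step for a flat object: strip the suffix, move the unit into the value, and check _secret before any other rule so the raw value can never leak. The Scalar type and the format_bytes, render_field, and render_logfmt functions are hypothetical. The 1024-based byte rendering is an assumption chosen to match the "1.0MB" sample output above; afdata's actual output_plain may differ in quoting, ordering, and units.

```rust
/// A hypothetical scalar; a real implementation would walk serde_json::Value.
enum Scalar {
    Int(u64),
    Text(String),
}

/// Binary-prefixed sizes with one decimal place (an assumed rendering choice).
fn format_bytes(n: u64) -> String {
    const K: f64 = 1024.0;
    let x = n as f64;
    if x >= K * K * K {
        format!("{:.1}GB", x / (K * K * K))
    } else if x >= K * K {
        format!("{:.1}MB", x / (K * K))
    } else if x >= K {
        format!("{:.1}KB", x / K)
    } else {
        format!("{n}B")
    }
}

/// Strip the suffix from the key and move its meaning into the value.
/// `_secret` is checked first, whatever the value's type.
fn render_field(key: &str, value: &Scalar) -> (String, String) {
    if let Some(base) = key.strip_suffix("_secret") {
        return (base.to_string(), "***".to_string());
    }
    match value {
        Scalar::Int(n) => {
            if key.ends_with("_epoch_ms") {
                // Timestamps are left as-is here; `_epoch_ms` also ends in `_ms`.
                (key.to_string(), n.to_string())
            } else if let Some(base) = key.strip_suffix("_bytes") {
                (base.to_string(), format_bytes(*n))
            } else if let Some(base) = key.strip_suffix("_ms") {
                (base.to_string(), format!("{n}ms"))
            } else if let Some(base) = key.strip_suffix("_s") {
                (base.to_string(), format!("{n}s"))
            } else {
                (key.to_string(), n.to_string())
            }
        }
        Scalar::Text(s) => (key.to_string(), s.clone()),
    }
}

/// One logfmt line: space-separated `key=value` pairs, quoting values
/// that contain spaces, quotes, or `=` (using Rust's Debug escaping).
fn render_logfmt(fields: &[(&str, Scalar)]) -> String {
    fields
        .iter()
        .map(|(k, v)| {
            let (key, val) = render_field(k, v);
            if val.contains(|c: char| c == ' ' || c == '"' || c == '=') {
                format!("{key}={val:?}")
            } else {
                format!("{key}={val}")
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Fed the status object from the example above, in that field order, this sketch produces uptime=86400s memory=1.0MB db_password=***.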

The rule: the structured shape is canonical. Formatters serve different readers, but none of them are allowed to reinterpret the data or lose its meaning.

The redaction rule: redaction is policy, not formatting

A formatter that decides what to redact will get it wrong. The decision belongs to the data shape, not the renderer.

The default rule is the _secret suffix: any field whose name ends in _secret is replaced with "***" in every output mode. That covers the case where the tool author controls the field names.

Real systems also include payloads where the field cannot be renamed. A third-party API returns {"password": "..."}. A legacy table has a column called token. The v0.8 line added explicit policy for those:

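// A third-party payload whose field names we cannot rename.
let payload = json!({"user": "alice", "password": "hunter2"});
let policy = RedactionPolicy::default()
    .with_secret_names(["password", "token"]);
println!("{}", output_yaml_with(&payload, &policy));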

The policy lives outside the formatter call. If a tool defines a redaction policy at startup, every output call honors it. There is no “format this with redaction off” path — once a value is marked secret, the only safe operation is to render it as "***".
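To make "policy, not formatting" concrete, here is a minimal std-only sketch in which redaction is a separate pass that returns a redacted copy, so a formatter only ever receives data that is already safe. The Value tree and the SecretPolicy type are hypothetical stand-ins (a real tool would operate on serde_json::Value and use afdata's RedactionPolicy); the rule they implement is the one described above: the _secret suffix always applies, and the policy adds exact field names.

```rust
use std::collections::HashSet;

/// A minimal JSON-like tree (hypothetical stand-in for serde_json::Value).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Num(i64),
    Obj(Vec<(String, Value)>),
    Arr(Vec<Value>),
}

/// Hypothetical policy: the `_secret` suffix always counts, and extra
/// names cover payloads whose keys cannot be renamed.
struct SecretPolicy {
    extra_names: HashSet<String>,
}

impl SecretPolicy {
    fn new(names: &[&str]) -> Self {
        SecretPolicy {
            extra_names: names.iter().map(|s| s.to_string()).collect(),
        }
    }

    fn is_secret(&self, key: &str) -> bool {
        key.ends_with("_secret") || self.extra_names.contains(key)
    }

    /// Return a copy of `v` with every secret field replaced by "***".
    /// The secret check happens before recursion, so a secret object is
    /// replaced wholesale and nothing beneath it can leak.
    fn redact(&self, v: &Value) -> Value {
        match v {
            Value::Obj(fields) => Value::Obj(
                fields
                    .iter()
                    .map(|(k, val)| {
                        let out = if self.is_secret(k) {
                            Value::Str("***".to_string())
                        } else {
                            self.redact(val)
                        };
                        (k.clone(), out)
                    })
                    .collect(),
            ),
            Value::Arr(items) => Value::Arr(items.iter().map(|i| self.redact(i)).collect()),
            other => other.clone(),
        }
    }
}
```

Because every output path would go through redact first, there is no formatter-level switch that could turn redaction off.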

For the policy mechanism in detail, see the v0.8 redaction-policy post.

The rule: an agent should never accidentally log a secret because the formatter forgot.

The protocol rule: stdout is a structured channel

afdata is more than naming. It also defines a small protocol template: a JSONL event with a required code field and an optional trace.

{"code":"ok","result":{"hash":"abc123","size_bytes":456789},"trace":{"duration_ms":1280}}
{"code":"log","event":"startup","config":{"timeout_s":30}}
{"code":"error","error_code":"timeout","message":"upstream did not respond","trace":{"duration_ms":30001}}

code is the discriminator. The agent reads one line, branches on code, and handles the body without parsing English.

The discipline at the stream level: stdout carries only structured events. Stderr carries free human prose — startup banners, debug spew, panics. They never mix. An agent reading stdout never has to handle a sentence; a human reading stderr never has to handle JSON they did not ask for.
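A minimal std-only sketch of that stream discipline: events are written as single JSONL lines with code first, and prose goes through eprintln!. The emit_event, json_escape, and say helpers are hypothetical and only handle string-valued fields; a real tool would serialize with serde_json. Taking any io::Write lets the same event writer target io::stdout().lock() in a tool and a Vec<u8> in a test.

```rust
use std::io::{self, Write};

/// Escape `s` as a JSON string literal, including the surrounding quotes.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Write one event as a single JSONL line with `code` first.
/// Hypothetical helper: field values are strings only.
fn emit_event<W: Write>(out: &mut W, code: &str, fields: &[(&str, &str)]) -> io::Result<()> {
    let mut line = format!("{{\"code\":{}", json_escape(code));
    for (key, value) in fields {
        line.push(',');
        line.push_str(&json_escape(key));
        line.push(':');
        line.push_str(&json_escape(value));
    }
    line.push('}');
    writeln!(out, "{line}")
}

/// Human prose goes to stderr only, never into the event stream.
fn say(msg: &str) {
    eprintln!("{msg}");
}
```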

Logs were brought under this contract in v0.5. They are not a separate channel: they are events with code: "log", span fields, and the same suffix rules. A request_id set on a span travels with every event from that scope.
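A sketch of how span fields can travel with events, assuming a hypothetical Span type rather than afdata's actual API: fields set on a span are appended to every log event rendered through it, and a child span starts with all of its parent's fields. To stay short, this version does no JSON escaping, so it is only safe for plain identifier-like values.

```rust
/// Hypothetical span: fields set once here are attached to every event
/// rendered through it, so request_id is never repeated by hand.
#[derive(Clone)]
struct Span {
    fields: Vec<(String, String)>,
}

impl Span {
    fn new() -> Self {
        Span { fields: Vec::new() }
    }

    /// Add a field to this span (builder style).
    fn with(mut self, key: &str, value: &str) -> Self {
        self.fields.push((key.to_string(), value.to_string()));
        self
    }

    /// A nested scope starts with all of its parent's fields.
    fn child(&self) -> Span {
        self.clone()
    }

    /// Render a `code: "log"` event with the span's fields appended.
    /// No JSON escaping: values must be plain identifier-like strings.
    fn log(&self, event: &str) -> String {
        let mut line = format!("{{\"code\":\"log\",\"event\":\"{event}\"");
        for (key, value) in &self.fields {
            line.push_str(&format!(",\"{key}\":\"{value}\""));
        }
        line.push('}');
        line
    }
}
```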

The rule: stdout is for the agent. Prose is for the human. They do not share a stream.

The error rule: failures carry stable codes

When a tool fails, it produces an event:

{"code":"error","error_code":"timeout","message":"upstream did not respond","retryable":true,"trace":{"duration_ms":30001}}

error_code is the stable handle: timeout, dns_failed, permission_denied, limit_exceeded. The agent branches on the code. message is the English version for humans reading logs; the agent does not depend on it. retryable is the tool's hint to the agent about whether trying again is plausible.
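On the agent side, the branch can look like the following sketch. ErrorEvent, Action, and next_action are hypothetical names, and the mapping from codes to actions is illustrative; the point is that the decision reads only error_code and retryable, never message.

```rust
/// The fields of an error event as an agent might deserialize them
/// (hypothetical struct; field names follow the error event shown above).
struct ErrorEvent {
    error_code: String,
    #[allow(dead_code)] // Kept for humans reading logs; deliberately never read below.
    message: String,
    retryable: bool,
}

/// What the agent does next (an illustrative action set).
#[derive(Debug, PartialEq)]
enum Action {
    Retry,
    AskForPermission,
    ShrinkRequest,
    Report,
}

/// Decide from the stable fields only.
fn next_action(e: &ErrorEvent) -> Action {
    match e.error_code.as_str() {
        "permission_denied" => Action::AskForPermission,
        "limit_exceeded" => Action::ShrinkRequest,
        // Any other code: trust the tool's retryable hint.
        _ if e.retryable => Action::Retry,
        _ => Action::Report,
    }
}
```

The guard arm means an unrecognized code is still retried when the tool marks it retryable, and reported otherwise, so new codes degrade gracefully instead of forcing the agent to read the message.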

A tool that builds errors with build_json_error("limit_exceeded", ...) carries this contract into its own protocol automatically. The error event is shaped like every other event — same code field, same trace, same redaction rules.

The rule: failure is data, not prose. An agent never has to grep an error message to decide what to do next.

The shape of this release: afdata encodes the contract

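The current afdata line carries each rule into a concrete primitive: the suffix convention for meaning; output_json, output_yaml, and output_plain for the three audiences; the _secret suffix and RedactionPolicy for redaction; code-discriminated JSONL events on stdout, with logs folded in since v0.5; and build_json_error for stable error codes.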

The change to internalize is not any one helper. It is the posture: structured data is the source of truth, and every reader — agent, human, formatter, logger — works from it.

The next direction: more meaning per name, less ad-hoc decoration

A naming convention is a long-term design space. Some next steps are clear:

The direction is not “make data prettier.” It is to push as much meaning as possible into the smallest place an agent already has to look — the field name — so the rest of its work can be reasoning rather than parsing.