RFD-0028 — Diagnostics schema 1.0

Committed Opened 2026-05-03 · Committed 2026-05-03

Question

What is the wire-format shape of the agent-facing diagnostics payload, and when does it commit to SemVer 1.0?

RFD-0027 ratified the layered architecture for agent integration: CLI floor, MCP transport, IDE extension, per-agent skills — all backed by oxc-agent-surface whose typed return values generate JSON Schemas under share/argon/schemas/. Each schema graduates to SemVer 1.0 via its own follow-up RFD; this is that follow-up for the diagnostics schema.

The diagnostics schema is the highest-traffic agent surface. Agents writing Argon code call argon_check continuously while iterating. The schema’s shape determines what an agent can do with a diagnostic: render it, propose a fix, jump to it, file an issue about it. Get this wrong and every consumer rebuilds the missing context locally; get it right and consumers compose cleanly.

The shape shipped at 0.1.0 (under RFD-0027 Phase 1) carries 9 fields per diagnostic plus 7 fields per span. The question is whether that shape is the shape, ready for a SemVer 1.0 commitment.

Decision

Ratify the diagnostics schema at SemVer 1.0.0 on merge of this RFD. The shape is the one defined by oxc_agent_surface::types::diagnostic::DiagnosticsReport at the merge SHA; further evolution follows SemVer (additive → minor, breaking → major).

share/argon/schemas/version.json adds "diagnostics" to its stable array on this RFD’s merge.

Schema shape (load-bearing)

DiagnosticsReport is the top-level object: schema_version: string, diagnostics: Diagnostic[], summary: DiagnosticsSummary.

Diagnostic is the per-diagnostic record:

Field	Type	Notes
`code`	`string`	Stable error code, e.g. `"OE0226"`. Code prefix encodes severity.
`severity`	`"error" \| "warning" \| "info"`	Mirrors the code prefix; surfaced explicitly so consumers don’t parse the string.
`message`	`string`	One-line summary.
`primary_span`	`SpanRef \| null`	Null for spanless diagnostics (CLI-layer cross-format errors).
`primary_label`	`string \| null`	Optional label rendered at the primary span.
`secondary_labels`	`SecondaryLabel[]`	Each carries its own span + message.
`help`	`string \| null`	Optional remediation hint. CLI appends `try ox explain <code>`; wire form does not.
`package_origin`	`string \| null`	Vocabulary package that authored the constraint. Compiler built-ins return null.
`provenance_chain`	`string[]`	Why-chain from meta-property derivation; empty when not applicable.

SpanRef carries both byte offsets and 1-indexed line/UTF-16 columns:

Field	Type	Notes
`file`	`string`	Workspace-relative path, normalized to forward slashes.
`byte_start`	`u32`	UTF-8 byte offset (0-indexed, inclusive).
`byte_end`	`u32`	UTF-8 byte offset (0-indexed, exclusive).
`line_start`	`u32`	1-indexed line of the span’s first byte.
`col_start`	`u32`	1-indexed UTF-16 column of the span’s first byte.
`line_end`	`u32`	1-indexed line of the span’s last byte.
`col_end`	`u32`	1-indexed UTF-16 column one past the span’s last character (exclusive end).

DiagnosticsSummary carries errors: u32, warnings: u32, infos: u32 so consumers can summarize without iterating.

SecondaryLabel carries span: SpanRef + message: string.

Rationale

Why three-valued severity. Argon’s diagnostic codes already use a one-character severity prefix (E/W/I after the namespace character). Surfacing severity explicitly as an enum rather than parsing the string gives consumers structural access without coupling them to the code-prefix convention. Three levels match LSP’s DiagnosticSeverity 1/2/3 (Error/Warning/Information), making round-trip to LSP transports trivial.

Why both byte offsets and line/column. Renderers split into two camps: terminal/IDE renderers want line/column for cursor placement; programmatic consumers (refactor tools, jump-to-definition) want byte offsets to slice source text without re-parsing. Carrying both removes a class of off-by-one bugs at every consumer boundary.

Why UTF-16 columns specifically. LSP Position semantics use UTF-16 code units. Editors built atop LSP clients (VS Code, Cursor, Helix, neovim+coc) all assume UTF-16 columns. Picking anything else (UTF-8, codepoint) makes round-trip lossy at every IDE boundary. UTF-16 is the bad choice the world picked; we match.

Why 1-indexed lines/columns. Editor convention. Byte offsets stay 0-indexed (matching the compiler’s internal Span type) so the two coordinate systems remain visually distinguishable in the same payload.

Why package_origin is optional. Compiler built-in diagnostics (OE0001-class parse errors, type errors) don’t have a vocabulary package authoring them — they’re the language. Constraint-based diagnostics surface from a pub strict error rule in some package (UFO’s R01-R37, BFO’s continuant/occurrent disjointness, custom domain rules), and package_origin carries that package name so consumers can attribute / route / suppress per-package.

Why provenance_chain is string[]. The compiler’s meta-property calculus produces structured why-chains internally; for v0 we serialize them as strings rather than commit to a structured shape. Structuring the chain (axis names, derivation rule names, intermediate values) is its own design and would block ratification on more thinking. Strings are SemVer-safe additively: a future minor bump can introduce an optional provenance_structured: ProvenanceFrame[] | null field alongside the existing string array. Replacing the string array with a structured type — i.e. removing or renaming provenance_chain — is a major bump and out of scope for any minor evolution.

Why null for spanless / no-help / no-package — not absence. Schema consumers benefit from required fields with explicit null over optional fields whose presence is signal: a diagnostic that might have help but doesn’t is the same case whether help is missing-from-object or null. Forcing null in the wire format normalizes the parsing path. Required + nullable also generates cleaner TypeScript bindings (string | null vs string | undefined) for the eventual vscode-extension consumers.

Why ratify at 1.0 now rather than after #331 / #332 land. The shape is derived directly from oxc::diagnostics::OntologDiagnostic (legacy spelling — short for the pre-rename “Ontolog” language tag — and the canonical name of the internal compiler diagnostic struct), which has been stable since long before RFD-0027. The wire format adds nothing not already in the internal form — projecting Span to byte+line/col is mechanical, projecting Severity is identity, the rest is field-level rename. The risk of “consumers exercise the schema and find a missing field” is low because no compiler-side richness is being hidden. SemVer 1.0 commits to this shape — additive evolution stays minor; we can always bump major if we missed something.

Why this RFD is small. The schema description lives in types/diagnostic.rs; rustdoc and the generated JSON Schema are the source of truth. This RFD captures the rationale for committing to that shape, not a duplicate of the shape itself.

Consequences

share/argon/schemas/version.json stable array gains "diagnostics" on merge. The actual file update lands as a follow-up commit on the agent-tooling workstream branch — the file is generated from oxc_agent_surface::types::version::SchemaSetVersion::current(), which lives there. A trailing-newline chore commit on that branch (or directly on main if the branch has merged) updates current() to push "diagnostics" into stable and re-runs oxc-codegen emit.
Future changes to DiagnosticsReport / Diagnostic / SpanRef / Severity / SecondaryLabel / DiagnosticsSummary follow SemVer:
- Adding an optional field (default-on-deserialize) → minor bump.
- Renaming a field, removing a field, narrowing a type, changing semantics → major bump.
- Adding a new Severity variant → major bump (consumer enums break).
Consumers (Cursor MCP, Claude Code skills, vscode-extension renderers, third-party harnesses) pin against the schema’s SemVer in version.json.
The CI drift gate via oxc-codegen check continues to enforce that the published JSON Schema matches the Rust types byte-for-byte. Drift here means the wire format moved without a SemVer bump; CI fails.

Out of scope

Hover, query-result, package-tree, provenance schemas — each gets its own ratification RFD as its underlying surface stabilizes.
Structured provenance_chain — punt to a future minor bump (additive optional field).
Code-action hints (LSP-style fix-it suggestions in the wire format) — punt to a future minor bump.

The Argon Programming Language