RFD-0028 — Diagnostics schema 1.0
Question
What is the wire-format shape of the agent-facing diagnostics payload, and when does it commit to SemVer 1.0?
Context
RFD-0027 ratified the layered architecture for agent integration: CLI floor, MCP transport, IDE extension, per-agent skills — all backed by oxc-agent-surface whose typed return values generate JSON Schemas under share/argon/schemas/. Each schema graduates to SemVer 1.0 via its own follow-up RFD; this is that follow-up for the diagnostics schema.
The diagnostics schema is the highest-traffic agent surface. Agents writing Argon code call argon_check continuously while iterating. The schema’s shape determines what an agent can do with a diagnostic: render it, propose a fix, jump to it, file an issue about it. Get this wrong and every consumer rebuilds the missing context locally; get it right and consumers compose cleanly.
The shape shipped at 0.1.0 (under RFD-0027 Phase 1) carries 9 fields per diagnostic plus 7 fields per span. The question is whether that shape is the shape, ready for a SemVer 1.0 commitment.
Decision
Ratify the diagnostics schema at SemVer 1.0.0 on merge of this RFD. The shape is the one defined by oxc_agent_surface::types::diagnostic::DiagnosticsReport at the merge SHA; further evolution follows SemVer (additive → minor, breaking → major).
share/argon/schemas/version.json adds "diagnostics" to its stable array on this RFD’s merge.
Schema shape (load-bearing)
DiagnosticsReport is the top-level object: schema_version: string, diagnostics: Diagnostic[], summary: DiagnosticsSummary.
Diagnostic is the per-diagnostic record:
| Field | Type | Notes |
|---|---|---|
| `code` | `string` | Stable error code, e.g. `"OE0226"`. Code prefix encodes severity. |
| `severity` | `"error" \| "warning" \| "info"` | Mirrors the code prefix; surfaced explicitly so consumers don’t parse the string. |
| `message` | `string` | One-line summary. |
| `primary_span` | `SpanRef \| null` | Null for spanless diagnostics (CLI-layer cross-format errors). |
| `primary_label` | `string \| null` | Optional label rendered at the primary span. |
| `secondary_labels` | `SecondaryLabel[]` | Each carries its own span + message. |
| `help` | `string \| null` | Optional remediation hint. CLI appends `try ox explain <code>`; wire form does not. |
| `package_origin` | `string \| null` | Vocabulary package that authored the constraint. Compiler built-ins return null. |
| `provenance_chain` | `string[]` | Why-chain from meta-property derivation; empty when not applicable. |
SpanRef carries both byte offsets and 1-indexed line/UTF-16 columns:
| Field | Type | Notes |
|---|---|---|
| `file` | `string` | Workspace-relative path, normalized to forward slashes. |
| `byte_start` | `u32` | UTF-8 byte offset (0-indexed, inclusive). |
| `byte_end` | `u32` | UTF-8 byte offset (0-indexed, exclusive). |
| `line_start` | `u32` | 1-indexed line of the span’s first byte. |
| `col_start` | `u32` | 1-indexed UTF-16 column of the span’s first byte. |
| `line_end` | `u32` | 1-indexed line of the span’s last byte. |
| `col_end` | `u32` | 1-indexed UTF-16 column one past the span’s last character (exclusive end). |
DiagnosticsSummary carries errors: u32, warnings: u32, infos: u32 so consumers can summarize without iterating.
SecondaryLabel carries span: SpanRef + message: string.
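As an illustration of the shape described in this section, a consumer-side model might look like the following Python sketch. The type names mirror the tables; the sample payload (file path, `.ar` extension, message text, span numbers) is invented for illustration and is not compiler output. The rustdoc and generated JSON Schema remain the source of truth.

```python
import json
from typing import Literal, Optional, TypedDict


class SpanRef(TypedDict):
    file: str
    byte_start: int
    byte_end: int
    line_start: int
    col_start: int
    line_end: int
    col_end: int


class SecondaryLabel(TypedDict):
    span: SpanRef
    message: str


class Diagnostic(TypedDict):
    code: str
    severity: Literal["error", "warning", "info"]
    message: str
    primary_span: Optional[SpanRef]
    primary_label: Optional[str]
    secondary_labels: list[SecondaryLabel]
    help: Optional[str]
    package_origin: Optional[str]
    provenance_chain: list[str]


class DiagnosticsSummary(TypedDict):
    errors: int
    warnings: int
    infos: int


class DiagnosticsReport(TypedDict):
    schema_version: str
    diagnostics: list[Diagnostic]
    summary: DiagnosticsSummary


# Hypothetical payload: every value is made up to show the shape only.
SAMPLE = """
{
  "schema_version": "1.0.0",
  "diagnostics": [
    {
      "code": "OE0226",
      "severity": "error",
      "message": "hypothetical message text",
      "primary_span": {
        "file": "src/example.ar",
        "byte_start": 19, "byte_end": 20,
        "line_start": 2, "col_start": 12,
        "line_end": 2, "col_end": 13
      },
      "primary_label": null,
      "secondary_labels": [],
      "help": null,
      "package_origin": null,
      "provenance_chain": []
    }
  ],
  "summary": {"errors": 1, "warnings": 0, "infos": 0}
}
"""


def recount(report: DiagnosticsReport) -> DiagnosticsSummary:
    """Recompute the summary by iterating, to cross-check the summary block."""
    counts = {"error": 0, "warning": 0, "info": 0}
    for diagnostic in report["diagnostics"]:
        counts[diagnostic["severity"]] += 1
    return {
        "errors": counts["error"],
        "warnings": counts["warning"],
        "infos": counts["info"],
    }


report: DiagnosticsReport = json.loads(SAMPLE)
```

A consumer that only needs totals can read `report["summary"]` directly; `recount` exists here only to show that the summary is redundant with the diagnostics list.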
Rationale
Why three-valued severity. Argon’s diagnostic codes already use a one-character severity prefix (E/W/I after the namespace character). Surfacing severity as an explicit enum gives consumers structural access without coupling them to the code-prefix convention. The three levels map one-to-one onto LSP’s DiagnosticSeverity values 1/2/3 (Error/Warning/Information), so translating to an LSP transport is a table lookup. LSP’s fourth level, Hint (4), has no counterpart here.
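A minimal sketch of the LSP mapping, assuming a Python consumer. The numeric values are LSP's own `DiagnosticSeverity` constants; the helper names are invented here. `severity_from_code` assumes the severity letter is the second character of the code, which is one reading of "after the namespace character", and is included only to show the parsing that the explicit field spares consumers.

```python
# LSP DiagnosticSeverity values: Error=1, Warning=2, Information=3.
# LSP also defines Hint=4, which has no counterpart in this schema.
LSP_SEVERITY = {"error": 1, "warning": 2, "info": 3}

# Assumed prefix convention: namespace character, then E/W/I.
PREFIX_TO_SEVERITY = {"E": "error", "W": "warning", "I": "info"}


def to_lsp_severity(severity: str) -> int:
    """Translate the wire-format severity string to an LSP severity number."""
    try:
        return LSP_SEVERITY[severity]
    except KeyError:
        raise ValueError(f"unknown severity {severity!r}") from None


def severity_from_code(code: str) -> str:
    """Recover severity from the code prefix (the coupling consumers can avoid)."""
    return PREFIX_TO_SEVERITY[code[1]]
```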
Why both byte offsets and line/column. Renderers split into two camps: terminal/IDE renderers want line/column for cursor placement; programmatic consumers (refactor tools, jump-to-definition) want byte offsets to slice source text without re-parsing. Carrying both removes a class of off-by-one bugs at every consumer boundary.
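Why UTF-16 columns specifically. LSP Position semantics default to UTF-16 code units, and editors that speak LSP (VS Code, Cursor, Helix, neovim+coc) can all be assumed to handle UTF-16 columns. LSP 3.17 lets client and server negotiate a different position encoding, but UTF-16 remains the one every server must support. Picking anything else (UTF-8, codepoint) as the wire format makes round-trip lossy at IDE boundaries. UTF-16 is the bad choice the world picked; we match.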
Why 1-indexed lines/columns. Editor convention. Byte offsets stay 0-indexed (matching the compiler’s internal Span type) so the two coordinate systems remain visually distinguishable in the same payload.
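The projection from a byte range to the two coordinate systems can be sketched as follows. This is an illustrative Python version, not the compiler's implementation; it assumes `\n` line terminators, a non-empty range, and offsets that fall on UTF-8 character boundaries. The function name `span_ref` and the sample source text are invented.

```python
def _utf16_col(data: bytes, line_begin: int, offset: int) -> int:
    """1-indexed UTF-16 column of `offset`, measured from the start of its line."""
    prefix = data[line_begin:offset].decode("utf-8")
    # Characters outside the BMP encode as two UTF-16 code units.
    return len(prefix.encode("utf-16-le")) // 2 + 1


def span_ref(file: str, source: str, byte_start: int, byte_end: int) -> dict:
    """Project a half-open UTF-8 byte range onto the SpanRef fields."""
    data = source.encode("utf-8")
    if not 0 <= byte_start < byte_end <= len(data):
        raise ValueError("expected a non-empty byte range inside the source")

    last = byte_end - 1  # the span's last byte
    start_line_begin = data.rfind(b"\n", 0, byte_start) + 1
    end_line_begin = data.rfind(b"\n", 0, last) + 1
    return {
        "file": file,
        "byte_start": byte_start,  # 0-indexed, inclusive
        "byte_end": byte_end,      # 0-indexed, exclusive
        "line_start": data.count(b"\n", 0, byte_start) + 1,
        "col_start": _utf16_col(data, start_line_begin, byte_start),
        "line_end": data.count(b"\n", 0, last) + 1,
        "col_end": _utf16_col(data, end_line_begin, byte_end),
    }


# Invented two-line source; the emoji takes 4 UTF-8 bytes and 2 UTF-16 units.
SOURCE = 'a = 1\nx = "\U0001F600" + y\n'
Y_SPAN = span_ref("src/example.ar", SOURCE, 19, 20)  # the trailing `y`
```

Here `Y_SPAN["col_start"]` is 12, where a code-point count would give 11 and a byte count 14, which is the kind of disagreement the explicit UTF-16 rule is meant to settle. Converting to an LSP Position, which is 0-indexed, is then a subtraction of one from each line and column value.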
Why package_origin is optional. Compiler built-in diagnostics (OE0001-class parse errors, type errors) don’t have a vocabulary package authoring them — they’re the language. Constraint-based diagnostics surface from a pub strict error rule in some package (UFO’s R01-R37, BFO’s continuant/occurrent disjointness, custom domain rules), and package_origin carries that package name so consumers can attribute / route / suppress per-package.
Why provenance_chain is string[]. The compiler’s meta-property calculus produces structured why-chains internally; for 1.0 we serialize them as strings rather than commit to a structured shape. Structuring the chain (axis names, derivation rule names, intermediate values) is its own design problem, and waiting on it would block ratification. The string form leaves room for additive evolution: a future minor bump can introduce an optional provenance_structured: ProvenanceFrame[] | null field alongside the existing string array. Replacing the string array with a structured type (that is, removing or renaming provenance_chain) is a major bump and out of scope for any minor evolution.
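To make the additive path concrete, here is a hedged Python sketch of a consumer that would keep working across such a minor bump. The `provenance_structured` field does not exist today, and the frame keys `rule` and `detail` are invented purely for illustration; the RFD deliberately leaves `ProvenanceFrame` undesigned.

```python
def provenance_lines(diagnostic: dict) -> list[str]:
    """Return display lines for the why-chain, preferring structured frames if present."""
    # Hypothetical future field; absent or null in a 1.0 payload.
    frames = diagnostic.get("provenance_structured")
    if frames:
        # Invented frame layout: the real ProvenanceFrame shape is not specified.
        return [f'{frame["rule"]}: {frame["detail"]}' for frame in frames]
    # The 1.0 field is always present and is an array, possibly empty.
    return list(diagnostic["provenance_chain"])
```

A consumer written against 1.0 that never looks at the new field keeps reading `provenance_chain` unchanged, which is what makes the addition a minor bump.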
Why null for spanless / no-help / no-package — not absence. Schema consumers benefit from required fields with explicit null over optional fields whose presence is signal: a diagnostic that might have help but doesn’t is the same case whether help is missing-from-object or null. Forcing null in the wire format normalizes the parsing path. Required + nullable also generates cleaner TypeScript bindings (string | null vs string | undefined) for the eventual vscode-extension consumers.
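The required-plus-nullable rule can be checked mechanically. The sketch below is an assumed consumer-side validator in Python, with field names taken from the Diagnostic table; it is not a substitute for validating against the generated JSON Schema.

```python
DIAGNOSTIC_KEYS = {
    "code", "severity", "message", "primary_span", "primary_label",
    "secondary_labels", "help", "package_origin", "provenance_chain",
}
# Fields whose type in the table includes `null`.
NULLABLE_KEYS = {"primary_span", "primary_label", "help", "package_origin"}


def check_required_nullable(diagnostic: dict) -> list[str]:
    """List violations: every field must be present; only some may be null."""
    problems = []
    present = set(diagnostic)
    for key in sorted(DIAGNOSTIC_KEYS - present):
        problems.append(f"missing required field: {key}")
    for key in sorted(DIAGNOSTIC_KEYS & present):
        if diagnostic[key] is None and key not in NULLABLE_KEYS:
            problems.append(f"null not allowed for: {key}")
    return problems
```

Because absence is always an error, a consumer needs only one code path for "no help": `diagnostic["help"] is None`.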
Why ratify at 1.0 now rather than after #331 / #332 land. The shape is derived directly from oxc::diagnostics::OntologDiagnostic, the internal compiler diagnostic struct, which has been stable since long before RFD-0027. (The name keeps the legacy spelling from the pre-rename “Ontolog” language tag and is still the struct’s canonical name.) The wire format carries only what the internal form already has: projecting Span to byte+line/col is mechanical, projecting Severity is identity, and the rest is field-level renaming. The risk that consumers exercise the schema and find a missing field is low because no compiler-side richness is being hidden. SemVer 1.0 commits to this shape; additive evolution stays minor, and a major bump remains available if something was missed.
Why this RFD is small. The schema description lives in types/diagnostic.rs; rustdoc and the generated JSON Schema are the source of truth. This RFD captures the rationale for committing to that shape, not a duplicate of the shape itself.
Consequences
- `share/argon/schemas/version.json` `stable` array gains `"diagnostics"` on merge. The actual file update lands as a follow-up commit on the agent-tooling workstream branch, because the file is generated from `oxc_agent_surface::types::version::SchemaSetVersion::current()`, which lives there. A trailing-newline `chore` commit on that branch (or directly on `main` if the branch has merged) updates `current()` to push `"diagnostics"` into `stable` and re-runs `oxc-codegen emit`.
- Future changes to `DiagnosticsReport` / `Diagnostic` / `SpanRef` / `Severity` / `SecondaryLabel` / `DiagnosticsSummary` follow SemVer:
  - Adding an optional field (default-on-deserialize) → minor bump.
  - Renaming a field, removing a field, narrowing a type, changing semantics → major bump.
  - Adding a new `Severity` variant → major bump (consumer enums break).
- Consumers (Cursor MCP, Claude Code skills, vscode-extension renderers, third-party harnesses) pin against the schema’s SemVer in `version.json`.
- The CI drift gate via `oxc-codegen check` continues to enforce that the published JSON Schema matches the Rust types byte-for-byte. Drift here means the wire format moved without a SemVer bump; CI fails.
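A consumer-side pin might be checked along these lines. This Python sketch assumes that `schema_version` in the payload is a SemVer string and that `version.json` exposes a top-level `stable` array; nothing else about that file's layout is assumed, and the helper names are invented.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse MAJOR.MINOR.PATCH, ignoring any pre-release or build suffix."""
    core = version.split("+", 1)[0].split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def is_compatible(pinned: str, reported: str) -> bool:
    """True if a consumer built against `pinned` can read a `reported` payload."""
    pinned_major, pinned_minor, _ = parse_semver(pinned)
    major, minor, _ = parse_semver(reported)
    if major != pinned_major:
        return False  # breaking change by definition
    if major == 0:
        return minor == pinned_minor  # pre-1.0: minor bumps may break
    return True  # same major at 1.0+: only additive, optional changes


def is_stable(version_json: dict, schema_name: str) -> bool:
    """True if the schema is listed in the `stable` array of version.json."""
    return schema_name in version_json.get("stable", [])
```

Treating every same-major version as readable relies on the rules above: minor bumps only add optional fields, so the consumer must ignore unknown fields and tolerate missing newer ones.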
Out of scope
- Hover, query-result, package-tree, provenance schemas: each gets its own ratification RFD as its underlying surface stabilizes.
- Structured `provenance_chain`: punt to a future minor bump (additive optional field).
- Code-action hints (LSP-style fix-it suggestions in the wire format): punt to a future minor bump.