RFD-0033 — Sequenced test statements: `mutate` and `cleanup` in `test` blocks

Discussion Opened 2026-05-06 · Revised 2026-05-06

Question

How does a test block exercise a pub mutation end-to-end (verifying that its require preconditions hold, that the do { } body’s effects reach the test ABox, that emitted events surface in post-saturation queries, and that retracted individuals leave the ABox), and how do tests express ordered teardown that runs regardless of mid-test assertion failures?

Context

pub mutation is a first-class declaration form (D-064) with five clauses: require { <atom> } preconditions, do { <stmt>... } field updates and locally-bound individuals, retract { <pattern>... } removals, emit <expr> event emission, and return <expr>. At production runtime each mutation writes axiom-events to the kernel’s bitemporal event log; in tests, the runner works against its own forked Knowledge ABox.

The Phase-B language redesign (vault, 2026-04-24, Move 1) locks the test context as { stmts } — imperative, source-ordered — with statements drawn from let / mutate / assert / cleanup. Today the test-runner grammar admits only let and assert, flattened during elaboration into parallel individuals: Vec<CoreTestIndividual> and assertions: Vec<CoreTestAssertion> vectors. There is no source-order preservation between them: every let materializes individuals into the ABox, the runner saturates once, then every assert evaluates against the post-saturation state.

The lease-story scene-test convention works around the missing mutate form by let-binding the mutation’s would-be emitted events as if the mutation had run, then asserting the post-state shape. That convention:

Verifies the parameter types are constructible.
Verifies emitted-event shapes match downstream consumers.
Does not evaluate require preconditions.
Does not verify do { } field updates take effect.
Does not verify emit clauses fire and produce the events the assertions claim are present.
Does not distinguish “mutation failed precondition” from “mutation succeeded but assertion is wrong”.

Customers writing legal / financial domain ontologies on top of Argon want a contract test for the mutation surface they’re shipping: when a modeler hands record_rent_payment an is_timely: false payment, the test should fail with a structured “precondition violated” signal rather than appearing to succeed because no require clause ever evaluated.

The redesign also names a cleanup { } block as the fourth test-statement form. Today the test runner has no notion of teardown. For v1 isolated-ABox tests the practical role of cleanup is structural separation + ordered post-main statements that run regardless of mid-test assertion failures, so modelers can express “exercise the operation, observe state, then exercise the teardown mutation, observe again” without losing teardown coverage to a mid-test assertion drift.

Proposal

Two new TestStmt variants — Mutate and Cleanup — and a structural shift in how the test runner consumes test bodies.

Statement grammar

The test body becomes a source-ordered sequence drawn from four statement kinds:

test "rent payment passes timeliness check" {
    let p: RentPayment = {
        paid_on: 2025-03-03,
        amount: 9500,
        period_label: "2025-03",
        is_timely: true,
    }

    mutate record_rent_payment(p)

    assert RentPayment(p)
    assert tenant_balance(p.tenant) == 0

    cleanup {
        mutate retract_test_payment(p)
        assert not RentPayment(p)
    }
}

let and assert keep their current semantics. mutate and cleanup are new.

CoreTest shape change

CoreTest gains statements: Vec<CoreTestStmt> as the source-ordered statement list. The existing individuals: Vec<CoreTestIndividual> and assertions: Vec<CoreTestAssertion> vectors are derived views computed from statements for backwards compatibility; new code reads statements directly.

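pub enum CoreTestStmt {
    /// `let` binding: materializes an individual into the test ABox.
    Let(CoreTestIndividual),
    /// `mutate <name>(<args>)`: a call to a resolved `pub mutation`.
    Mutate(CoreMutateCall),
    /// `assert`: evaluated at its position in the statement sequence.
    Assert(CoreTestAssertion),
    /// `cleanup { ... }`: trailing teardown statements (at most one, never nested).
    Cleanup(Vec<CoreTestStmt>),
}

pub struct CoreMutateCall {
    /// Id of the resolved mutation; keys into `CoreModule.mutations`.
    pub mutation_id: u64,
    /// Call arguments, validated against the mutation's parameters.
    pub args: Vec<CoreRuleAtom>,
    /// Source span of the `mutate` statement.
    pub span: Span,
}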
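
The derived individuals and assertions views can be read as a single pass over statements. The following is a minimal sketch, not the elaborator's actual code: the struct bodies are simplified stand-ins for the real Core IR types, derived_views is a hypothetical helper name, and the choice to leave cleanup-block contents out of the legacy views is an assumption (the RFD does not say which way this goes).

```rust
// Simplified stand-ins; the real Core IR types carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct CoreTestIndividual {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CoreTestAssertion {
    pub text: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CoreMutateCall {
    pub mutation_id: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum CoreTestStmt {
    Let(CoreTestIndividual),
    Mutate(CoreMutateCall),
    Assert(CoreTestAssertion),
    Cleanup(Vec<CoreTestStmt>),
}

/// Hypothetical helper: derive the legacy `individuals` / `assertions`
/// views from the source-ordered statement list.
/// Assumption: only main-body statements feed the views.
pub fn derived_views(
    statements: &[CoreTestStmt],
) -> (Vec<CoreTestIndividual>, Vec<CoreTestAssertion>) {
    let mut individuals = Vec::new();
    let mut assertions = Vec::new();
    for stmt in statements {
        match stmt {
            CoreTestStmt::Let(ind) => individuals.push(ind.clone()),
            CoreTestStmt::Assert(a) => assertions.push(a.clone()),
            // Mutate and Cleanup have no counterpart in the legacy vectors.
            CoreTestStmt::Mutate(_) | CoreTestStmt::Cleanup(_) => {}
        }
    }
    (individuals, assertions)
}
```

Both vectors keep source order within their own kind, which is all the legacy materialize-then-assert consumers rely on.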

Cleanup carries its own statement list — a cleanup { } block admits three statement kinds (let, mutate, assert); cleanup blocks do not nest. A test admits at most one cleanup block; it must be the last statement. Multiple cleanup blocks fire OE0240 MultipleCleanupBlocks; cleanup at non-last position fires OE0241 CleanupNotAtEnd; nested cleanup fires OE0242 NestedCleanup.
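
The three structural rules above could be checked in one pass over the test body. This is a sketch under assumptions: Stmt is a payload-free stand-in for the real statement type, check_cleanup_structure is a hypothetical name, and the policy for a body with several cleanup blocks (report OE0240, and report OE0241 only if the last cleanup block is not the final statement) is one possible reading, not something the RFD pins down.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    Let,
    Mutate,
    Assert,
    Cleanup(Vec<Stmt>),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CleanupError {
    MultipleCleanupBlocks, // OE0240
    CleanupNotAtEnd,       // OE0241
    NestedCleanup,         // OE0242
}

/// Hypothetical structural check for a test body.
pub fn check_cleanup_structure(body: &[Stmt]) -> Vec<CleanupError> {
    let mut errors = Vec::new();

    // Indices of every top-level cleanup block, in source order.
    let positions: Vec<usize> = body
        .iter()
        .enumerate()
        .filter_map(|(i, s)| match s {
            Stmt::Cleanup(_) => Some(i),
            _ => None,
        })
        .collect();

    if positions.len() > 1 {
        errors.push(CleanupError::MultipleCleanupBlocks);
    }

    // Assumed policy: only the last cleanup block is checked for position.
    if let Some(&last) = positions.last() {
        if last + 1 != body.len() {
            errors.push(CleanupError::CleanupNotAtEnd);
        }
    }

    // A cleanup block may hold let / mutate / assert, but not another cleanup.
    for stmt in body {
        if let Stmt::Cleanup(inner) = stmt {
            let nested = inner.iter().any(|s| match s {
                Stmt::Cleanup(_) => true,
                _ => false,
            });
            if nested {
                errors.push(CleanupError::NestedCleanup);
            }
        }
    }
    errors
}
```

Returning every violation rather than stopping at the first mirrors how the runner reports all assertion outcomes; whether the real elaborator does the same is not stated here.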

Elaboration

For each Mutate { path, args, span }:

Resolve <path> against the elaborator’s ModuleScope (scope.local + .imports + .re_exports) — the same surface that resolves any cross-package imported item — and filter the resolved SymbolInfo to SymbolKind::Mutation (already a distinct symbol-kind variant in oxc::elaborate::SymbolKind, sibling to Query / Computation). The predicate-call resolver in eval_predicate_call is not the right path — mutations aren’t predicates and aren’t reachable from that surface. The resolved SymbolInfo.id keys into CoreModule.mutations for the CoreMutation to bind on the resulting CoreMutateCall. Unknown name (or a name that resolves to a non-mutation symbol kind) fires OE0237 UnknownMutation.
Validate arg arity vs. parameter count. Mismatch fires OE0238 MutationArgArityMismatch.
Validate each arg against the parameter’s declared type via the existing let-binding type-resolution path. Mismatch fires OE0239 MutationArgTypeMismatch.
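
The three elaboration steps above (resolve and filter by symbol kind, check arity, check argument types) can be sketched as follows. Everything here is illustrative: the scope is flattened to a single HashMap instead of the real ModuleScope, MutationSig and elaborate_mutate are hypothetical names, and exact type-name equality stands in for the real subtype check.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SymbolKind {
    Mutation,
    Query,
    Computation,
}

#[derive(Debug, Clone)]
pub struct SymbolInfo {
    pub id: u64,
    pub kind: SymbolKind,
}

/// Hypothetical stand-in for the parts of a mutation the check needs.
#[derive(Debug, Clone)]
pub struct MutationSig {
    pub param_types: Vec<String>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ElabError {
    UnknownMutation,          // OE0237
    MutationArgArityMismatch, // OE0238
    MutationArgTypeMismatch,  // OE0239
}

/// Resolve `path`, then validate arity and argument types.
/// Returns the mutation id to store on the call.
pub fn elaborate_mutate(
    scope: &HashMap<String, SymbolInfo>,
    mutations: &HashMap<u64, MutationSig>,
    path: &str,
    arg_types: &[&str],
) -> Result<u64, ElabError> {
    // Step 1: the name must resolve, and to a Mutation symbol specifically.
    let sym = scope
        .get(path)
        .filter(|s| s.kind == SymbolKind::Mutation)
        .ok_or(ElabError::UnknownMutation)?;
    let sig = mutations.get(&sym.id).ok_or(ElabError::UnknownMutation)?;

    // Step 2: argument count must match parameter count.
    if sig.param_types.len() != arg_types.len() {
        return Err(ElabError::MutationArgArityMismatch);
    }

    // Step 3: each argument type must fit its parameter.
    // Assumption: name equality stands in for the real subtype check.
    for (param, arg) in sig.param_types.iter().zip(arg_types.iter()) {
        if param.as_str() != *arg {
            return Err(ElabError::MutationArgTypeMismatch);
        }
    }
    Ok(sym.id)
}
```

Note that a name resolving to a Query or Computation is reported the same way as a name that does not resolve at all, matching the OE0237 description.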

For Cleanup { stmts }: recursively elaborate each inner statement under the same elaboration context. Inner Cleanup is structurally rejected (OE0242 NestedCleanup); cleanup blocks don’t nest.

Runtime semantics

The test runner replaces its current “materialize-all → saturate-once → check-all” shape with a per-statement saturation loop.

Pre-loop:

Fork Knowledge. Materialize using_frames and fixture.resolved. Run an initial saturation so frame + fixture facts are saturated before the first user statement runs.

Main loop, for each statement in statements (excluding the trailing Cleanup):

Stmt	Runtime
`Let`	Materialize the individual into the ABox via the existing `materialize_individual()` path. Re-saturate.
`Mutate(call)`	Look up the mutation by id. Evaluate each `require` atom against the current post-saturation ABox. If any returns false, record a `MutationPreconditionFailure` failure and skip the mutation’s `do` / `retract` / `emit` clauses for this call; continue to the next statement against the unchanged ABox. If all `require` atoms pass: apply `retract { }`, apply `do { }` field updates and any `do { let }` local bindings, evaluate and insert each `emit <expr>` (see §Emit semantics below). Re-saturate.
`Assert`	Evaluate the assertion against the current post-saturation ABox. Record pass / fail. Continue the loop on failure — the runner reports every assertion’s outcome; it does not halt on the first failure.

After the main loop, run the Cleanup block (if present). Cleanup statements are processed the same way as main-loop statements, with one difference: failures inside cleanup are tagged with a cleanup: true flag on the TestFailure so the runner output distinguishes “the operation failed” from “the teardown failed.” Cleanup runs regardless of whether main-body assertions failed. A test where main-body assertions all passed but cleanup fails is reported as failed.
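
The loop described above can be sketched over a toy ABox. This is an illustration under loud assumptions, not the runner's code: the ABox is a set of ground fact strings, saturate is a no-op placeholder, a mutation's do and emit effects are collapsed into a single insert list, and all type and function names are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FailureKind {
    AssertionFailed,
    MutationPreconditionFailure,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TestFailure {
    pub kind: FailureKind,
    pub cleanup: bool,
}

pub struct Mutation {
    pub require: Vec<String>,
    pub retract: Vec<String>,
    pub insert: Vec<String>, // stands in for `do` and `emit` effects
}

pub enum Stmt {
    Let(String),
    Mutate(u64),
    Assert { fact: String, expect: bool },
    Cleanup(Vec<Stmt>),
}

// Placeholder: the real runner re-runs the reasoner here.
fn saturate(_abox: &mut HashSet<String>) {}

fn run_stmts(
    stmts: &[Stmt],
    muts: &HashMap<u64, Mutation>,
    abox: &mut HashSet<String>,
    in_cleanup: bool,
    failures: &mut Vec<TestFailure>,
) {
    for stmt in stmts {
        match stmt {
            Stmt::Let(fact) => {
                abox.insert(fact.clone());
                saturate(abox);
            }
            Stmt::Mutate(id) => {
                let m = &muts[id]; // elaboration already resolved the id
                if m.require.iter().all(|r| abox.contains(r)) {
                    for r in &m.retract {
                        abox.remove(r);
                    }
                    for f in &m.insert {
                        abox.insert(f.clone());
                    }
                    saturate(abox);
                } else {
                    // Skip the effects, record the failure, keep going.
                    failures.push(TestFailure {
                        kind: FailureKind::MutationPreconditionFailure,
                        cleanup: in_cleanup,
                    });
                }
            }
            Stmt::Assert { fact, expect } => {
                if abox.contains(fact) != *expect {
                    failures.push(TestFailure {
                        kind: FailureKind::AssertionFailed,
                        cleanup: in_cleanup,
                    });
                }
            }
            // Nested cleanup is rejected at elaboration; nothing to do here.
            Stmt::Cleanup(_) => {}
        }
    }
}

pub fn run_test(stmts: &[Stmt], muts: &HashMap<u64, Mutation>) -> Vec<TestFailure> {
    let mut abox = HashSet::new();
    saturate(&mut abox); // initial saturation (no frame or fixture facts here)
    let mut failures = Vec::new();
    // Split off the trailing cleanup block, if any.
    let (main, cleanup): (&[Stmt], &[Stmt]) = match stmts.split_last() {
        Some((Stmt::Cleanup(inner), rest)) => (rest, &inner[..]),
        _ => (stmts, &stmts[..0]),
    };
    run_stmts(main, muts, &mut abox, false, &mut failures);
    run_stmts(cleanup, muts, &mut abox, true, &mut failures);
    failures
}
```

Because failures are appended in statement order, a skipped mutation shows up before the assertions that drift because of it, and cleanup failures always come last with the cleanup flag set.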

Multiple mutate statements compose left-to-right with re-saturation between each, so a require in mutation B that queries a derived fact produced by A’s do update sees the post-A saturation state, not the pre-A state.

Propagation when A’s precondition fails. If A’s require returns false, A’s do / retract / emit clauses are skipped (see the Mutate row in the table above) and the ABox stays unchanged. The runner does not halt the statement loop; it continues to B with the unchanged ABox. B’s require evaluates against pre-A state; if it holds, B applies normally. The modeler sees the test fail (A’s MutationPreconditionFailure is recorded) and can read every subsequent statement’s outcome. This matches the design choice for failed assertions, which also do not halt the loop. The cost is that a chained-mutation test where B depends on A’s effects will see B fail or behave unexpectedly when A is skipped, but the source ordering of the failures (A first, then B’s drift) makes the dependency obvious. Halting after A’s failure would hide downstream drift; running B preserves it.

The return <expr> value is discarded in v1. A future v2 may add let result = mutate <name>(<args>) to bind it.

Emit semantics

emit <expr> is the canonical event-insertion mechanism per D-064. Within a mutation, each emit clause’s expression is a value-position expression. Two shapes are common in practice:

Constructor form — emit RentPaid { lease, amount, when: today() }. The expression is a typed record literal naming an event-class constructor.
Path-reference form — emit p, where p was bound earlier in the mutation’s do { let p: ... = { ... } } block. The expression is a path that resolves to an existing individual.

At runtime — production or test — the semantics:

Evaluate the emit expression against the post-require, post-retract, post-do ABox plus the mutation’s parameter bindings and any do { let } local bindings.
Path-reference case — the expression resolves to an individual already in the ABox (the do { let } binding materialized it). The runtime treats the emit as already-realized for ABox purposes; no fresh id is minted, no duplicate insertion. Production additionally publishes the existing individual into the axiom-event log.
Constructor case — the expression evaluates to a typed record value. The runtime mints a fresh individual id, inserts a new individual of the expression’s type with the record’s field values via the same materialize_individual() path the runner uses for let. Production additionally writes the corresponding axiom-event entry.
In both cases, the event individual is visible to subsequent saturation and to assertions in the test that follow the mutate statement.
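
The two emit cases above differ only in whether a fresh individual is minted. A minimal sketch, assuming a toy ABox keyed by numeric id; ABox, EmitExpr, and apply_emit are hypothetical names, field values are plain strings, and returning None for a path that does not resolve is an assumption rather than specified behavior.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Individual {
    pub class: String,
    pub fields: Vec<(String, String)>,
}

pub enum EmitExpr {
    /// `emit RentPaid { ... }`: a typed record literal.
    Constructor { class: String, fields: Vec<(String, String)> },
    /// `emit p`: a path to an individual bound earlier by `do { let }`.
    PathRef(u64),
}

pub struct ABox {
    next_id: u64,
    individuals: HashMap<u64, Individual>,
}

impl ABox {
    pub fn new() -> Self {
        ABox { next_id: 0, individuals: HashMap::new() }
    }

    /// Stand-in for the runner's materialization path: mint an id and insert.
    pub fn materialize(&mut self, ind: Individual) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.individuals.insert(id, ind);
        id
    }

    pub fn count(&self) -> usize {
        self.individuals.len()
    }
}

/// Returns the id of the event individual visible after the emit.
pub fn apply_emit(abox: &mut ABox, expr: EmitExpr) -> Option<u64> {
    match expr {
        // Already realized: no fresh id, no duplicate insertion.
        EmitExpr::PathRef(id) => {
            if abox.individuals.contains_key(&id) {
                Some(id)
            } else {
                None // assumption: an unresolved path yields nothing
            }
        }
        // Fresh individual of the constructor's class.
        EmitExpr::Constructor { class, fields } => {
            Some(abox.materialize(Individual { class, fields }))
        }
    }
}
```

In a test ABox, the path-reference case is therefore observable only through the individual that the do { let } binding already inserted; the production-only log write is outside this sketch.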

There is no requirement that the emit target be a previously let-bound name. The earlier “pre-bind via let” workaround was a regression from the canonical semantic and is dropped. Tests that assert on emitted events — mutate record_rent_payment(p); assert RentPaid(p.lease, p.amount) — exercise the emit clause genuinely: the assertion only holds when the runtime actually evaluated and inserted (or re-realized) that event.

Sequential assertion ordering — explicit semantic shift

Today, tests behave as let* assert* with a single saturation between the two phases. After this change, asserts evaluate at their statement position with re-saturation between mutates. For tests that contain only let and assert statements, the observable behavior is unchanged: all lets materialize before the first assert, the ABox saturates, and assertions evaluate against the saturated state. Tests that mix mutate in get the per-statement saturation semantics.

This is a deliberate semantic shift, not an accident of implementation. Modelers should be able to read the test top-down and reason about it as a temporal sequence: “after the lease starts, this should hold; after the first payment, this other thing; after termination, neither.”

Diagnostic codes

Allocated in clusters 23x–24x (extending the existing OE022x metarel/decorator range):

OE0237 UnknownMutation — mutation name in a mutate stmt doesn’t resolve.
OE0238 MutationArgArityMismatch — arg count doesn’t match <mutation> parameters.
OE0239 MutationArgTypeMismatch — arg’s resolved type isn’t <: the parameter’s declared type.
OE0240 MultipleCleanupBlocks — a test declares more than one cleanup { } block.
OE0241 CleanupNotAtEnd — the cleanup { } block is followed by another statement.
OE0242 NestedCleanup — a cleanup { } block contains another cleanup { }.

Test-runtime failures (precondition violation, cleanup-tagged failures) surface as TestFailure entries on the existing SingleTestResult.failures channel, not as new diagnostic codes. They distinguish themselves via structured kind and cleanup fields on TestFailure.
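
One possible shape for those structured fields is sketched below. Only the kind and cleanup field names, the failures channel, and the MutationPreconditionFailure / MutationUnsupported names come from this RFD; the Assertion variant, the message field, and both helper methods are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TestFailureKind {
    Assertion, // hypothetical name for an ordinary failed assert
    MutationPreconditionFailure,
    MutationUnsupported,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TestFailure {
    pub kind: TestFailureKind,
    /// True when the failing statement sat inside `cleanup { }`.
    pub cleanup: bool,
    pub message: String, // hypothetical field
}

pub struct SingleTestResult {
    pub failures: Vec<TestFailure>,
}

impl SingleTestResult {
    /// A test passes only if neither the main body nor cleanup failed.
    pub fn passed(&self) -> bool {
        self.failures.is_empty()
    }

    /// True when the test failed and every failure came from teardown.
    pub fn teardown_only(&self) -> bool {
        !self.failures.is_empty() && self.failures.iter().all(|f| f.cleanup)
    }
}
```

A reporter could use something like teardown_only to print “the teardown failed” instead of “the operation failed” while still counting the test as failed.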

Consequences

For modelers — what v1 catches.

Existing-precondition violations. A test that hands record_rent_payment an is_timely: false payment now fails with MutationPreconditionFailure rather than silently passing because no require clause ever evaluated.
emit clauses. Asserting an emitted event after a mutate statement is now meaningful coverage of the emit clause: the runtime evaluates the emit expression, mints an event individual, inserts it into the ABox, and the assertion succeeds only if all of that happened. Previously, assertions on emitted events were satisfied by the let pre-binding regardless of whether the mutation ran.
Multi-mutation flows (composition). mutate A(...); mutate B(...); assert <post-state> proves the chained effect, with re-saturation between steps so B’s require sees A’s derived facts.
Teardown ordering. cleanup { ... } runs after main-body statements regardless of mid-test assertion failures, so a teardown mutation + its post-state assertion are exercised even when an earlier assertion drifted.

What v1 does NOT catch — explicit v2 scope.

The v1 runtime is honest about which mutation clauses it evaluates against the test ABox vs. which it surfaces as MutationUnsupported failures so the modeler sees the gap rather than a silent pass:

require clauses — fully evaluated. v1.
do { let } clauses — type assertion lands; field assignments inside the let’s value expression do not yet bind. The let-bound name is visible to subsequent assertions only as a typed individual.
do { x.field = expr } field-updates — surface as MutationUnsupported. A test that asserts post.field == ... after a field-update mutation will see the runner flag the gap explicitly. v2.
retract { } clauses — surface as MutationUnsupported per clause. mutate teardown(x); assert not Concept(x) reports the unsupported flag rather than silently passing or failing on the assertion. v2.
emit Event { ... } constructor — mints a fresh event individual with literal field values. Computed field expressions ({ when: today() }) are skipped in the field-args extraction; v1 covers literal-typed emit args only.
emit p path-reference — no-op (the source individual is already in the ABox via the do { let } that bound it). v1.
Dropped preconditions. A record_rent_payment whose require { p.is_timely } line was deleted from source still appears to pass v1 tests — there’s no atom left to evaluate. Detecting “this mutation should have a precondition but doesn’t” requires a separate mutate_fails <name>(<bad-args>) assertion form. v2.

The v2 work plan: implement field-update propagation against KnowledgeStore, implement retract (ByAssertionId at minimum), evaluate do { let } value-expression bindings, lift emit’s field-arg evaluation to the full atom-value evaluator, ship mutate_fails. Each is a clean follow-up against the v1 surface this RFD lands.

For the existing scene-test convention. Tests that don’t use mutate or cleanup continue to materialize-and-assert as before. The lease-story scene_04 / scene_06 / scene_11 / scene_13 tests can opt into mutate incrementally; they don’t need to change to keep passing.

Sequential assertion semantics. Tests that contain only let and assert retain today’s observable behavior: a single saturation, then all assertions evaluated against the same post-saturation state. Tests that introduce mutate get re-saturation between statements. A test author who introduces a mutate and observes that an assertion that previously passed now fails should investigate whether the mutation legitimately changes the asserted state — the change is signal, not noise.

For the LSP InfoView. Hover on a mutate <name>(...) site can show the mutation’s parameter list + require / emit / do / retract clauses — actionable affordance for “what is this mutation contracting?” introspection. Cleanup blocks render as a folded region by default with their own statement count.

Implementation surface

Estimated 10–14 hours across:

oxc/src/syntax/kind.rs — new MUTATE_STMT + CLEANUP_STMT syntax kinds; mutate and cleanup as contextual keywords in test bodies.
oxc/src/cst/parser.rs — recognize mutate <path>(<args>) and cleanup { stmts } inside test { } bodies; produce MUTATE_STMT / CLEANUP_STMT nodes.
oxc/src/cst/lower/items.rs — lower to ast::TestStmt::Mutate { path, args, span } and ast::TestStmt::Cleanup { stmts, span }.
oxc/src/ast.rs — new TestStmt::Mutate + TestStmt::Cleanup variants.
oxc/src/core_ir.rs — new CoreTestStmt enum and CoreMutateCall struct; thread statements: Vec<CoreTestStmt> onto CoreTest. Keep individuals + assertions as derived views during the migration window.
oxc/src/elaborate/phase_elaborate.rs::resolve_test — name resolution + arg validation; emit OE0237–OE0242. Recursive elaboration into Cleanup body.
oxc/src/diagnostics/codes.rs + rendering.rs — six new diagnostic helpers.
oxc/src/test_runner.rs — replace the materialize-all → saturate-once → check-all shape with a per-statement loop. New helpers: apply_mutation (evaluates require, applies retract/do/emit, re-saturates), evaluate_emit_expr, mint_event_individual. Cleanup-tagged TestFailure flag.
oxc/src/reasoning/atom_eval.rs — confirm eval_atom_truth covers every atom shape that appears in require { } bodies; extend if any are missing.
oxc/tests/test_framework_runner.rs — exercise: passing precondition, failing precondition, multi-mutate composition, emit insertion + post-saturation visibility, retract + assert-not, cleanup runs after main-body failure, cleanup-only failures, unknown-mutation diagnostic, arg-mismatch diagnostic, multiple-cleanup diagnostic, cleanup-not-at-end diagnostic, nested-cleanup diagnostic.
argon/examples/lease-story/packages/story-lease/src/tests/scene_04_security_deposit.ar — opt the existing test block into mutate record_security_deposit(...) to demonstrate the surface; add a cleanup block that exercises deposit-return.
argon/book/src/ — section in the test-runner chapter documenting the four-statement grammar + per-statement saturation + emit semantics + cleanup ordering.

Open questions

let result = mutate <name>(<args>) for return-value binding. Punting to v2 keeps v1’s surface minimum-viable. The dominant case (precondition + emit + assert) ships first; return-binding is additive.
retract semantics under saturation. When a retract removes an individual, derived facts that depended on it must also be removed. v1 picks re-saturate-from-scratch for simplicity; cost is acceptable at test sizes (typically < 100 individuals). Incremental retraction is a future optimization.
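
To illustrate why re-saturating from scratch handles retraction without dependency tracking, here is a toy sketch. It assumes ground Horn-style rules over fact strings and a naive fixpoint loop; Rule, saturate, and retract_and_resaturate are hypothetical names and do not reflect the real reasoner's interface.

```rust
use std::collections::HashSet;

/// A ground rule: if every premise is present, the conclusion holds.
pub struct Rule {
    pub premises: Vec<String>,
    pub conclusion: String,
}

/// Naive forward chaining to a fixpoint, starting from the base facts only.
pub fn saturate(base: &HashSet<String>, rules: &[Rule]) -> HashSet<String> {
    let mut facts = base.clone();
    loop {
        let mut changed = false;
        for rule in rules {
            let fires = rule.premises.iter().all(|p| facts.contains(p));
            // `insert` returns true only when the conclusion is new.
            if fires && facts.insert(rule.conclusion.clone()) {
                changed = true;
            }
        }
        if !changed {
            return facts;
        }
    }
}

/// Remove one asserted fact, then recompute every derived fact from scratch.
/// Derived facts that depended on the removed one simply never reappear.
pub fn retract_and_resaturate(
    base: &mut HashSet<String>,
    fact: &str,
    rules: &[Rule],
) -> HashSet<String> {
    base.remove(fact);
    saturate(base, rules)
}
```

The key design point is that derived facts are never stored alongside asserted ones, so there is nothing stale to clean up; the price is recomputing the whole closure on each retraction, which is what an incremental scheme would avoid.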
mutate_fails <name>(<args>) assertion form. A statement that asserts the precondition fails — useful for testing the negative case, and the path to v2 catching dropped-require regressions. Out of scope for v1.
Cleanup ordering vs. test-attribute interaction. #[unproven] and #[assumed] apply to the test as a whole. Whether cleanup runs for #[assumed] tests (which the runner doesn’t actually evaluate) is a small semantic question; v1 picks “yes, cleanup runs” for symmetry with the other test-attribute paths, on the principle that cleanup should be observable side-effect-wise even when the test body’s assertions aren’t being checked.
