RFD-0033 — Sequenced test statements: mutate and cleanup in test blocks
Question
How does a test block exercise a pub mutation end-to-end — verify its require preconditions hold, that the do { } body’s effects reach the test ABox, that emitted events surface in post-saturation queries, that retracted individuals leave — and how do tests express ordered teardown that runs regardless of mid-test assertion failures?
Context
pub mutation is a first-class declaration form (D-064) with five clauses: require { <atom> } preconditions, do { <stmt>... } field updates and locally-bound individuals, retract { <pattern>... } removals, emit <expr> event emission, return <expr>. Each mutation produces axiom-events into the kernel’s bitemporal event log at production runtime; in tests, the runner has its own forked Knowledge ABox.
The Phase-B language redesign (vault, 2026-04-24, Move 1) locks the test context as { stmts } — imperative, source-ordered — with statements drawn from let / mutate / assert / cleanup. Today the test-runner grammar admits only let and assert, flattened during elaboration into parallel individuals: Vec<CoreTestIndividual> and assertions: Vec<CoreTestAssertion> vectors. There is no source-order preservation between them: every let materializes individuals into the ABox, the runner saturates once, then every assert evaluates against the post-saturation state.
The lease-story scene-test convention works around the missing mutate form by let-binding the mutation’s would-be emitted events as if the mutation had run, then asserting the post-state shape. That convention:
- Verifies the parameter types are constructible.
- Verifies emitted-event shapes match downstream consumers.
- Does not evaluate require preconditions.
- Does not verify do { } field updates take effect.
- Does not verify emit clauses fire and produce the events the assertions claim are present.
- Does not distinguish “mutation failed precondition” from “mutation succeeded but assertion is wrong”.
Customers writing legal / financial domain ontologies on top of Argon want a contract test for the mutation surface they’re shipping: when a modeler hands record_rent_payment an is_timely: false payment, the test should fail with a structured “precondition violated” signal rather than appearing to succeed because no require clause ever evaluated.
The redesign also names a cleanup { } block as the fourth test-statement form. Today the test runner has no notion of teardown. For v1 isolated-ABox tests the practical role of cleanup is structural separation + ordered post-main statements that run regardless of mid-test assertion failures, so modelers can express “exercise the operation, observe state, then exercise the teardown mutation, observe again” without losing teardown coverage to a mid-test assertion drift.
Proposal
Two new TestStmt variants — Mutate and Cleanup — and a structural shift in how the test runner consumes test bodies.
Statement grammar
The test body becomes a source-ordered sequence drawn from four statement kinds:
test "rent payment passes timeliness check" {
    let p: RentPayment = {
        paid_on: 2025-03-03,
        amount: 9500,
        period_label: "2025-03",
        is_timely: true,
    }
    mutate record_rent_payment(p)
    assert RentPayment(p)
    assert tenant_balance(p.tenant) == 0
    cleanup {
        mutate retract_test_payment(p)
        assert not RentPayment(p)
    }
}
let and assert keep their current semantics. mutate and cleanup are new.
CoreTest shape change
CoreTest gains statements: Vec<CoreTestStmt> as the source-ordered statement list. The existing individuals: Vec<CoreTestIndividual> and assertions: Vec<CoreTestAssertion> vectors are derived views computed from statements for backwards compatibility; new code reads statements directly.
pub enum CoreTestStmt {
    Let(CoreTestIndividual),
    Mutate(CoreMutateCall),
    Assert(CoreTestAssertion),
    Cleanup(Vec<CoreTestStmt>),
}

pub struct CoreMutateCall {
    pub mutation_id: u64,
    pub args: Vec<CoreRuleAtom>,
    pub span: Span,
}
Cleanup carries its own statement list — a cleanup { } block admits three statement kinds (let, mutate, assert); cleanup blocks do not nest. A test admits at most one cleanup block; it must be the last statement. Multiple cleanup blocks fire OE0240 MultipleCleanupBlocks; cleanup at non-last position fires OE0241 CleanupNotAtEnd; nested cleanup fires OE0242 NestedCleanup.
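The three placement rules above can be sketched as a single pass over a test body. This is a minimal illustration, not the elaborator's code: `Stmt`, `CleanupError`, and `check_cleanup_placement` are hypothetical names, and the choice to report only OE0240 (not also OE0241) when several cleanup blocks are present is an assumption the RFD does not settle.

```rust
// Simplified stand-in for the test-statement tree; payloads are just labels.
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    Let(String),
    Mutate(String),
    Assert(String),
    Cleanup(Vec<Stmt>),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CleanupError {
    MultipleCleanupBlocks, // OE0240
    CleanupNotAtEnd,       // OE0241
    NestedCleanup,         // OE0242
}

impl CleanupError {
    /// Diagnostic code named in the RFD for each rule.
    pub fn code(self) -> &'static str {
        match self {
            CleanupError::MultipleCleanupBlocks => "OE0240",
            CleanupError::CleanupNotAtEnd => "OE0241",
            CleanupError::NestedCleanup => "OE0242",
        }
    }
}

/// Hypothetical validator: checks the top-level body of one test.
pub fn check_cleanup_placement(body: &[Stmt]) -> Vec<CleanupError> {
    let mut errors = Vec::new();

    // Indices of every top-level cleanup block.
    let positions: Vec<usize> = body
        .iter()
        .enumerate()
        .filter(|(_, s)| matches!(s, Stmt::Cleanup(_)))
        .map(|(i, _)| i)
        .collect();

    if positions.len() > 1 {
        // Assumption: with several blocks, only OE0240 is reported.
        errors.push(CleanupError::MultipleCleanupBlocks);
    } else if let Some(&pos) = positions.first() {
        if pos + 1 != body.len() {
            errors.push(CleanupError::CleanupNotAtEnd);
        }
    }

    // A cleanup block may hold let / mutate / assert, never another cleanup.
    for stmt in body {
        if let Stmt::Cleanup(inner) = stmt {
            if inner.iter().any(|s| matches!(s, Stmt::Cleanup(_))) {
                errors.push(CleanupError::NestedCleanup);
            }
        }
    }
    errors
}
```

Returning a list rather than the first error lets one elaboration pass surface a nested cleanup and a misplaced cleanup together.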
Elaboration
For each Mutate { path, args, span }:
- Resolve <path> against the elaborator’s ModuleScope (scope.local + .imports + .re_exports), the same surface that resolves any cross-package imported item, and filter the resolved SymbolInfo to SymbolKind::Mutation (already a distinct symbol-kind variant in oxc::elaborate::SymbolKind, sibling to Query / Computation). The predicate-call resolver in eval_predicate_call is not the right path: mutations aren’t predicates and aren’t reachable from that surface. The resolved SymbolInfo.id keys into CoreModule.mutations for the CoreMutation to bind on the resulting CoreMutateCall. An unknown name (or a name that resolves to a non-mutation symbol kind) fires OE0237 UnknownMutation.
- Validate arg arity vs. parameter count. Mismatch fires OE0238 MutationArgArityMismatch.
- Validate each arg against the parameter’s declared type via the existing let-binding type-resolution path. Mismatch fires OE0239 MutationArgTypeMismatch.
For Cleanup { stmts }: recursively elaborate each inner statement under the same elaboration context. Inner Cleanup is structurally rejected (OE0242 NestedCleanup); cleanup blocks don’t nest.
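The three Mutate checks (resolve, arity, type) might look roughly like the sketch below. Everything here is a simplified stand-in: the scope is a flat `HashMap`, `SymbolKind::Mutation` carries its parameter type names directly, and type compatibility is plain name equality rather than the `<:` check the RFD describes. `resolve_mutate` and `ElabError` are hypothetical names.

```rust
use std::collections::HashMap;

// Simplified symbol table entries; the real SymbolInfo / SymbolKind differ.
#[derive(Debug, Clone, PartialEq)]
pub enum SymbolKind {
    Mutation { param_types: Vec<String> },
    Query,
}

#[derive(Debug, Clone, PartialEq)]
pub struct SymbolInfo {
    pub id: u64,
    pub kind: SymbolKind,
}

#[derive(Debug, PartialEq)]
pub enum ElabError {
    UnknownMutation(String),                            // OE0237
    ArgArityMismatch { expected: usize, found: usize }, // OE0238
    ArgTypeMismatch { index: usize },                   // OE0239
}

/// Hypothetical resolver: returns the mutation id to store on the call.
pub fn resolve_mutate(
    scope: &HashMap<String, SymbolInfo>,
    path: &str,
    arg_types: &[&str],
) -> Result<u64, ElabError> {
    // A missing name and a name of the wrong symbol kind share one diagnostic.
    let (id, params) = match scope.get(path) {
        Some(SymbolInfo { id, kind: SymbolKind::Mutation { param_types } }) => (*id, param_types),
        _ => return Err(ElabError::UnknownMutation(path.to_string())),
    };

    if params.len() != arg_types.len() {
        return Err(ElabError::ArgArityMismatch {
            expected: params.len(),
            found: arg_types.len(),
        });
    }

    // Assumption: exact type-name equality stands in for subtyping.
    for (index, (param, arg)) in params.iter().zip(arg_types).enumerate() {
        if param.as_str() != *arg {
            return Err(ElabError::ArgTypeMismatch { index });
        }
    }
    Ok(id)
}
```

Checking arity before types keeps the type loop from silently truncating at the shorter list.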
Runtime semantics
The test runner replaces its current “materialize-all → saturate-once → check-all” shape with a per-statement saturation loop.
Pre-loop:
- Fork Knowledge. Materialize using_frames and fixture.resolved. Run an initial saturation so frame + fixture facts are saturated before the first user statement runs.
Main loop, for each statement in statements (excluding the trailing Cleanup):
| Stmt | Runtime |
|---|---|
| Let | Materialize the individual into the ABox via the existing materialize_individual() path. Re-saturate. |
| Mutate(call) | Look up the mutation by id. Evaluate each require atom against the current post-saturation ABox. If any returns false, record a MutationPreconditionFailure failure and skip the mutation’s do / retract / emit clauses for this call; continue to the next statement against the unchanged ABox. If all require atoms pass: apply retract { }, apply do { } field updates and any do { let } local bindings, evaluate and insert each emit <expr> (see §Emit semantics below). Re-saturate. |
| Assert | Evaluate the assertion against the current post-saturation ABox. Record pass / fail. Continue the loop on failure: the runner reports every assertion’s outcome and does not halt on the first failure. |
After the main loop, run the Cleanup block (if present). Cleanup statements process the same way as main-loop statements with one difference: failures inside cleanup are tagged with a cleanup: true flag on the TestFailure so the runner output distinguishes “the operation failed” from “the teardown failed.” Cleanup runs regardless of whether main-body assertions failed. A test where main-body assertions all passed but cleanup fails is reported as failed.
Multiple mutate statements compose left-to-right with re-saturation between each, so a require in mutation B that queries a derived fact produced by A’s do update sees the post-A saturation state, not the pre-A state.
Propagation when A’s precondition fails. If A’s require returns false, A’s do / retract / emit clauses are skipped (see the Mutate row above) and the ABox stays unchanged. The runner does not halt the statement loop; it continues to B with the unchanged ABox. B’s require evaluates against pre-A state; if it holds, B applies normally. The modeler sees the test fail (A’s MutationPreconditionFailure is recorded) and can read every subsequent statement’s outcome. This matches the design choice for failed assertions (failed mid-test asserts also don’t halt). The cost is that a chained-mutation test where B depends on A’s effects will see B fail or behave unexpectedly when A is skipped, but the source ordering of the failures (A first, then B’s drift) makes the dependency obvious. Halting after A’s failure would hide downstream drift; running B preserves it.
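The loop described above (per-statement saturation, skip-and-continue on a failed precondition, cleanup always run and tagged) can be sketched over a toy ABox. This is an illustration under heavy assumptions: facts are plain strings, saturation applies "premise implies conclusion" rules to a fixpoint, a mutation only inserts facts, and all names (`run_test`, `run_stmts`, `Stmt`, `FailureKind`) are hypothetical rather than the runner's API.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq)]
pub enum FailureKind {
    AssertionFailed,
    MutationPreconditionFailure,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TestFailure {
    pub kind: FailureKind,
    pub cleanup: bool, // true when the failure happened inside cleanup { }
    pub detail: String,
}

#[derive(Debug, Clone)]
pub enum Stmt {
    Let(String),
    Mutate { name: String, require: Vec<String>, emit: Vec<String> },
    Assert(String),
    Cleanup(Vec<Stmt>),
}

/// Toy saturation: apply (premise, conclusion) rules until nothing changes.
fn saturate(abox: &mut BTreeSet<String>, rules: &[(&str, &str)]) {
    loop {
        let before = abox.len();
        for (premise, conclusion) in rules {
            if abox.contains(*premise) {
                abox.insert(conclusion.to_string());
            }
        }
        if abox.len() == before {
            break;
        }
    }
}

fn run_stmts(
    stmts: &[Stmt],
    abox: &mut BTreeSet<String>,
    rules: &[(&str, &str)],
    in_cleanup: bool,
    failures: &mut Vec<TestFailure>,
) {
    for stmt in stmts {
        match stmt {
            Stmt::Let(fact) => {
                abox.insert(fact.clone());
                saturate(abox, rules);
            }
            Stmt::Mutate { name, require, emit } => {
                // A failed precondition is recorded; the ABox stays unchanged
                // and the loop moves on to the next statement.
                if let Some(missing) = require.iter().find(|atom| !abox.contains(*atom)) {
                    failures.push(TestFailure {
                        kind: FailureKind::MutationPreconditionFailure,
                        cleanup: in_cleanup,
                        detail: format!("{}: require {} does not hold", name, missing),
                    });
                    continue;
                }
                abox.extend(emit.iter().cloned());
                saturate(abox, rules);
            }
            Stmt::Assert(fact) => {
                if !abox.contains(fact) {
                    failures.push(TestFailure {
                        kind: FailureKind::AssertionFailed,
                        cleanup: in_cleanup,
                        detail: fact.clone(),
                    });
                }
            }
            // Placement is validated earlier; a stray block is ignored here.
            Stmt::Cleanup(_) => {}
        }
    }
}

/// Runs the main body, then the trailing cleanup block if there is one.
pub fn run_test(stmts: &[Stmt], rules: &[(&str, &str)]) -> Vec<TestFailure> {
    let mut abox = BTreeSet::new();
    let mut failures = Vec::new();
    saturate(&mut abox, rules); // stands in for the frame + fixture saturation

    let (main, cleanup) = match stmts.split_last() {
        Some((Stmt::Cleanup(inner), rest)) => (rest, inner.as_slice()),
        _ => (stmts, &stmts[stmts.len()..]),
    };
    run_stmts(main, &mut abox, rules, false, &mut failures);
    run_stmts(cleanup, &mut abox, rules, true, &mut failures);
    failures
}
```

In this model a test passes exactly when the returned list is empty, so a cleanup-only failure still fails the test while remaining distinguishable through the `cleanup` flag.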
The return <expr> value is discarded in v1. A future v2 may add let result = mutate <name>(<args>) to bind it.
Emit semantics
emit <expr> is the canonical event-insertion mechanism per D-064. Within a mutation, each emit clause’s expression is a value-position expression. Two shapes are common in practice:
- Constructor form: emit RentPaid { lease, amount, when: today() }. The expression is a typed record literal naming an event-class constructor.
- Path-reference form: emit p, where p was bound earlier in the mutation’s do { let p: ... = { ... } } block. The expression is a path that resolves to an existing individual.
At runtime — production or test — the semantics:
- Evaluate the emit expression against the post-require, post-retract, post-do ABox plus the mutation’s parameter bindings and any do { let } local bindings.
- Path-reference case: the expression resolves to an individual already in the ABox (the do { let } binding materialized it). The runtime treats the emit as already-realized for ABox purposes; no fresh id is minted, no duplicate insertion. Production additionally publishes the existing individual into the axiom-event log.
- Constructor case: the expression evaluates to a typed record value. The runtime mints a fresh individual id, inserts a new individual of the expression’s type with the record’s field values via the same materialize_individual() path the runner uses for let. Production additionally writes the corresponding axiom-event entry.
- In both cases, the event individual is visible to subsequent saturation and to assertions in the test that follow the mutate statement.
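The two emit shapes can be contrasted in a small sketch of the test-side behavior (the production axiom-event write is omitted). The types and the `evaluate_emit` signature are assumptions made for illustration; the RFD names evaluate_emit_expr and mint_event_individual as helpers but does not specify their signatures.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Individual {
    pub class: String,
    pub fields: BTreeMap<String, String>,
}

// Toy ABox keyed by individual id.
#[derive(Debug, Default)]
pub struct ABox {
    next_id: u64,
    pub individuals: BTreeMap<u64, Individual>,
}

impl ABox {
    /// Stand-in for materialize_individual(): mints an id and inserts.
    pub fn materialize(&mut self, ind: Individual) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.individuals.insert(id, ind);
        id
    }
}

pub enum EmitExpr {
    /// emit RentPaid { ... }: an already-evaluated record value.
    Constructor(Individual),
    /// emit p: a name bound by a do { let } in the same mutation.
    PathRef(String),
}

/// Hypothetical evaluator: returns the id of the emitted event individual,
/// or None when a path reference does not resolve.
pub fn evaluate_emit(
    abox: &mut ABox,
    locals: &BTreeMap<String, u64>,
    expr: &EmitExpr,
) -> Option<u64> {
    match expr {
        // Already in the ABox: no fresh id, no duplicate insertion.
        EmitExpr::PathRef(name) => locals
            .get(name)
            .copied()
            .filter(|id| abox.individuals.contains_key(id)),
        // Fresh individual through the same path a let statement uses.
        EmitExpr::Constructor(record) => Some(abox.materialize(record.clone())),
    }
}
```

The asymmetry is the point: a path-reference emit leaves the ABox size unchanged, while a constructor emit grows it by exactly one individual.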
There is no requirement that the emit target be a previously let-bound name. The earlier “pre-bind via let” workaround was a regression from the canonical semantic and is dropped. Tests that assert on emitted events — mutate record_rent_payment(p); assert RentPaid(p.lease, p.amount) — exercise the emit clause genuinely: the assertion only holds when the runtime actually evaluated and inserted (or re-realized) that event.
Sequential assertion ordering — explicit semantic shift
Today, tests behave as let* assert* with a single saturation between the two phases. After this change, asserts evaluate at their statement position with re-saturation between mutates. For tests that contain only let and assert statements, the observable behavior is unchanged: all lets materialize before the first assert, the ABox saturates, and assertions evaluate against the saturated state. Tests that mix mutate in get the per-statement saturation semantics.
This is a deliberate semantic shift, not an accident of implementation. Modelers should be able to read the test top-down and reason about it as a temporal sequence: “after the lease starts, this should hold; after the first payment, this other thing; after termination, neither.”
Diagnostic codes
Allocated in clusters 23x–24x (extending the existing OE022x metarel/decorator range):
- OE0237 UnknownMutation: mutation name in a mutate stmt doesn’t resolve.
- OE0238 MutationArgArityMismatch: arg count doesn’t match <mutation> parameters.
- OE0239 MutationArgTypeMismatch: arg’s resolved type isn’t <: the parameter’s declared type.
- OE0240 MultipleCleanupBlocks: a test declares more than one cleanup { } block.
- OE0241 CleanupNotAtEnd: the cleanup { } block is followed by another statement.
- OE0242 NestedCleanup: a cleanup { } block contains another cleanup { }.
Test-runtime failures (precondition violation, cleanup-tagged failures) surface as TestFailure entries on the existing SingleTestResult.failures channel, not as new diagnostic codes. They distinguish themselves via structured kind and cleanup fields on TestFailure.
Consequences
For modelers — what v1 catches.
- Existing-precondition violations. A test that hands record_rent_payment an is_timely: false payment now fails with MutationPreconditionFailure rather than silently passing because no require clause ever evaluated.
- emit clauses. Asserting an emitted event after a mutate statement is now meaningful coverage of the emit clause: the runtime evaluates the emit expression, mints an event individual, inserts it into the ABox, and the assertion succeeds only if all of that happened. Previously assertions on emitted events were satisfied by the pre-binding regardless of whether the mutation ran.
- Multi-mutation flows (composition). mutate A(...); mutate B(...); assert <post-state> proves the chained effect, with re-saturation between steps so B’s require sees A’s derived facts.
- Teardown ordering. cleanup { ... } runs after main-body statements regardless of mid-test assertion failures, so a teardown mutation + its post-state assertion are exercised even when an earlier assertion drifted.
What v1 does NOT catch — explicit v2 scope.
The v1 runtime is honest about which mutation clauses it evaluates against the test ABox vs. which it surfaces as MutationUnsupported failures so the modeler sees the gap rather than a silent pass:
- require clauses: fully evaluated. v1.
- do { let } clauses: type assertion lands; field assignments inside the let’s value expression do not yet bind. The let-bound name is visible to subsequent assertions only as a typed individual.
- do { x.field = expr } field-updates: surface as MutationUnsupported. A test that asserts post.field == ... after a field-update mutation will see the runner flag the gap explicitly. v2.
- retract { } clauses: surface as MutationUnsupported per clause. mutate teardown(x); assert not Concept(x) reports the unsupported flag rather than silently passing or failing on the assertion. v2.
- emit Event { ... } constructor: mints a fresh event individual with literal field values. Computed field expressions ({ when: today() }) are skipped in the field-args extraction; v1 covers literal-typed emit args only.
- emit p path-reference: no-op (the source individual is already in the ABox via the do { let } that bound it). v1.
- Dropped preconditions. A record_rent_payment whose require { p.is_timely } line was deleted from source still appears to pass v1 tests, because there’s no atom left to evaluate. Detecting “this mutation should have a precondition but doesn’t” requires a separate mutate_fails <name>(<bad-args>) assertion form. v2.
The v2 work plan: implement field-update propagation against KnowledgeStore, implement retract (ByAssertionId at minimum), evaluate do { let } value-expression bindings, lift emit’s field-arg evaluation to the full atom-value evaluator, ship mutate_fails. Each is a clean follow-up against the v1 surface this RFD lands.
For the existing scene-test convention. Tests that don’t use mutate or cleanup continue to materialize-and-assert as before. The lease-story scene_04 / scene_06 / scene_11 / scene_13 tests can opt into mutate incrementally; they don’t need to change to keep passing.
Sequential assertion semantics. Tests that contain only let and assert retain today’s observable behavior: a single saturation, then all assertions evaluated against the same post-saturation state. Tests that introduce mutate get re-saturation between statements. A test author who introduces a mutate and observes that an assertion that previously passed now fails should investigate whether the mutation legitimately changes the asserted state — the change is signal, not noise.
For the LSP InfoView. Hover on a mutate <name>(...) site can show the mutation’s parameter list + require / emit / do / retract clauses — actionable affordance for “what is this mutation contracting?” introspection. Cleanup blocks render as a folded region by default with their own statement count.
Implementation surface
Estimated 10–14 hours across:
- oxc/src/syntax/kind.rs: new MUTATE_STMT + CLEANUP_STMT syntax kinds; mutate and cleanup as contextual keywords in test bodies.
- oxc/src/cst/parser.rs: recognize mutate <path>(<args>) and cleanup { stmts } inside test { } bodies; produce MUTATE_STMT / CLEANUP_STMT nodes.
- oxc/src/cst/lower/items.rs: lower to ast::TestStmt::Mutate { path, args, span } and ast::TestStmt::Cleanup { stmts, span }.
- oxc/src/ast.rs: new TestStmt::Mutate + TestStmt::Cleanup variants.
- oxc/src/core_ir.rs: new CoreTestStmt enum and CoreMutateCall struct; thread statements: Vec<CoreTestStmt> onto CoreTest. Keep individuals + assertions as derived views during the migration window.
- oxc/src/elaborate/phase_elaborate.rs::resolve_test: name resolution + arg validation; emit OE0237–OE0242. Recursive elaboration into Cleanup body.
- oxc/src/diagnostics/codes.rs + rendering.rs: six new diagnostic helpers.
- oxc/src/test_runner.rs: replace the materialize-all → saturate-once → check-all shape with a per-statement loop. New helpers: apply_mutation (evaluates require, applies retract/do/emit, re-saturates), evaluate_emit_expr, mint_event_individual. Cleanup-tagged TestFailure flag.
- oxc/src/reasoning/atom_eval.rs: confirm eval_atom_truth covers every atom shape that appears in require { } bodies; extend if any are missing.
- oxc/tests/test_framework_runner.rs: exercise passing precondition, failing precondition, multi-mutate composition, emit insertion + post-saturation visibility, retract + assert-not, cleanup runs after main-body failure, cleanup-only failures, unknown-mutation diagnostic, arg-mismatch diagnostic, multiple-cleanup diagnostic, cleanup-not-at-end diagnostic, nested-cleanup diagnostic.
- argon/examples/lease-story/packages/story-lease/src/tests/scene_04_security_deposit.ar: opt the existing test block into mutate record_security_deposit(...) to demonstrate the surface; add a cleanup block that exercises deposit-return.
- argon/book/src/: section in the test-runner chapter documenting the four-statement grammar + per-statement saturation + emit semantics + cleanup ordering.
Open questions
- let result = mutate <name>(<args>) for return-value binding. Punting to v2 keeps v1’s surface minimum-viable. The dominant case (precondition + emit + assert) ships first; return-binding is additive.
- retract semantics under saturation. When a retract removes an individual, derived facts that depended on it must also be removed. v1 picks re-saturate-from-scratch for simplicity; cost is acceptable at test sizes (typically < 100 individuals). Incremental retraction is a future optimization.
- mutate_fails <name>(<args>) assertion form. A statement that asserts the precondition fails: useful for testing the negative case, and the path to v2 catching dropped-require regressions. Out of scope for v1.
- Cleanup ordering vs. test-attribute interaction. #[unproven] and #[assumed] apply to the test as a whole. Whether cleanup runs for #[assumed] tests (which the runner doesn’t actually evaluate) is a small semantic question; v1 picks “yes, cleanup runs” for symmetry with the other test-attribute paths, on the principle that cleanup should be observable side-effect-wise even when the test body’s assertions aren’t being checked.
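The re-saturate-from-scratch choice in the retract question can be illustrated with a store that keeps asserted facts separate from derived ones, so that derived facts never outlive their premises. This is a toy model, not the KnowledgeStore design: facts are strings, rules are single-premise implications, and `Store` with its methods is hypothetical.

```rust
use std::collections::BTreeSet;

/// Toy store: only asserted facts are persisted; derived facts are
/// recomputed on demand, so retraction needs no dependency tracking.
pub struct Store {
    pub asserted: BTreeSet<String>,
    pub rules: Vec<(String, String)>, // (premise, conclusion)
}

impl Store {
    pub fn new(rules: Vec<(&str, &str)>) -> Self {
        Store {
            asserted: BTreeSet::new(),
            rules: rules
                .into_iter()
                .map(|(p, c)| (p.to_string(), c.to_string()))
                .collect(),
        }
    }

    pub fn assert_fact(&mut self, fact: &str) {
        self.asserted.insert(fact.to_string());
    }

    /// Removes an asserted fact; returns whether it was present.
    pub fn retract(&mut self, fact: &str) -> bool {
        self.asserted.remove(fact)
    }

    /// Saturation from scratch: start from the asserted facts only and
    /// apply rules to a fixpoint.
    pub fn saturated(&self) -> BTreeSet<String> {
        let mut facts = self.asserted.clone();
        loop {
            let before = facts.len();
            for (premise, conclusion) in &self.rules {
                if facts.contains(premise) {
                    facts.insert(conclusion.clone());
                }
            }
            if facts.len() == before {
                return facts;
            }
        }
    }
}
```

Recomputing the closure costs a full pass per statement, which is the trade the RFD accepts at test sizes; an incremental scheme would instead need to track which derived facts depend on which asserted ones.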