Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Tests and Verification

A test is a small named world the runtime sets up, makes claims about, and tears down. Argon has a dedicated test item kind, plus @[theorem]-tagged rules that the compiler preserves as theorem markers for a future mechanical-verification pipeline.

This chapter covers both.

A first test

test "a freshly-signed residential lease is active" {
    let alice: Tenant = {
        id: "alice-001",
        name: "Alice",
        contact_email: "alice@example.com",
    }
    let bob: Landlord = {
        id: "bob-001",
        name: "Bob",
        payout_account: "acct-bob",
    }
    let house: Property = {
        id: "p-1",
        address: "1200 University Ave",
    }
    let lease: ResidentialLease = {
        id: "lease-1",
        tenant: alice,
        landlord: bob,
        property: house,
        start_date: #2026-01-01#,
        end_date: #2027-01-01#,
        monthly_rent: 9500,
        bedrooms: 3,
    }

    assert ResidentialLease(lease)
    assert ActiveLease(lease)
}

Run it:

$ ox test
   Compiling lease-tutorial v0.1.0
   Running 1 test
test scene "a freshly-signed residential lease is active" ... ok

test result: ok. 1 passed; 0 failed; 0 unproven; 0 assumed; 0 ignored

Three things to notice.

test "string" not test name

Tests are named with string literals — they are descriptive narratives, not identifiers. This is intentional: a test description is what shows up in ox test output, in CI logs, and in failure reports. Make it readable. “a freshly-signed residential lease is active” beats signed_lease_active.

let for setup

The body’s let name : Type = { fields } form constructs an instance directly. Field-block syntax lets you supply each field by name; the compiler validates against the declared type. The instance is bound to the local name; you can refer to it in assert statements.

This is a different let from the one inside compute Form-2 bodies: that let binds an intermediate value during evaluation; this let constructs a concept instance for the test’s local world.

assert <atom> for claims

Inside a test, assert introduces a single rule-atom claim — a type test, a derive-head call, a comparison, an aggregate inequality, or any other shape the rule-body grammar admits. The runtime evaluates the atom against the test’s world; if it does not hold, the test fails.

assert here is post-reasoning: the runtime saturates the model first (running all asserts and derives), then checks each test claim against the saturated knowledge base. So assert ActiveLease(lease) works because the ActiveLease derive rule has fired during reasoning.

You can also write assert not <atom>:

test "a lease with start >= end is rejected by the validity refinement" {
    let bad: Lease = {
        id: "lease-bad",
        ...
        start_date: #2027-01-01#,
        end_date: #2026-01-01#,
        ...
    }

    assert not bad: ValidLease
}

not negates the atom — the test passes when the atom fails. Useful for exclusion claims like “this bad input does not pass our refinement.”

Status note: instantiates-form assertions in tests. The runtime’s assertion evaluator currently supports concept-as-predicate atoms (assert ResidentialLease(lease), assert ActiveLease(lease)) but does not yet evaluate the instantiates-form assert <ind>: <Concept> (and its negation) — running such a test reports atom shape not yet supported: Instantiates. The shape is parsed and elaborated and passes ox check / ox build; it just isn’t dispatched by ox test yet. Until that lands, prefer assert <Concept>(<ind>) for membership claims; refinement-membership tests will roll in alongside the runtime extension.

Sequenced statements: mutate and cleanup

Tests aren’t only declarative. A test can also exercise a pub mutation against the test ABox, observe its effects, and run ordered teardown. The test body admits four statement kinds in source order: let, assert, mutate, and cleanup.

test "rent payment passes timeliness check" {
    let p: RentPayment = {
        paid_on: 2025-03-03,
        amount: 9500,
        period_label: "2025-03",
        is_timely: true,
    }

    mutate record_rent_payment(p)

    assert RentPayment(p)
    assert tenant_balance(p.tenant) == 0

    cleanup {
        mutate retract_test_payment(p)
        assert not RentPayment(p)
    }
}

mutate <name>(<args>) invokes a pub mutation. The runner:

  1. Resolves the mutation by name through the same import surface as any other cross-package symbol.
  2. Evaluates each require { } precondition against the current post-saturation ABox. On the first false, the runner records a structured MutationPreconditionFailure and skips the mutation’s do / retract / emit clauses; subsequent statements continue against the unchanged ABox so you see all assertion drift in one run.
  3. Otherwise applies retract, do (field updates and locally-bound let individuals), and emit (events become individuals visible to subsequent saturation).
  4. Re-saturates. The next assert evaluates against the post-mutation state.

cleanup { stmts } is a single, trailing teardown block. Cleanup admits the same four statement kinds and runs regardless of whether main-body assertions failed. Failures inside cleanup are tagged with a cleanup: true flag on the runner output so you can distinguish “the operation failed” from “the teardown failed.”

Well-formedness rules:

  • A test admits at most one cleanup block (OE0240 MultipleCleanupBlocks if more).
  • cleanup must be the last statement (OE0241 CleanupNotAtEnd if anything follows it).
  • Cleanup blocks don’t nest (OE0242 NestedCleanup).

What v1 catches

Before this surface existed, the only way to “test” a mutation was the scene-shape convention: let-bind the mutation’s emitted events as if it had run, then assert the post-state shape. That convention proved input shapes were constructible but never evaluated require, never exercised do, never verified that emit clauses fired.

mutate <name>(<args>) closes those gaps for the clauses v1 fully evaluates:

  • Existing-precondition violations. A test that hands record_rent_payment an is_timely: false payment fails with MutationPreconditionFailure rather than silently passing.
  • emit clauses (constructor case). emit Event { ... } evaluates the expression and mints a fresh event individual with the literal field values. An assert Event(...) after the mutate statement is meaningful coverage of the emit clause.
  • Multi-mutation flows. mutate A(...); mutate B(...); assert <post-state> proves the chained effect with re-saturation between steps so B’s require sees A’s derived facts.
  • Cleanup ordering. A cleanup { } block runs regardless of mid-test assertion failures, so a teardown mutation’s post-state assertion is exercised even when an earlier assertion drifted.

What v1 does NOT catch

The runner is honest about which clauses it does not yet evaluate against the test ABox — those clauses surface as MutationUnsupported test failures so the modeler sees the gap rather than a silent pass:

  • do { x.field = expr } field-updates. v1 emits a MutationUnsupported failure per field-update. A test like mutate update_balance(acct, 100); assert acct.balance == 100 will see the runner flag the gap explicitly. v2 scope.
  • retract { } clauses. v1 emits a MutationUnsupported failure per retract. mutate teardown(x); assert not Concept(x) reports the unsupported flag rather than silently passing or failing the assertion. v2 scope.
  • do { let } value-expression bindings. The let’s type assertion lands and the binding is visible to subsequent statements as a typed individual, but the field assignments inside the let’s value expression do not yet bind. v2 scope.
  • emit p path-reference (no-op). When the elaborator lowers emit p (where p is a do { let }-bound name) the resulting CoreEmit carries empty args. v1 treats this as a no-op since p is already in the ABox via the let. Constructor-form emits are the meaningful coverage path; path-reference assertions test the let-binding shape, not the emit clause itself.
  • Dropped preconditions. A record_rent_payment whose require { p.is_timely } line was deleted from source still appears to pass — there’s no atom left to evaluate, so no MutationPreconditionFailure ever fires. Detecting “this mutation should have a precondition but doesn’t” requires a separate mutate_fails <name>(<args>) form that proves a precondition fails. v1 catches violations of preconditions the modeler kept; v2 will catch dropped-precondition regressions.

Multi-mutate failure propagation

When mutation A’s require fails, A’s do / retract / emit are skipped and the ABox stays unchanged. The runner does not halt the statement loop — it continues to the next statement. A subsequent mutate B(...) runs with B’s require evaluating against the pre-A state; if B’s preconditions hold against that state, B applies normally.

This means a chained-mutation test where B depends on A’s effects will see B fail or behave unexpectedly when A is skipped — but the source ordering of failures (A first, then B’s drift) makes the dependency obvious. The alternative — halting after A’s failure — would hide downstream drift.

Ordering

Tests with only let and assert statements observe today’s behavior unchanged: lets materialize, the ABox saturates, assertions evaluate. Tests that introduce mutate get per-statement saturation — re-saturation runs between mutate statements so each mutation’s require sees the previous mutation’s derived facts. A test author who introduces a mutate and observes that an assertion that previously passed now fails should investigate whether the mutation legitimately changes the asserted state — the change is signal, not noise.

Failed assertions don’t halt the loop. The runner records every assertion’s outcome and continues, so you see all failures in one run.

Compute equality

A specifically-supported test-only form: assert that a compute call equals an expected value.

test "deposit return arithmetic" {
    let r: DepositReturn = {
        deposit_held_amount: 9500,
        allowed_deduction_total: 1400,
    }

    assert deposit_after_deductions(r) == 8100
}

The compute is called against the test’s world; the result is compared with the expected value. This is how you validate that arithmetic-heavy computes do what you mean.

Multiple tests in a module

A test block is an item, like any other. Several can live in the same file:

use lease::*
use party::*
use rules::*
use types::*

test "active lease is queryable" { ... }
test "expired lease is queryable" { ... }
test "validity refinement catches bad dates" { ... }
test "validity refinement catches non-positive rent" { ... }

Convention: put tests in their own module — typically src/tests.ar for a small package. The book’s running tutorial follows this convention.

Frames and fixtures

When several tests share setup — the same Tenant, the same Property, the same baseline rules — pull that setup into a frame. A frame is a named, composable bundle of TBox + rules + ABox that any test can opt into.

frame canonical_party {
    let alice: Tenant = {
        id: "alice-001",
        name: "Alice",
        contact_email: "alice@example.com",
    }
    let bob: Landlord = {
        id: "bob-001",
        name: "Bob",
        payout_account: "acct-bob",
    }
    let house: Property = {
        id: "p-1",
        address: "1200 University Ave",
    }
}

A frame’s body admits any top-level item form — concept declarations, rules, asserts, lets. pub frame exports across packages; bare frame is package-local.

Tests compose frames with using:

test "a freshly-signed lease over the canonical party is active" using canonical_party {
    let lease: ResidentialLease = {
        id: "lease-1",
        tenant: alice,
        landlord: bob,
        property: house,
        start_date: #2026-01-01#,
        end_date: #2027-01-01#,
        monthly_rent: 9500,
        bedrooms: 3,
    }

    assert ResidentialLease(lease)
    assert ActiveLease(lease)
}

using pulls every binding, rule, and assertion the frame defines into the test’s world. Multiple frames compose: using canonical_party, baseline_rules brings both. The same name declared in two composed frames is a conflict (OE0215); a using reference that does not resolve to a known frame is OE0214; an inline fixture that redeclares a frame member is OE0216; circular frame includes are OE0218.

A worked multi-frame example:

frame canonical_party {
    let alice: Tenant = { id: "alice-001", name: "Alice", contact_email: "alice@example.com" }
    let house: Property = { id: "p-1", address: "1200 University Ave" }
}

frame baseline_rates {
    let standard_rent: Money = 9500
    let standard_deposit: Money = 19000
}

test "standard-rate lease matches baseline" using canonical_party, baseline_rates {
    let lease: ResidentialLease = {
        id: "lease-1",
        tenant: alice,
        property: house,
        monthly_rent: standard_rent,
        start_date: #2026-01-01#,
        end_date: #2027-01-01#,
        bedrooms: 3,
    }

    assert lease.monthly_rent == standard_rent
}

Variants: _catalog/test-frame-declaration/{minimal,multi-frame,frame-with-fixtures}/ and _catalog/test-using-clause/{multi-using,composition-with-expect}/.

Status note: OE0216 not yet emitted. OE0216 is registered in the diagnostic catalog and committed by the rule above, but the emit-site at the inline-fixture-vs-frame-composition junction is not yet wired in the elaborator. Tests that exercise the collision (an inline fixture redeclaring a name a using-pulled frame already contributes) currently compile clean. Sibling diagnostics OE0214 / OE0215 / OE0218 are wired. Tracked at sharpe-dev/orca-mvp#411.

Inside a test, fixture { … } declares a test-local mini-module — items declared in the fixture are visible in the surrounding test but never escape to the parent module:

test "deposit return arithmetic with a custom property" {
    fixture {
        let custom: Property = {
            id: "custom-1",
            address: "42 Galaxy Way",
        }
    }

    let r: DepositReturn = {
        deposit_held_amount: 9500,
        allowed_deduction_total: 1400,
    }

    assert deposit_after_deductions(r) == 8100
}

Diagnostics that arise inside a fixture block are captured — they are visible to the test’s expect block (below) but never propagate to the parent module’s diagnostic stream. This lets a test deliberately exercise a malformed fixture and assert the right error fires, without the malformation breaking unrelated tests in the same package.

Diagnostic expectations

Tests can verify the compiler’s diagnostics, not just post-saturation fact-claims. The mechanism is the expect { … } block paired with //~ <label> source markers:

test "a sortal without a Kind ancestor is flagged" {
    fixture {
        pub kind Person { name: String }
        //~ orphan_role
        pub role Orphan { }
    }

    expect {
        diagnostic OW0207 at orphan_role
    }
}

//~ <label> anchors a labelled marker to the next source line — the convention follows Rust’s compiletest. Inside expect { … }, the bare form diagnostic <code> requires that exactly one diagnostic with that code fires somewhere in the test’s scope; at <marker> constrains the match to the labelled span; severity:<level> (one of error, warning, info) further constrains. Severity defaults to error when omitted.

expect {
    diagnostic OE0211 at line_a
    diagnostic OE0212 severity:warning
    diagnostic OE0605
}

The test passes when every claim in the expect block matches. An unmatched claim, an unexpected diagnostic above the configured severity floor, or a diagnostic anchored at the wrong span is a test failure; the runner surfaces the diff between expected and actual.

expect { … } and assert <atom> compose. A single test can verify both that a fixture produces the right diagnostic and that the rest of the model still satisfies its post-saturation claims.

test "bad lease shape flags + good lease still classifies" {
    fixture {
        //~ bad_lease
        let bad: Lease = { id: "x", start_date: #2027-01-01#, end_date: #2026-01-01# }
    }

    let good: Lease = {
        id: "y",
        start_date: #2026-01-01#,
        end_date: #2027-01-01#,
        monthly_rent: 9500,
    }

    expect {
        diagnostic OE0606 at bad_lease    // refinement violation on bad
    }

    assert ValidLease(good)               // good still satisfies the refinement
}

Variants: _catalog/test-expect-block/{minimal-expect,multi-arm-expect,with-fixture,expect-with-query}/.

meta() reflection and :: classification

Two related forms surface the metatype of a concept inside a rule body, query, or assertion. meta(X) returns X’s metatype as a value:

pub derive is_kind(x) :- meta(x) == kind

Person :: Kind is sugar for the same query — read “Person is a Kind”. The two lower to the same internal atom; pick whichever reads more naturally in context.

pub derive equiv(x) :- x :: Kind

meta() and :: classify types, not instances. A bound variable or named concept fits. A free unbound variable does not — meta(?z) with no binding fires OE0212 MetaArgumentNotGround at elaboration. Negation composes: not meta(x) == role is well-formed.

Tests use these forms to assert metatype-shape directly:

test "Tenant is a role" {
    assert Tenant :: Role
}

@[theorem] — mechanical verification

Some claims are not test-shaped. They are statements about the rule system itself: “this property holds for every possible lease, not just the three I happen to construct in a test.” For those, mark a derive rule with @[theorem]:

@[theorem]
pub derive transitive_subtype(x: ⊤, y: ⊤, z: ⊤) :-
    x specializes y,
    y specializes z
    => x specializes z

@[theorem] marks the rule as a theorem claim. At the current cut, the marker is preserved through compilation and surfaces in the kernel’s runtime so a future verification pipeline can target it; the compile-time mechanical-verification pass (the saturator + meta-property calculus attempting to prove that the rule’s body always implies its head) is not yet active. Today the value is signalling intent — the body of @[theorem]-tagged rules is the inventory the verification pass will close over when it lands.

@[theorem] is restricted to derive items. Placing it on a compute, mutation, query, or anything else yields OW0821 (“@[theorem] applied to a non-derive item”); declaring it twice on the same derive yields OW0822.

Two related attributes come from the same family:

  • #[unproven] — applied to a test block. Marks a theorem-shaped claim that the test asserts as true without mechanical verification. The test runs as usual; the runner reports it under its own status category so the build’s body of unproven claims is visible at a glance.
  • #[assumed] — applied to a test block. The body’s claim is injected as a postulate within the test’s scope. Used when a real-world axiom (e.g., “every SSN is unique”) is needed for downstream reasoning but cannot be derived from the model. Reported under its own status category for the same reason.
#[unproven]
test "subtyping is transitive across the entire lattice" {
    /* claim — not yet mechanically verified */
}

#[assumed]
test "every SSN is unique" {
    assert forall ?p1, ?p2: Person where ?p1.ssn == ?p2.ssn => ?p1 == ?p2
}

@[…] AttrList and @<ident> / #<ident> Decorator forms are accepted as alternates to #[…] for both attributes. Pick the form that matches the package’s prevailing style.

What ox test runs

ox test runs every test block in the project. @[theorem] markers are surfaced at ox check / ox build time (today as preserved markers; the verification pass is forthcoming), not under ox test — so a (future) verified theorem will be a property of the build, not of an individual test invocation.

The runner reports five status categories: Pass (every claim held), Fail (a claim did not hold), Unproven (a claim marked #[unproven] ran without proof), Assumed (a claim marked #[assumed] was injected as a postulate), Ignored (the test was excluded from the run). The categories are surfaced as separate counts in the test result: line so a build’s body of unproven and assumed claims stays visible.

$ ox test
   Compiling lease-tutorial v0.1.0
   Running 5 tests

test result: ok. 3 passed; 0 failed; 1 unproven; 1 assumed; 0 ignored

Pass -p <package> to scope to a single package within a workspace. Pass --filter <pattern> to run only tests whose names match. Pass -v / --verbose to show every assertion, not just failures.

Edge cases worth knowing

  • let : Type = { fields } is the canonical creation form inside tests. It runs the same field-validation the type-checker runs on parameter passing; bad field types fail at ox check, not at ox test.
  • Tests run with full reasoning. Unlike the default ox check, tests trigger the reasoner so that derives and asserts have fired before tests’ assert claims are evaluated.
  • Assertions in tests are not the same as assert items at module scope. A module-scope assert declares an invariant over the package; a test’s assert is a single rule-atom check inside the test’s world.

Putting it in the running example

Add src/tests.ar. Bundle the shared party-and-property setup into a frame so each test’s body stays focused on what it actually exercises:

use lease::*
use party::*
use rules::*
use types::*

frame canonical_party {
    let alice: Tenant = {
        id: "alice-001",
        name: "Alice",
        contact_email: "alice@example.com",
    }
    let bob: Landlord = {
        id: "bob-001",
        name: "Bob",
        payout_account: "acct-bob",
    }
    let house: Property = {
        id: "p-1",
        address: "1200 University Ave",
    }
}

test "a freshly-signed residential lease is active" using canonical_party {
    let lease: ResidentialLease = {
        id: "lease-1",
        tenant: alice,
        landlord: bob,
        property: house,
        start_date: #2026-01-01#,
        end_date: #2027-01-01#,
        monthly_rent: 9500,
        bedrooms: 3,
    }

    assert ResidentialLease(lease)
    assert ActiveLease(lease)
    assert lease: ValidLease
}

test "a lease with start >= end is rejected by the validity refinement" using canonical_party {
    let bad: Lease = {
        id: "lease-bad",
        tenant: alice,
        landlord: bob,
        property: house,
        start_date: #2027-01-01#,
        end_date: #2026-01-01#,
        monthly_rent: 9500,
    }

    assert not bad: ValidLease
}

Then:

$ ox test
   Compiling lease-tutorial v0.1.0
   Running 2 tests
test "a freshly-signed residential lease is active" ... ok
test "a lease with start >= end is rejected by the validity refinement" ... ok

test result: ok. 2 passed; 0 failed; 0 unproven; 0 assumed; 0 ignored

The model is now self-validating. Bad inputs do not survive the test pass.

Summary

Tests are first-class — test "string-name" { let setup; assert claims }. Frames bundle setup so tests share it via using. Fixtures scope test-local items and capture their diagnostics. expect { diagnostic … } blocks paired with //~ <label> markers verify the compiler’s output, not just post-saturation facts. meta() and :: surface metatype-shape directly. @[theorem], #[unproven], and #[assumed] provide a graded set of verification claims, from informally-tested through mechanically-verified through axiomatic, and the runner reports them as five distinct status categories. The lease tutorial now has tests covering the happy path and the failure modes that the refinement type catches. Next we render the model.