Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

RFD-0039 — Collections and collection operators

Committed Opened 2026-05-13 · Committed 2026-05-15

Implementation status (2026-05-14)

The substrate ships in two phases. Phase 1 lands the parts that are correct, testable, and self-contained without a multi-pass type- inference layer; Phase 2 lands the elaboration desugaring and runtime evaluators that depend on receiver type inference.

PhaseTracksWhat lands
Phase 1 (this RFD’s initial PR)A, B, C, D, E, J, K, partial NRFD ratified · grammar surface for all five RFD-0039 expression forms · std::collection::{Set,List,Map,Optional} + std::math::Range symbol registration with v1 op surface populated as Computation symbols · DataValue / DataValueType carry Set / Map / Optional / Range runtime variants · TypeExpr::Generic / Collection / OptionalCollection resolve to the matching DataValueType · OE2401, OE2402, OE2403–OE2408, OW2401–OW2402 reserved with explanations · 22-case substrate integration harness
Phase 2 (follow-up)F, G, H, I, L, MUFCS method-call desugaring with receiver type inference · per-context tier enforcement · crates/nous/src/runtime/std_collection_eval.rs Rust evaluators backing every v1 op · book chapter ch02-08-collections.md · examples/collections-tour/ end-to-end · lease-story Building/Unit/tenants extension

Phase 2 is tracked at issue #391 (E3 epic) with the deferred-op catalog at issue #409. Code written today using the RFD-0039 expression surface parses cleanly and surfaces OE2402 from the elaborator — the diagnostic carries the workaround (qualified std::collection::List::size(xs) form) and the tracking link.

Question

Argon today admits collection-shaped type expressions (Set[T], List[T], Map[K, V], Optional[T], [T; bound]) at parse time, and DataValueType already carries List, Set, Map at the runtime level. But the symbols those type expressions resolve against are not registered, the runtime has no operations on them, and no expression-level surface exists for indexing, slicing, membership testing, or higher-order transformations. Modelers cannot write the conditions and transformations real ontologies require.

What surface — substrate primitives, expression operators, and standard-library operations — closes the collection story without violating foundation-neutrality (RFD-0002) and the no-parametric-concepts position (RFD-0009)?

Context

Three feedback items from the May 2026 Argon Open Questions exchange are addressed here:

  • #26 — Collection operations (priority:blocker). Modelers cannot write is_member, append, remove, union, intersect at the expression level.
  • #20 — Ordered collections. Today’s [T; bound] admits cardinality constraints but doesn’t distinguish ordered from unordered semantics.
  • #21 — Optional syntax. spouse: [Person; <=1] reads as a singleton-bounded set instead of an optional value.

The current state is well-scoped:

  • The CST parser produces TypeExpr::Generic { head, args } for Set[T] / List[T] / Map[K, V], TypeExpr::Optional for T?, TypeExpr::Collection for [T; bound], and Expr::List for list literals. None of these are blocked in the grammar.
  • phase_elaborate.rs already computes stable_id("std::collection::Set") / List / Map and dispatches TypeExpr::Generic to DataValueType::Set / List / Map — but no symbols are registered against those stable ids, so the resolution silently degrades to concept refs.
  • DataValueType in nous already has List(Box<Self>), Set(Box<Self>), Map(Box<Self>, Box<Self>). DataValue has List(Vec<Self>) but lacks Set, Map, Optional, Range.
  • Aggregate forms (sum(expr for v in path where pred)) work in query heads and compute bodies; they are explicitly disallowed in rule bodies via OE0403.
  • No expression-level method-call syntax exists today. x.foo(bar) fails to parse (is_call_ahead only fires for bare identifiers, not post-dot).

The remaining work is connecting the parser surface to a populated std::collection module, adding the missing expression forms (method-call, indexing, slicing, membership, comprehension), and wiring the runtime to evaluate them.

RFD-0009 forbids user-declared parametric concepts. The implication: Set, List, Map, Optional, and Range cannot be user-declared and cannot be modeled as ordinary concepts with type parameters. They must live as a closed set of substrate-level type constructors. User code consumes them; user code does not extend them. Future user-declarable parametric types live behind a functor-module path that this RFD does not commit to.

Decision

1. Five built-in type constructors

std::collection exposes four type constructors; std::math exposes a fifth:

ConstructorArityModule pathID range
Set[T]1std::collection::SetTYPE_CTOR_SET_ID = 0x0003_0000_0000_0000
List[T]1std::collection::ListTYPE_CTOR_LIST_ID = 0x0003_0000_0000_0001
Map[K, V]2std::collection::MapTYPE_CTOR_MAP_ID = 0x0003_0000_0000_0002
Optional[T]1std::collection::OptionalTYPE_CTOR_OPTIONAL_ID = 0x0003_0000_0000_0003
Range[T]1std::math::RangeTYPE_CTOR_RANGE_ID = 0x0003_0000_0000_0004

The id range 0x0003_0000_0000_0000..0x0003_0000_0000_00FF is reserved for built-in type constructors, parallel to the 0x0002_* primitive-sentinel range. Type constructors are parametric and therefore have no DataValueType round-trip through primitive_sentinel_for; they resolve through a separate type_ctor_from_id function.

A new SymbolKind::TypeConstructor variant labels these symbols in the symbol table. The substrate cannot ship a new type constructor without a code change; the closed set is the price of foundation-neutrality (no metatype machinery decides what List means).

2. Surface — generic types use brackets, T? is sugar

Argon’s existing convention reserves < and > for comparison operators. Generic types therefore use brackets:

items:    List[Item]
tags:     Set[Text]
owners:   Map[Tenant, Lease]
discount: Optional[Money]
band:     Range[Money]

T? is admitted as user-facing sugar for Optional[T] and lowers identically at the AST→CoreIR boundary. [T] and [T; bound] continue to serve relation-shaped multi-valued field declarations; both lower to Set[T] by default, and an explicit ordered flag ([T; bound, ordered]) selects List[T]. [T; <=1] triggers OW2402 suggesting rewrite to T?.

3. Expression-level surface

Six new expression forms, all desugaring to ordinary calls at elaboration time:

  • Method callxs.m(a, b) desugars to <TypeOf(xs)>::m(xs, a, b). Dispatch is at elaboration time; no traits, no runtime dispatch, no method tables.
  • Indexingxs[i] desugars to <TypeOf(xs)>::at(xs, i) returning Optional[T]. Receivers typed as Set[T] reject with OE2406.
  • Slicingxs[i..j], xs[i..], xs[..j] desugar to <TypeOf(xs)>::slice(xs, range). Returns List[T].
  • Range literali..j constructs Range[T]::new(i, j) (half-open); i..=j constructs Range[T]::inclusive(i, j).
  • Membership operatorx in xs desugars to <TypeOf(xs)>::contains(xs, x); x not in xs negates.
  • Comprehension[expr for x in xs where pred] desugars to xs.filter(<pred>).map(<expr>) inline. The binding subgrammar reuses the existing AGGREGATE_BINDING + AGGREGATE_WHERE productions.

Higher-order arguments accept two forms:

  • Compute references — a bare pub compute name in an argument position is a value of compute-reference type. Elaboration validates the reference’s signature against the higher-order parameter’s expected shape; mismatch fires OE2407. CoreValue::ComputeRef(compute_id) carries the value.
  • Comprehensions — inline transformations that don’t need a named function.

First-class lambdas (|x| x + 1) and explicit function-type syntax (T -> U) are deferred. They become necessary only when compositional combinators (compose, curry, partial) enter the language; for now, comprehensions plus compute references cover the common cases.

4. Operation catalog

Each type constructor’s submodule exposes a fixed set of built-in computes. Names follow PL-mainstream conventions (Rust/Scala/Kotlin reach for these names).

Set[T] — unordered, distinct, no indexing.

Construction: Set::of(args...), Set::empty(), Set::singleton(x), Set::from_list(xs). Queries: size, is_empty, contains. Set algebra: union, intersect, difference, symmetric_difference. Subset relations: is_subset_of, is_superset_of, is_disjoint_from. Higher-order: map, filter, any, all, find, count, fold, reduce.

List[T] — ordered, allows duplicates, indexable.

Construction: List::of, List::empty, List::singleton, List::from_set (deterministic ordering). Queries: size, is_empty, contains, index_of. Access: at, head, tail, first, last. Manipulation (return new list): append, prepend, insert_at, remove_at, replace, reverse, sort, sort_by, zip, concat, flatten. Slicing: slice, take, drop, take_while, drop_while. Higher-order: same as Set plus flat_map, scan.

Map[K, V] — key-value, K must be orderable.

Construction: Map::of((k, v), ...), Map::empty. Queries: size, is_empty, contains_key, contains_value. Access: get, get_or, keys, values, entries. Manipulation: put, remove, merge, map_values, filter_keys, filter_values.

Optional[T] — 0-or-1 value.

Construction: Some(x), None(). Queries: is_some, is_none. Access: unwrap_or, unwrap_or_else. Mapping: map, flat_map, filter, and, or.

Range[T] (in std::math) — over any orderable type.

Construction: Range::new (half-open), Range::inclusive, Range::from, Range::to. Queries: contains, start, end, is_empty. Iteration: usable in comprehensions over numeric/date ranges.

5. Tier classification per operation

Every operation carries an explicit tier per RFD-0004’s seven-tier ladder. Categories:

TierOperations
tier:structuralis_some, is_none, Range::contains / start / end / is_empty
tier:closuresize, is_empty, contains, union, intersect, difference, subset checks, indexing, slicing, head/tail, append/prepend, list manipulation without HOFs, get / keys / values
tier:expressivefind, any, all, count when predicate is non-trivial
propagatesmap, filter, flat_map, fold, reduce, scan, sort_by, predicate-taking ops — tier(op) = max(tier:closure, tier(f))

The propagation rule is consistent with how recognized-shape tier reduction already works for rule bodies. A modeler writing xs.map(complex_compute) where complex_compute is tier:fol gets a tier:fol-classified result and must either accept the tier or refactor.

6. Graduated-context admission

Following RFD-0005, different evaluation contexts admit different operation sets:

ContextMethod callsComprehensionsCompute references
pub compute bodyall opsyesyes
pub mutation do { }all opsyesyes
pub derive bodytier:closure onlyyestier:closure only
query bodytier:closure onlyyestier:closure only
Refinement where { }rejected (OE2408)rejectedrejected
test blockall opsyesyes

Rule-body tier violations fire OE2408 HOFInRuleBodyTierViolation. Refinement-body violations route through the existing refinement-restriction code (ch02-06).

7. Functional semantics; mutation is rebuild-and-assign

All collection operations are pure-functional: they return new collections and never mutate in place. xs.append(x) returns a new List[T] with x appended; xs is unchanged.

In pub mutation do { } bodies, the idiom is rebuild-and-assign:

do { l.parents = l.parents.append(parent) }

True in-place mutators (l.parents.insert!(x)) are deferred. Functional semantics keep the kernel’s bitemporal event-log model clean: every collection-valued property change is a new GroundAssertion, not an in-place edit.

8. Field cardinality vs collection types

Three field shapes carry related but distinct readings:

  • field: List[T] — a single collection-valued property whose value is a List object.
  • field: [T; bound] — a binary relation pattern with cardinality bound (default Set semantics).
  • field: [T; bound, ordered] — same relation pattern but ordered (List semantics).

The cardinality bounds >= N / <= N / == N / >= N, <= M apply only to relation-shaped declarations. Collection-typed properties carry their cardinality through the collection’s runtime size; explicit bounds use where-clause refinement predicates over .size().

[T; <=1] is admitted but triggers OW2402 suggesting T?. The semantics are equivalent at the data layer; the readings differ.

9. Diagnostics

Ten new codes register in oxc-protocol::core::codes:

CodeSeverityTrigger
OE2401ErrorType-constructor arity mismatch (Set[T, U], Map[T]).
OE2402ErrorUnknown collection method (xs.bogus()). Hint lists valid methods on the receiver’s type.
OE2403ErrorCollection element-type mismatch (List[Nat].append("foo")).
OE2404ErrorIndex expression’s type isn’t Nat.
OE2405ErrorSlice bounds invalid (xs[5..2], compile-time when literal).
OE2406ErrorUnordered collection indexed (s[i] where s: Set[T]).
OE2407ErrorCompute reference signature mismatch.
OE2408ErrorHOF in rule body violates context tier ceiling.
OW2401Warning.unwrap() without fallback; suggests unwrap_or or match.
OW2402Warning[T; <=1] field; suggests T?.

10. CLI surface

No new CLI commands. ox check, ox build, ox test, ox doc all interpret the new surface through existing dispatch. oxc emit core-ir shows the desugared form for any expression with collection operators.

Rationale

Closed type-constructor set preserves RFD-0009. Allowing user-declared parametric concepts would commit Argon to a position about how foundational ontologies handle parametric type families — exactly the foundational coupling RFD-0002 forbids. A closed set of substrate primitives keeps the door shut without sacrificing user productivity; the everyday collection-modeling cases fit inside Set / List / Map / Optional / Range.

UFCS dispatch keeps the substrate small. Method-call syntax is a sugar; the desugar produces an ordinary Expr::Call against a qualified path. No method tables, no trait-resolution machinery, no runtime dispatch overhead. Receiver type inference at elaboration time picks the right submodule. The book reads naturally (xs.filter(p).map(f).count()); the IR stays uniform.

Comprehensions + compute references defer first-class functions. First-class lambdas are a real feature with real costs — function-type syntax, closure capture, type inference over function values. The team’s pain is concrete: comprehensions cover the inline-transformation case; compute references cover the named-transformation case. Lambdas can land later if combinator-style composition becomes desirable; the surface for that addition stays open.

Functional semantics align with the bitemporal event log. Every collection-valued property mutation is a new GroundAssertion in the kernel’s append-only event log per RFD-0021. In-place mutation would require the kernel to materialize partial-update events, which contradicts the existing storage discipline. Rebuild-and-assign matches the storage model byte-for-byte.

Graduated-context tiering reuses existing machinery. Rule bodies are tier-classified per RFD-0004; refinement bodies admit a restricted sub-grammar per ch02-06; compute bodies admit the full surface. Collection operations slot into the existing classification: xs.size() is tier:closure; xs.find(complex_pred) propagates from the predicate. No new tier infrastructure needed.

Bracket syntax avoids the < / > ambiguity that bedevils C++. Angle-bracket generic syntax requires lookahead disambiguation (is a < b a comparison or the start of a<b>?). Argon’s bracket syntax for type parameters sidesteps the problem entirely.

Consequences

  • New module: argon/oxc/src/std_collection.rs modeled on std_math.rs. Registers the four collection type constructors plus their submodule operation surfaces. Range registers in std_math.rs (extends, doesn’t replace).
  • core_ir.rs allocates the 0x0003_* id range for type constructors plus type_ctor_from_id mirror function.
  • SymbolKind::TypeConstructor new variant; every exhaustive match across oxc updates.
  • DataValue in nous grows Set(BTreeSet<Self>), Map(BTreeMap<Self, Self>), Optional(Option<Box<Self>>), Range { ... } variants. Additive serde change; no kernel-storage migration.
  • DataValueType in nous grows Optional(Box<Self>) and Range(Box<Self>).
  • AST in oxc/src/ast.rs grows Expr::MethodCall, Expr::Index, Expr::Slice, Expr::Range, Expr::Comprehension, Expr::Membership variants.
  • CST grammar in oxc/src/cst/grammar/exprs.rs grows a postfix-chain parser handling method-call, index, and slice forms after primary expressions. in / not in join the binary-operator precedence table. Comprehensions admit inside list literals via the for keyword.
  • Tree-sitter grammar mirrors the new productions; the codegen audit gate (oxc-codegen audit) catches drift.
  • Elaborator grows elaborate_method_call and a sibling family that desugar postfix forms into qualified calls. resolve_type_expr_to_id learns Generic / Collection / OptionalCollection arms.
  • Interpreter in nous grows ~80 evaluator functions covering every op in §4. Registered as ComputeDef::Rust bodies in the existing ComputeRegistry; no new KernelExpr variants needed.
  • Tier classifier in meta/classify.rs reads per-op tier annotations from a static table built at registration time.
  • Diagnostics in oxc/src/diagnostics/codes.rs add OE2401–OE2408 + OW2401–OW2402.
  • Book chapter ch02-08-collections.md ships with the operator catalog, tier matrix, and worked example. ch02-04, ch02-05, ch02-06 and the appendices extend.
  • Example examples/collections-tour/ ships as a minimal package exercising every operator. The lease-story example gains Building with units: List[Unit], tenants: Set[Person], rent_band: Optional[Range[Money]].
  • Open follow-ups (not in this RFD):
    • Interaction with RFD-0036’s user-level generic bounds — specifically, whether <T: Bound> admits type-constructor bounds (<T: List>).
    • User-declarable parametric type constructors via a functor-module path.
    • First-class lambdas and explicit function-type syntax.
    • In-place mutation operators (!-suffixed methods) in do { } contexts.

Historical lineage

The surface borrows directly from PL mainstream:

  • UFCS as desugar — Rust’s xs.iter().map(f).collect() reads as method calls but compiles to free-function calls in trait-impl scope. Argon’s UFCS is the same shape without traits: type-directed dispatch at elaboration time.
  • Comprehensions — Python and Haskell list comprehensions; Argon’s [e for x in xs where p] reuses the aggregate-binding grammar that already exists in Argon for sum(e for v in p where q).
  • Optional/Maybe — Haskell Maybe a, Rust Option<T>, Scala Option[T]. Argon picks Rust’s vocabulary (Some / None / unwrap_or) for PL familiarity.
  • Closed type-constructor set — Erlang’s built-in list, tuple, map types are similarly closed at the language level; user code consumes them but doesn’t declare new ones. Argon’s discipline is the same.

The functional-only semantics with rebuild-and-assign for mutation mirrors Clojure’s persistent data structures (returning new collections from “mutation” operations); the bitemporal-event-log alignment (every assignment is a new GroundAssertion) is Datomic’s transactor pattern adapted to Argon’s per-tenant overlay model per RFD-0020 + RFD-0021.

11. Addenda — decisions locked during Phase 2 implementation

Four design questions were resolved during Phase 2 build-out. Each is committed here so the RFD reflects the shipped surface.

11.1 Optional under the open-world assumption — K3-faithful

Optional[T] semantics under OWA split presence from inner-value truth state:

  • is_some(o) is classical on presence. Some(_) yields true; None() yields false. The inner value’s truth status does not enter the determination — is_some(Some(Undefined)) is true.
  • unwrap_or(o, d) is classical on absence. Some(v) yields v; None() yields d. unwrap_or(Some(Undefined), d) yields the inner Undefined, not d. The fallback applies to absence, not to the inner element’s truth status.

The reason: K3 (the three-valued logic the language uses for refinement under OWA) distinguishes “no value here” (a structural absence) from “value present but its truth is unknown” (an epistemic gap on the value itself). An unwrap_or that collapsed both cases into the fallback would erase the distinction, and downstream truth-value propagation relies on the distinction surviving.

When a modeler wants “either way, substitute a default,” the idiom is to chain a separate operation that operates on the inner value’s truth state — .or(d) (deferred) or a match over the value’s truth status. This RFD does not commit to the chained-operation surface; it commits to the rule that unwrap_or does not perform that combined collapse.

Cross-link: RFD-0037 §6 — bilattice substrate and K3/FDE projection.

11.2 Set and Map equality + normalization invariant

Set[T] and Map[K, V] evaluators normalize their backing storage so equality is Vec equality:

  • Set[T] is stored sort-deduplicated by T’s canonical ordering.
  • Map[K, V] is stored sort-by-key then dedup-keys (last write wins within a single construction call).

Every evaluator that produces a Set or Map runs the normalization step before returning. == on two Set / Map values is straight Vec equality after normalization. The invariant is asserted via debug_assert! round-trips in each evaluator — produce, normalize, re-normalize, compare; the second normalization must be a no-op.

The shipped surface does not expose the underlying Vec representation, so the discipline is internal to the evaluator implementation. The user-facing reading is “two sets are equal iff they contain the same elements” and “two maps are equal iff they contain the same key-value pairs,” which is what the normalization preserves.

11.3 Comprehension binder scoping — shadow with warning

A comprehension’s binder name shadows any in-scope let binding or parameter within the comprehension body and fires OW2403 ComprehensionBinderShadowsOuter. The warning is informational; the shadow takes effect.

pub compute weird(units: List[Unit], unit: Unit) -> List[Text] =
    [unit.number for unit in units]   // OW2403 — outer `unit` shadowed

The convention matches what Datalog rule bodies already do — every body introduces fresh binders, and modeler intent typically reads as “iterate the source,” not “shadow the outer name.” Erroring on collision would be too strict (renaming a parameter to fix a comprehension is annoying); silently shadowing would be too permissive. The warning surfaces the collision so the modeler chooses.

Rename the binder to silence:

pub compute fine(units: List[Unit], unit: Unit) -> List[Text] =
    [u.number for u in units]

11.4 Range materialization — predicate-only in v1

Range[T] does not materialize an element sequence in the v1 surface. Range::contains(r, x) is predicate-shaped: it answers “is x in the interval r?” without iterating. Range values participate in slicing (xs[i..j] constructs a Range[Nat] and passes it to List::slice) and in membership tests at expression position and rule-atom position.

Range::collect(r) -> List[T] (which would materialize the element sequence for numeric / date / money ranges) is tracked as follow-up work. The v1 surface does not iterate over ranges anywhere; the future collect addition will be a pure extension because no v1 caller exists to break.

The reading commits us to: ranges are intervals first and iterators second. The interval reading covers the common rent-band, age-band, date-band cases; the iterator reading is a convenience the language can add without disturbing the interval semantics.