RFD-0039 — Collections and collection operators
Implementation status (2026-05-14)
The substrate ships in two phases. Phase 1 lands the parts that are correct, testable, and self-contained without a multi-pass type- inference layer; Phase 2 lands the elaboration desugaring and runtime evaluators that depend on receiver type inference.
| Phase | Tracks | What lands |
|---|---|---|
| Phase 1 (this RFD’s initial PR) | A, B, C, D, E, J, K, partial N | RFD ratified · grammar surface for all five RFD-0039 expression forms · std::collection::{Set,List,Map,Optional} + std::math::Range symbol registration with v1 op surface populated as Computation symbols · DataValue / DataValueType carry Set / Map / Optional / Range runtime variants · TypeExpr::Generic / Collection / OptionalCollection resolve to the matching DataValueType · OE2401, OE2402, OE2403–OE2408, OW2401–OW2402 reserved with explanations · 22-case substrate integration harness |
| Phase 2 (follow-up) | F, G, H, I, L, M | UFCS method-call desugaring with receiver type inference · per-context tier enforcement · crates/nous/src/runtime/std_collection_eval.rs Rust evaluators backing every v1 op · book chapter ch02-08-collections.md · examples/collections-tour/ end-to-end · lease-story Building/Unit/tenants extension |
Phase 2 is tracked at issue #391
(E3 epic) with the deferred-op catalog at
issue #409. Code
written today using the RFD-0039 expression surface parses cleanly and
surfaces OE2402 from the elaborator — the diagnostic carries the
workaround (qualified std::collection::List::size(xs) form) and the
tracking link.
Question
Argon today admits collection-shaped type expressions (Set[T], List[T], Map[K, V], Optional[T], [T; bound]) at parse time, and DataValueType already carries List, Set, Map at the runtime level. But the symbols those type expressions resolve against are not registered, the runtime has no operations on them, and no expression-level surface exists for indexing, slicing, membership testing, or higher-order transformations. Modelers cannot write the conditions and transformations real ontologies require.
What surface — substrate primitives, expression operators, and standard-library operations — closes the collection story without violating foundation-neutrality (RFD-0002) and the no-parametric-concepts position (RFD-0009)?
Context
Three feedback items from the May 2026 Argon Open Questions exchange are addressed here:
- #26 — Collection operations (
priority:blocker). Modelers cannot writeis_member,append,remove,union,intersectat the expression level. - #20 — Ordered collections. Today’s
[T; bound]admits cardinality constraints but doesn’t distinguish ordered from unordered semantics. - #21 — Optional syntax.
spouse: [Person; <=1]reads as a singleton-bounded set instead of an optional value.
The current state is well-scoped:
- The CST parser produces
TypeExpr::Generic { head, args }forSet[T]/List[T]/Map[K, V],TypeExpr::OptionalforT?,TypeExpr::Collectionfor[T; bound], andExpr::Listfor list literals. None of these are blocked in the grammar. phase_elaborate.rsalready computesstable_id("std::collection::Set")/List/Mapand dispatchesTypeExpr::GenerictoDataValueType::Set / List / Map— but no symbols are registered against those stable ids, so the resolution silently degrades to concept refs.DataValueTypeinnousalready hasList(Box<Self>),Set(Box<Self>),Map(Box<Self>, Box<Self>).DataValuehasList(Vec<Self>)but lacksSet,Map,Optional,Range.- Aggregate forms (
sum(expr for v in path where pred)) work in query heads and compute bodies; they are explicitly disallowed in rule bodies viaOE0403. - No expression-level method-call syntax exists today.
x.foo(bar)fails to parse (is_call_aheadonly fires for bare identifiers, not post-dot).
The remaining work is connecting the parser surface to a populated std::collection module, adding the missing expression forms (method-call, indexing, slicing, membership, comprehension), and wiring the runtime to evaluate them.
RFD-0009 forbids user-declared parametric concepts. The implication: Set, List, Map, Optional, and Range cannot be user-declared and cannot be modeled as ordinary concepts with type parameters. They must live as a closed set of substrate-level type constructors. User code consumes them; user code does not extend them. Future user-declarable parametric types live behind a functor-module path that this RFD does not commit to.
Decision
1. Five built-in type constructors
std::collection exposes four type constructors; std::math exposes a fifth:
| Constructor | Arity | Module path | ID range |
|---|---|---|---|
Set[T] | 1 | std::collection::Set | TYPE_CTOR_SET_ID = 0x0003_0000_0000_0000 |
List[T] | 1 | std::collection::List | TYPE_CTOR_LIST_ID = 0x0003_0000_0000_0001 |
Map[K, V] | 2 | std::collection::Map | TYPE_CTOR_MAP_ID = 0x0003_0000_0000_0002 |
Optional[T] | 1 | std::collection::Optional | TYPE_CTOR_OPTIONAL_ID = 0x0003_0000_0000_0003 |
Range[T] | 1 | std::math::Range | TYPE_CTOR_RANGE_ID = 0x0003_0000_0000_0004 |
The id range 0x0003_0000_0000_0000..0x0003_0000_0000_00FF is reserved for built-in type constructors, parallel to the 0x0002_* primitive-sentinel range. Type constructors are parametric and therefore have no DataValueType round-trip through primitive_sentinel_for; they resolve through a separate type_ctor_from_id function.
A new SymbolKind::TypeConstructor variant labels these symbols in the symbol table. The substrate cannot ship a new type constructor without a code change; the closed set is the price of foundation-neutrality (no metatype machinery decides what List means).
2. Surface — generic types use brackets, T? is sugar
Argon’s existing convention reserves < and > for comparison operators. Generic types therefore use brackets:
items: List[Item]
tags: Set[Text]
owners: Map[Tenant, Lease]
discount: Optional[Money]
band: Range[Money]
T? is admitted as user-facing sugar for Optional[T] and lowers identically at the AST→CoreIR boundary. [T] and [T; bound] continue to serve relation-shaped multi-valued field declarations; both lower to Set[T] by default, and an explicit ordered flag ([T; bound, ordered]) selects List[T]. [T; <=1] triggers OW2402 suggesting rewrite to T?.
3. Expression-level surface
Six new expression forms, all desugaring to ordinary calls at elaboration time:
- Method call —
xs.m(a, b)desugars to<TypeOf(xs)>::m(xs, a, b). Dispatch is at elaboration time; no traits, no runtime dispatch, no method tables. - Indexing —
xs[i]desugars to<TypeOf(xs)>::at(xs, i)returningOptional[T]. Receivers typed asSet[T]reject withOE2406. - Slicing —
xs[i..j],xs[i..],xs[..j]desugar to<TypeOf(xs)>::slice(xs, range). ReturnsList[T]. - Range literal —
i..jconstructsRange[T]::new(i, j)(half-open);i..=jconstructsRange[T]::inclusive(i, j). - Membership operator —
x in xsdesugars to<TypeOf(xs)>::contains(xs, x);x not in xsnegates. - Comprehension —
[expr for x in xs where pred]desugars toxs.filter(<pred>).map(<expr>)inline. The binding subgrammar reuses the existingAGGREGATE_BINDING+AGGREGATE_WHEREproductions.
Higher-order arguments accept two forms:
- Compute references — a bare
pub computename in an argument position is a value of compute-reference type. Elaboration validates the reference’s signature against the higher-order parameter’s expected shape; mismatch firesOE2407.CoreValue::ComputeRef(compute_id)carries the value. - Comprehensions — inline transformations that don’t need a named function.
First-class lambdas (|x| x + 1) and explicit function-type syntax (T -> U) are deferred. They become necessary only when compositional combinators (compose, curry, partial) enter the language; for now, comprehensions plus compute references cover the common cases.
4. Operation catalog
Each type constructor’s submodule exposes a fixed set of built-in computes. Names follow PL-mainstream conventions (Rust/Scala/Kotlin reach for these names).
Set[T] — unordered, distinct, no indexing.
Construction: Set::of(args...), Set::empty(), Set::singleton(x), Set::from_list(xs).
Queries: size, is_empty, contains.
Set algebra: union, intersect, difference, symmetric_difference.
Subset relations: is_subset_of, is_superset_of, is_disjoint_from.
Higher-order: map, filter, any, all, find, count, fold, reduce.
List[T] — ordered, allows duplicates, indexable.
Construction: List::of, List::empty, List::singleton, List::from_set (deterministic ordering).
Queries: size, is_empty, contains, index_of.
Access: at, head, tail, first, last.
Manipulation (return new list): append, prepend, insert_at, remove_at, replace, reverse, sort, sort_by, zip, concat, flatten.
Slicing: slice, take, drop, take_while, drop_while.
Higher-order: same as Set plus flat_map, scan.
Map[K, V] — key-value, K must be orderable.
Construction: Map::of((k, v), ...), Map::empty.
Queries: size, is_empty, contains_key, contains_value.
Access: get, get_or, keys, values, entries.
Manipulation: put, remove, merge, map_values, filter_keys, filter_values.
Optional[T] — 0-or-1 value.
Construction: Some(x), None().
Queries: is_some, is_none.
Access: unwrap_or, unwrap_or_else.
Mapping: map, flat_map, filter, and, or.
Range[T] (in std::math) — over any orderable type.
Construction: Range::new (half-open), Range::inclusive, Range::from, Range::to.
Queries: contains, start, end, is_empty.
Iteration: usable in comprehensions over numeric/date ranges.
5. Tier classification per operation
Every operation carries an explicit tier per RFD-0004’s seven-tier ladder. Categories:
| Tier | Operations |
|---|---|
tier:structural | is_some, is_none, Range::contains / start / end / is_empty |
tier:closure | size, is_empty, contains, union, intersect, difference, subset checks, indexing, slicing, head/tail, append/prepend, list manipulation without HOFs, get / keys / values |
tier:expressive | find, any, all, count when predicate is non-trivial |
| propagates | map, filter, flat_map, fold, reduce, scan, sort_by, predicate-taking ops — tier(op) = max(tier:closure, tier(f)) |
The propagation rule is consistent with how recognized-shape tier reduction already works for rule bodies. A modeler writing xs.map(complex_compute) where complex_compute is tier:fol gets a tier:fol-classified result and must either accept the tier or refactor.
6. Graduated-context admission
Following RFD-0005, different evaluation contexts admit different operation sets:
| Context | Method calls | Comprehensions | Compute references |
|---|---|---|---|
pub compute body | all ops | yes | yes |
pub mutation do { } | all ops | yes | yes |
pub derive body | tier:closure only | yes | tier:closure only |
query body | tier:closure only | yes | tier:closure only |
Refinement where { } | rejected (OE2408) | rejected | rejected |
test block | all ops | yes | yes |
Rule-body tier violations fire OE2408 HOFInRuleBodyTierViolation. Refinement-body violations route through the existing refinement-restriction code (ch02-06).
7. Functional semantics; mutation is rebuild-and-assign
All collection operations are pure-functional: they return new collections and never mutate in place. xs.append(x) returns a new List[T] with x appended; xs is unchanged.
In pub mutation do { } bodies, the idiom is rebuild-and-assign:
do { l.parents = l.parents.append(parent) }
True in-place mutators (l.parents.insert!(x)) are deferred. Functional semantics keep the kernel’s bitemporal event-log model clean: every collection-valued property change is a new GroundAssertion, not an in-place edit.
8. Field cardinality vs collection types
Three field shapes carry related but distinct readings:
field: List[T]— a single collection-valued property whose value is a List object.field: [T; bound]— a binary relation pattern with cardinality bound (default Set semantics).field: [T; bound, ordered]— same relation pattern but ordered (List semantics).
The cardinality bounds >= N / <= N / == N / >= N, <= M apply only to relation-shaped declarations. Collection-typed properties carry their cardinality through the collection’s runtime size; explicit bounds use where-clause refinement predicates over .size().
[T; <=1] is admitted but triggers OW2402 suggesting T?. The semantics are equivalent at the data layer; the readings differ.
9. Diagnostics
Ten new codes register in oxc-protocol::core::codes:
| Code | Severity | Trigger |
|---|---|---|
OE2401 | Error | Type-constructor arity mismatch (Set[T, U], Map[T]). |
OE2402 | Error | Unknown collection method (xs.bogus()). Hint lists valid methods on the receiver’s type. |
OE2403 | Error | Collection element-type mismatch (List[Nat].append("foo")). |
OE2404 | Error | Index expression’s type isn’t Nat. |
OE2405 | Error | Slice bounds invalid (xs[5..2], compile-time when literal). |
OE2406 | Error | Unordered collection indexed (s[i] where s: Set[T]). |
OE2407 | Error | Compute reference signature mismatch. |
OE2408 | Error | HOF in rule body violates context tier ceiling. |
OW2401 | Warning | .unwrap() without fallback; suggests unwrap_or or match. |
OW2402 | Warning | [T; <=1] field; suggests T?. |
10. CLI surface
No new CLI commands. ox check, ox build, ox test, ox doc all interpret the new surface through existing dispatch. oxc emit core-ir shows the desugared form for any expression with collection operators.
Rationale
Closed type-constructor set preserves RFD-0009. Allowing user-declared parametric concepts would commit Argon to a position about how foundational ontologies handle parametric type families — exactly the foundational coupling RFD-0002 forbids. A closed set of substrate primitives keeps the door shut without sacrificing user productivity; the everyday collection-modeling cases fit inside Set / List / Map / Optional / Range.
UFCS dispatch keeps the substrate small. Method-call syntax is a sugar; the desugar produces an ordinary Expr::Call against a qualified path. No method tables, no trait-resolution machinery, no runtime dispatch overhead. Receiver type inference at elaboration time picks the right submodule. The book reads naturally (xs.filter(p).map(f).count()); the IR stays uniform.
Comprehensions + compute references defer first-class functions. First-class lambdas are a real feature with real costs — function-type syntax, closure capture, type inference over function values. The team’s pain is concrete: comprehensions cover the inline-transformation case; compute references cover the named-transformation case. Lambdas can land later if combinator-style composition becomes desirable; the surface for that addition stays open.
Functional semantics align with the bitemporal event log. Every collection-valued property mutation is a new GroundAssertion in the kernel’s append-only event log per RFD-0021. In-place mutation would require the kernel to materialize partial-update events, which contradicts the existing storage discipline. Rebuild-and-assign matches the storage model byte-for-byte.
Graduated-context tiering reuses existing machinery. Rule bodies are tier-classified per RFD-0004; refinement bodies admit a restricted sub-grammar per ch02-06; compute bodies admit the full surface. Collection operations slot into the existing classification: xs.size() is tier:closure; xs.find(complex_pred) propagates from the predicate. No new tier infrastructure needed.
Bracket syntax avoids the < / > ambiguity that bedevils C++. Angle-bracket generic syntax requires lookahead disambiguation (is a < b a comparison or the start of a<b>?). Argon’s bracket syntax for type parameters sidesteps the problem entirely.
Consequences
- New module:
argon/oxc/src/std_collection.rsmodeled onstd_math.rs. Registers the four collection type constructors plus their submodule operation surfaces. Range registers instd_math.rs(extends, doesn’t replace). core_ir.rsallocates the0x0003_*id range for type constructors plustype_ctor_from_idmirror function.SymbolKind::TypeConstructornew variant; every exhaustive match acrossoxcupdates.DataValueinnousgrowsSet(BTreeSet<Self>),Map(BTreeMap<Self, Self>),Optional(Option<Box<Self>>),Range { ... }variants. Additive serde change; no kernel-storage migration.DataValueTypeinnousgrowsOptional(Box<Self>)andRange(Box<Self>).- AST in
oxc/src/ast.rsgrowsExpr::MethodCall,Expr::Index,Expr::Slice,Expr::Range,Expr::Comprehension,Expr::Membershipvariants. - CST grammar in
oxc/src/cst/grammar/exprs.rsgrows a postfix-chain parser handling method-call, index, and slice forms after primary expressions.in/not injoin the binary-operator precedence table. Comprehensions admit inside list literals via theforkeyword. - Tree-sitter grammar mirrors the new productions; the codegen audit gate (
oxc-codegen audit) catches drift. - Elaborator grows
elaborate_method_calland a sibling family that desugar postfix forms into qualified calls.resolve_type_expr_to_idlearnsGeneric/Collection/OptionalCollectionarms. - Interpreter in
nousgrows ~80 evaluator functions covering every op in §4. Registered asComputeDef::Rustbodies in the existingComputeRegistry; no newKernelExprvariants needed. - Tier classifier in
meta/classify.rsreads per-op tier annotations from a static table built at registration time. - Diagnostics in
oxc/src/diagnostics/codes.rsaddOE2401–OE2408+OW2401–OW2402. - Book chapter
ch02-08-collections.mdships with the operator catalog, tier matrix, and worked example.ch02-04,ch02-05,ch02-06and the appendices extend. - Example
examples/collections-tour/ships as a minimal package exercising every operator. The lease-story example gainsBuildingwithunits: List[Unit],tenants: Set[Person],rent_band: Optional[Range[Money]]. - Open follow-ups (not in this RFD):
- Interaction with RFD-0036’s user-level generic bounds — specifically, whether
<T: Bound>admits type-constructor bounds (<T: List>). - User-declarable parametric type constructors via a functor-module path.
- First-class lambdas and explicit function-type syntax.
- In-place mutation operators (
!-suffixed methods) indo { }contexts.
- Interaction with RFD-0036’s user-level generic bounds — specifically, whether
Historical lineage
The surface borrows directly from PL mainstream:
- UFCS as desugar — Rust’s
xs.iter().map(f).collect()reads as method calls but compiles to free-function calls in trait-impl scope. Argon’s UFCS is the same shape without traits: type-directed dispatch at elaboration time. - Comprehensions — Python and Haskell list comprehensions; Argon’s
[e for x in xs where p]reuses the aggregate-binding grammar that already exists in Argon forsum(e for v in p where q). - Optional/Maybe — Haskell
Maybe a, RustOption<T>, ScalaOption[T]. Argon picks Rust’s vocabulary (Some/None/unwrap_or) for PL familiarity. - Closed type-constructor set — Erlang’s built-in
list,tuple,maptypes are similarly closed at the language level; user code consumes them but doesn’t declare new ones. Argon’s discipline is the same.
The functional-only semantics with rebuild-and-assign for mutation mirrors Clojure’s persistent data structures (returning new collections from “mutation” operations); the bitemporal-event-log alignment (every assignment is a new GroundAssertion) is Datomic’s transactor pattern adapted to Argon’s per-tenant overlay model per RFD-0020 + RFD-0021.
11. Addenda — decisions locked during Phase 2 implementation
Four design questions were resolved during Phase 2 build-out. Each is committed here so the RFD reflects the shipped surface.
11.1 Optional under the open-world assumption — K3-faithful
Optional[T] semantics under OWA split presence from inner-value truth state:
is_some(o)is classical on presence.Some(_)yieldstrue;None()yieldsfalse. The inner value’s truth status does not enter the determination —is_some(Some(Undefined))istrue.unwrap_or(o, d)is classical on absence.Some(v)yieldsv;None()yieldsd.unwrap_or(Some(Undefined), d)yields the innerUndefined, notd. The fallback applies to absence, not to the inner element’s truth status.
The reason: K3 (the three-valued logic the language uses for refinement under OWA) distinguishes “no value here” (a structural absence) from “value present but its truth is unknown” (an epistemic gap on the value itself). An unwrap_or that collapsed both cases into the fallback would erase the distinction, and downstream truth-value propagation relies on the distinction surviving.
When a modeler wants “either way, substitute a default,” the idiom is to chain a separate operation that operates on the inner value’s truth state — .or(d) (deferred) or a match over the value’s truth status. This RFD does not commit to the chained-operation surface; it commits to the rule that unwrap_or does not perform that combined collapse.
Cross-link: RFD-0037 §6 — bilattice substrate and K3/FDE projection.
11.2 Set and Map equality + normalization invariant
Set[T] and Map[K, V] evaluators normalize their backing storage so equality is Vec equality:
Set[T]is stored sort-deduplicated byT’s canonical ordering.Map[K, V]is stored sort-by-key then dedup-keys (last write wins within a single construction call).
Every evaluator that produces a Set or Map runs the normalization step before returning. == on two Set / Map values is straight Vec equality after normalization. The invariant is asserted via debug_assert! round-trips in each evaluator — produce, normalize, re-normalize, compare; the second normalization must be a no-op.
The shipped surface does not expose the underlying Vec representation, so the discipline is internal to the evaluator implementation. The user-facing reading is “two sets are equal iff they contain the same elements” and “two maps are equal iff they contain the same key-value pairs,” which is what the normalization preserves.
11.3 Comprehension binder scoping — shadow with warning
A comprehension’s binder name shadows any in-scope let binding or parameter within the comprehension body and fires OW2403 ComprehensionBinderShadowsOuter. The warning is informational; the shadow takes effect.
pub compute weird(units: List[Unit], unit: Unit) -> List[Text] =
[unit.number for unit in units] // OW2403 — outer `unit` shadowed
The convention matches what Datalog rule bodies already do — every body introduces fresh binders, and modeler intent typically reads as “iterate the source,” not “shadow the outer name.” Erroring on collision would be too strict (renaming a parameter to fix a comprehension is annoying); silently shadowing would be too permissive. The warning surfaces the collision so the modeler chooses.
Rename the binder to silence:
pub compute fine(units: List[Unit], unit: Unit) -> List[Text] =
[u.number for u in units]
11.4 Range materialization — predicate-only in v1
Range[T] does not materialize an element sequence in the v1 surface. Range::contains(r, x) is predicate-shaped: it answers “is x in the interval r?” without iterating. Range values participate in slicing (xs[i..j] constructs a Range[Nat] and passes it to List::slice) and in membership tests at expression position and rule-atom position.
Range::collect(r) -> List[T] (which would materialize the element sequence for numeric / date / money ranges) is tracked as follow-up work. The v1 surface does not iterate over ranges anywhere; the future collect addition will be a pure extension because no v1 caller exists to break.
The reading commits us to: ranges are intervals first and iterators second. The interval reading covers the common rent-band, age-band, date-band cases; the iterator reading is a convenience the language can add without disturbing the interval semantics.