core.collections — Lists, Maps, Sets, Deques
Semantic-honest data structures. You talk to the protocol; the compiler chooses the implementation.
Module status
Each core.collections.* module carries an explicit conformance status
so you know what you can rely on today versus what is still in flight.
The status is the truth-table over the module's API surface as exercised
by core-tests/collections/<module>/ under both verum test --interp
(Tier 0 VBC interpreter) and verum test --aot (Tier 2 LLVM AOT).
| Status | Meaning |
|---|---|
| stable | Every public method is conformance-tested. Algebraic laws are pinned by exhaustive or large-domain property tests. Cross-stdlib integration is verified. Interpreter and AOT agree on every test. Safe to depend on in production. |
| partial | Subset of the public API is conformance-tested and stable. The rest is exercised in regression_test.vr via @ignored tests pinning the specific defects that block coverage. The non-ignored API surface is safe; everything else is documented per-module under "Open defects". |
| regression-only | Module is gated by upstream stdlib / language-level defects. Public-API tests do not pass yet — only @ignored regressions exist to lock the bug shapes. Avoid in production until promoted. |
| undocumented | Documentation in this reference is authoritative, but the module has not yet been routed through the core-tests/ conformance suite. The current page is a best-effort snapshot of the source; it may drift from runtime behaviour. |
| Module | Status | Conformance suite |
|---|---|---|
list.vr | undocumented | — |
map.vr | undocumented | — |
set.vr | undocumented | — |
multiset.vr | undocumented | — |
deque.vr | undocumented | — |
heap.vr | undocumented | — |
btree.vr | undocumented | — |
slice.vr | undocumented | — |
lru.vr | undocumented | — |
ttl_cache.vr | undocumented | — |
bloom.vr | undocumented | — |
hyperloglog.vr | undocumented | — |
count_min.vr | undocumented | — |
reservoir.vr | partial | core-tests/collections/reservoir — 14 unit + 7 property + 4 integration + 4 pinned regressions (CSPRNG-gated replacement phase) |
consistent_hash.vr | undocumented | — |
adjacency_list.vr | undocumented | — |
alias_sampler.vr | undocumented | — |
toposort.vr | regression-only | core-tests/collections/toposort — 4 narrow unit tests + 6 pinned regressions (Map.contains_key/get + Text.from defects gate the algorithm itself) |
trie.vr | undocumented | — |
union_find.vr | partial | core-tests/collections/union_find — 31 unit + 14 property + 7 integration + 11 pinned regressions |
Core data structures
| File | What's in it |
|---|---|
list.vr | List<T> + adapters (ListIter, Drain, Chunks, Windows, …) |
map.vr | Map<K,V> + MapEntry<K,V>, OccupiedEntry, VacantEntry, MapIter, Keys, Values, Drain |
set.vr | Set<T> + SetIter, SetDrain |
multiset.vr | Multiset<T> (hash bag with strictly-positive multiplicities) + MultisetIter, MultisetDistinctIter |
deque.vr | Deque<T> + DequeIter, DequeDrain |
heap.vr | BinaryHeap<T>, MinHeap<T>, HeapDrainSorted, Reverse<T> |
btree.vr | BTreeMap<K,V>, BTreeSet<T>, BTreeEntry, range iterators |
slice.vr | slice utilities — slice_iter, chunks, windows, split_at |
Caches + probabilistic sketches
| File | What's in it |
|---|---|
lru.vr | LruCache<K,V> — pure capacity-bounded LRU |
ttl_cache.vr | TtlCache<K,V> — LRU + per-entry TTL |
bloom.vr | BloomFilter — "is X present?" probabilistic set |
hyperloglog.vr | HyperLogLog — "how many distinct?" cardinality |
count_min.vr | CountMinSketch — "how often was X seen?" frequency |
reservoir.vr | Reservoir<T> — Algorithm R uniform streaming sample |
consistent_hash.vr | ConsistentHashRing — Ketama-compatible distribution |
Every core data structure implements Iterator, IntoIterator,
Clone, Debug, Display, Eq, Hash, Default. Hash is always
consistent with Eq under the structure's equality semantics — for
order-independent collections (Set, Map, BinaryHeap) hashing
is deliberately order-independent (XOR-fold over element hashes for
Set/Map, sort-then-fold for BinaryHeap) so the
a == b → hash(a) == hash(b) contract holds across distinct internal
arrangements of the same multiset.
Caches and sketches carry stats counters (hits/misses/evicted)
instead of the Iterator shape since they're lossy or ordered by
workload rather than enumeration.
List<T> — dynamic array
O(1) amortised push/pop at the end; O(1) random access; O(n) insert- in-middle.
public type List<T> is { ptr: &unsafe T, len: Int, cap: Int };
Construction
List.new() -> List<T>
List.with_capacity(capacity: Int) -> List<T>
List.from_slice(slice: &[T]) -> List<T> // T: Clone
List.from(iter) // from any Iterator<Item=T>
let xs = list![1, 2, 3]; // macro — array of items
let ys = list![0; 10]; // macro — 10 copies of 0
Capacity & size
xs.len() -> Int
xs.is_empty() -> Bool
xs.capacity() -> Int
xs.reserve(additional) xs.reserve_exact(additional)
xs.shrink_to_fit() xs.shrink_to(min_capacity)
Access
xs[i] // panics on OOB
xs.get(i) -> Maybe<&T>
xs.get_mut(i) -> Maybe<&mut T>
xs.first() / xs.last() -> Maybe<&T>
xs.first_mut() / xs.last_mut() -> Maybe<&mut T>
Mutation
xs.push(value)
xs.pop() -> Maybe<T>
xs.insert(index, value)
xs.remove(index) -> T
xs.swap_remove(index) -> T // O(1); destroys order
xs.swap(i, j)
xs.clear()
xs.truncate(len)
xs.extend(other) // extend from an iterator
xs.retain(|x| pred(x)) // keep only matching elements
xs.dedup() // remove consecutive duplicates (T: PartialEq)
xs.dedup_by(|a, b| same(a, b))
xs.dedup_by_key(|x| key(x))
xs.sort() // T: Ord
xs.sort_by(|a, b| a.cmp(b))
xs.sort_by_key(|x| key(x))
xs.sort_unstable() // T: Ord, faster, not stable
xs.reverse()
xs.fill(value) // T: Clone
xs.fill_with(|| make())
xs.resize(new_len, value) // T: Clone
Slicing
xs[a..b] // panics on OOB range
xs.slice(a, b) xs.slice_mut(a, b)
xs.split_at(i) -> (&[T], &[T])
xs.split_first() -> Maybe<(&T, &[T])>
xs.split_last() -> Maybe<(&T, &[T])>
Iteration
xs.iter() // Iterator<&T>
xs.iter_mut() // Iterator<&mut T>
xs.into_iter() // consumes xs
xs.drain(a..b) // removes and yields a range
xs.chunks(n) // non-overlapping windows; ChunksIter
xs.chunks_exact(n) // exact-sized chunks + remainder
xs.windows(n) // sliding window of n; WindowsIter
xs.enumerate_iter() // convenience (.iter().enumerate() also works)
Searching
xs.contains(&value) -> Bool // T: PartialEq
xs.position(|x| pred(x)) -> Maybe<Int>
xs.rposition(|x| pred(x)) -> Maybe<Int>
xs.binary_search(&value) -> Result<Int, Int> // T: Ord
xs.binary_search_by(|probe| cmp)
xs.binary_search_by_key(&target, |x| key(x))
xs.partition_point(|x| pred(x)) -> Int
Conversion
xs.as_slice() -> &[T]
xs.as_mut_slice() -> &mut [T]
xs.to_vec() // alias for clone-into-new-list
Example
let mut words = list!["hello", "world", "verum"];
words.sort();
words.dedup();
for (i, w) in words.iter().enumerate() {
print(f"{i}: {w}");
}
let counts: Map<Int, Int> = words.iter()
.map(|w| w.len())
.fold(Map.new(), |mut m, len| {
*m.entry(len).or_insert(0) += 1;
m
});
Pitfall — swap_remove destroys order
swap_remove(i) is O(1) because it swaps with the last element. If
order matters, use remove(i) (O(n)).
Map<K, V> — hash map
Swiss-table-style flat hash map. K: Hash + Eq.
Construction
Map.new() -> Map<K, V>
Map.with_capacity(capacity) -> Map<K, V>
Map.from_iter(iter: Iter<(K,V)>)
let m = map!["a" => 1, "b" => 2];
Size
m.len() m.is_empty() m.capacity()
m.reserve(additional) m.shrink_to_fit()
Access
m.get(&key) -> Maybe<&V>
m.get_mut(&key) -> Maybe<&mut V>
m.get_key_value(&key) -> Maybe<(&K, &V)>
m.contains_key(&key) -> Bool
m[&key] // panics on missing
Mutation
m.insert(key, value) -> Maybe<V> // returns old value if any
m.remove(&key) -> Maybe<V>
m.remove_entry(&key) -> Maybe<(K, V)>
m.clear()
m.retain(|k, v| pred(k, v))
m.extend(iter) // iter yields (K, V)
Entry API — insert-or-update without double lookup
type MapEntry<K: Hash + Eq, V> is
| Occupied(OccupiedEntry<K, V>)
| Vacant(VacantEntry<K, V>);
m.entry(key) -> MapEntry<K, V>
entry.or_insert(default) -> &mut V
entry.or_insert_with(|| compute()) -> &mut V
entry.or_insert_with_key(|k| compute(k)) -> &mut V
entry.or_default() -> &mut V // V: Default
entry.and_modify(|v| mutate(v)) -> MapEntry<K, V>
entry.key() -> &K
// On Occupied only:
occ.get() / occ.get_mut() / occ.into_mut()
occ.insert(value) -> V // returns old
occ.remove() -> V
occ.remove_entry() -> (K, V)
Iteration
m.iter() // Iterator<(&K, &V)>
m.iter_mut() // Iterator<(&K, &mut V)>
m.into_iter() // consumes m
m.keys() // Iterator<&K>
m.values() // Iterator<&V>
m.values_mut() // Iterator<&mut V>
m.drain() // consuming drain — Iterator<(K, V)>
Examples
// count words
let mut freq: Map<Text, Int> = Map.new();
for w in text.split_whitespace() {
*freq.entry(w.to_string()).or_insert(0) += 1;
}
// group by key
let mut groups: Map<Int, List<User>> = Map.new();
for u in users {
groups.entry(u.dept).or_insert_with(List.new).push(u);
}
// atomic upsert
match map.entry(key) {
MapEntry.Occupied(mut e) => *e.get_mut() += 1,
MapEntry.Vacant(e) => { e.insert(1); }
}
Pitfall — mutating while iterating
Inserting or removing entries while iterating with iter/iter_mut
is undefined. Collect the changes and apply after, or use retain.
Set<T> — hash set
T: Hash + Eq.
Construction
Set.new() Set.with_capacity(cap) Set.from_iter(iter)
let s = set![1, 2, 3];
Size & access
s.len() s.is_empty() s.capacity()
s.contains(&v) -> Bool
s.get(&v) -> Maybe<&T> // returns the stored value
Mutation
s.insert(value) -> Bool // true if it was new
s.remove(&value) -> Bool
s.take(&value) -> Maybe<T> // remove + return the stored value
s.replace(value) -> Maybe<T> // replace if present
s.clear()
s.retain(|v| pred(v))
Set algebra
a.union(&b) // Iterator<&T>
a.intersection(&b) // Iterator<&T>
a.difference(&b) // Iterator<&T>
a.symmetric_difference(&b) // Iterator<&T>
a.is_disjoint(&b) a.is_subset(&b) a.is_superset(&b)
Example
let a: Set<Int> = set![1, 2, 3, 4];
let b: Set<Int> = set![3, 4, 5, 6];
let union: Set<Int> = a.union(&b).copied().collect(); // {1..6}
let intersect: Set<Int> = a.intersection(&b).copied().collect(); // {3,4}
let diff: Set<Int> = a.difference(&b).copied().collect(); // {1,2}
Protocol implementations
implement<T: Hash + Eq + Clone> Clone for Set<T>;
implement<T: Hash + Eq> Default for Set<T>;
implement<T: Hash + Eq + Debug> Eq for Set<T>; // delegates to inner Map
implement<T: Hash + Eq> Hash for Set<T>; // order-independent
implement<T: Hash + Eq + Debug> Debug for Set<T>; // {a, b, c}
implement<T: Hash + Eq + Display> Display for Set<T>; // {a, b, c}
The Eq and Hash impls are order-independent — two sets with the
same elements but different insertion order compare equal and hash
identically. Implementation delegates to the underlying
Map<T, ()> so set semantics fall out for free from map's
order-independent invariants.
Multiset<T> — hash bag with multiplicities
T: Hash + Eq. Generalisation of Set<T>: every element carries
an integer multiplicity ≥ 1. Insert a value already present →
multiplicity increments; remove → decrement; reaching zero evicts
the entry.
Backed by Map<T, Int> exactly the way Set<T> is backed by
Map<T, ()> — same Robin-Hood hashing, same amortised O(1) per-
element ops.
Two distinct sizes
Multisets have two natural "size" notions, both exposed:
| Method | Returns | Cost |
|---|---|---|
distinct_len() | number of unique elements (= |support|) | O(1) |
cardinality() | sum of multiplicities (Σ counts) | O(1) — cached |
The deprecated len() is not provided to force the choice —
Set.len() has one answer, Multiset.len() would have two; making
the caller pick prevents silent wrong-size bugs.
Construction
Multiset.new() Multiset.with_capacity(cap)
let m = Multiset.from([1, 2, 2, 3, 3, 3]);
let m = Multiset.from_counts([(1, 1), (2, 2), (3, 3)]); // (element, multiplicity)
Access
m.distinct_len() -> Int // |support|
m.cardinality() -> Int // Σ multiplicities
m.is_empty() // cardinality == 0
m.count(&v) -> Int // multiplicity of v (0 if absent)
m.contains(&v) -> Bool // count(v) > 0
Mutation
m.insert(value) -> Int // returns new multiplicity
m.insert_n(value, n) -> Int // increment by n; n ≤ 0 is a no-op
m.remove(&value) -> Int // decrement by 1; saturates at 0
m.remove_n(&value, n) -> Int // decrement by n; saturates
m.remove_all(&value) -> Int // evict entirely; returns evicted count
m.clear()
m.retain(|v, count| pred(v, count)) // keeps `cardinality` consistent
Fallible counterparts (try_insert, try_insert_n, try_reserve)
return Result<_, AllocError> instead of panicking on growth.
Algebraic operations
Standard multiset algebra (Knuth TAOCP §4.6.3) over (ℕ, +, max, min):
| Op | Per-element semantics | Notation |
|---|---|---|
union(&other) | max(count_a, count_b) | A ∪ B |
intersection(&other) | min(count_a, count_b) | A ∩ B |
sum(&other) | count_a + count_b | A ⊎ B (multiset disjoint sum) |
difference(&other) | max(0, count_a − count_b) | A − B |
is_subset / is_superset / is_disjoint follow the multiset definitions
(every-element multiplicity ≤, ≥, share-no-support respectively).
Iteration
m.iter() // Iterator<(&T, Int)> — distinct + multiplicity
m.distinct_iter() // Iterator<&T> — distinct only
m.to_list() -> List<T> // expand by multiplicity (cardinality entries)
m.to_count_list() -> List<(T, Int)> // (element, count) without expansion
Statistics
m.mode() -> Maybe<&T> // element with largest multiplicity (ties broken arbitrarily)
When to choose Multiset vs Set
- Set — identity / membership only; "is x present?".
- Multiset — frequency counting; "how many times has x occurred?". Statistical aggregations, deduplication-with-counts, MVCC bag semantics, simplicial-multiset structures, fractal-holon sub-holon multisets.
Protocol implementations
implement<T: Hash + Eq + Clone> Clone for Multiset<T>;
implement<T: Hash + Eq> Eq for Multiset<T>; // every element same multiplicity
implement<T: Hash + Eq + Debug> Debug for Multiset<T>; // Multiset {a×2, b×3}
Multiset equality is structural — same support, same per-element multiplicity. Two multisets with the same elements but different insertion order compare equal.
Deque<T> — double-ended queue
Ring buffer. O(1) push/pop at both ends.
Deque.new() Deque.with_capacity(cap)
let q = deque![1, 2, 3];
q.len() q.is_empty() q.capacity()
q.push_front(v) q.push_back(v)
q.pop_front() -> Maybe<T>
q.pop_back() -> Maybe<T>
q.front() / q.back() -> Maybe<&T>
q.front_mut() / q.back_mut()
q.get(i) / q.get_mut(i) // 0 = front
q[i] // panics on OOB
q.iter() / q.iter_mut() / q.into_iter() / q.drain(range)
q.rotate_left(n) q.rotate_right(n)
q.clear() q.retain(|x| pred(x)) q.extend(iter)
Use as FIFO (push_back + pop_front) or LIFO (push_back + pop_back).
BinaryHeap<T> / MinHeap<T> — priority queue
Max-heap: BinaryHeap<T>. Min-heap: MinHeap<T>. T: Ord.
let mut pq: BinaryHeap<Int> = heap![3, 1, 4, 1, 5, 9, 2, 6];
pq.push(7);
pq.peek(); // Maybe.Some(&9)
let top = pq.pop(); // Maybe.Some(9)
let all_sorted: List<Int> = pq.into_sorted_vec(); // max-heap -> descending
BinaryHeap.new() BinaryHeap.with_capacity(cap)
BinaryHeap.from(list) // heapify in O(n)
h.len() h.is_empty() h.capacity()
h.push(v) h.pop() -> Maybe<T>
h.peek() -> Maybe<&T>
h.peek_mut() -> Maybe<PeekMut<T>>
h.clear()
h.drain() // unordered drain; HeapDrain
h.drain_sorted() // yields in heap order; HeapDrainSorted
h.into_sorted_vec() -> List<T>
h.iter() // unordered
Min-heap via wrapping
For "min by custom key", wrap values in Reverse<T>:
let mut h: BinaryHeap<Reverse<(Int, Text)>> = BinaryHeap.new();
h.push(Reverse((3, "three".to_string())));
h.push(Reverse((1, "one".to_string())));
let Reverse((k, v)) = h.pop().unwrap(); // (1, "one")
Protocol implementations
implement<T: Ord> IntoIterator for BinaryHeap<T>;
implement<T: Ord + Eq> Eq for BinaryHeap<T>; // sorts then compares
implement<T: Ord + Clone> Clone for BinaryHeap<T>;
implement<T: Ord> Default for BinaryHeap<T>;
implement<T: Ord + Debug> Debug for BinaryHeap<T>;
implement<T: Ord + Display> Display for BinaryHeap<T>;
implement<T: Ord + Hash + Clone> Hash for BinaryHeap<T>; // sorts then hashes
implement<T: Ord> FromIterator for BinaryHeap<T>;
implement<T: Ord> Extend for BinaryHeap<T>;
Eq and Hash both sort the contents before comparing/hashing
so the a == b → hash(a) == hash(b) invariant holds across distinct
internal heap arrangements of the same multiset of elements. Two
heaps built from the same input list, even via different push orders,
are interchangeable as keys in a Map<BinaryHeap<T>, V> or as
elements of a Set<BinaryHeap<T>>.
BTreeMap<K, V> / BTreeSet<T> — ordered
Red-black tree (B-factor 12 internally, cache-friendly). K: Ord.
Common operations (BTreeMap/BTreeSet both)
.new() / .from_iter(iter)
.len() .is_empty()
.insert(k, v) / .insert(v)
.remove(&k) -> Maybe<V>
.get(&k) / .get_mut(&k)
.contains_key(&k) / .contains(&v)
.iter() // sorted ascending
.keys() / .values() / .values_mut() // map only
.into_iter() .drain(range)
.retain(|k, v| pred) / |v| pred
.clear()
Ordered operations
m.first_key_value() -> Maybe<(&K, &V)> // smallest
m.last_key_value() -> Maybe<(&K, &V)> // largest
m.pop_first() -> Maybe<(K, V)>
m.pop_last() -> Maybe<(K, V)>
m.range(lo..hi) // iterator in range
m.range_mut(lo..hi)
m.split_off(&key) -> BTreeMap<K,V> // split into two halves
Entry API
Same shape as Map's entry API (or_insert, or_insert_with,
and_modify, or_default).
Example — windowed metrics
let mut by_ts: BTreeMap<Instant, Metric> = BTreeMap.new();
// … populate …
let now = Instant.now();
let last_minute: List<&Metric> = by_ts
.range((now - 1.minutes())..=now)
.map(|(_, m)| m)
.collect();
let recent_total: Float = last_minute.iter().map(|m| m.value).sum();
Choice guide
| Need | Use |
|---|---|
| key lookup, any order | Map<K,V> (hash) |
| key lookup + ordered iteration | BTreeMap<K,V> |
| set membership, any order | Set<T> |
| set membership + ordered iteration / range queries | BTreeSet<T> |
| FIFO/LIFO queue | Deque<T> |
| priority queue | BinaryHeap<T> (or wrap in Reverse for min-heap) |
| "just a list" | List<T> |
Slice utilities (slice.vr)
&[T] is the shared type for "borrowed view of a contiguous run of
T". All collections' .iter() and indexing yield slices where it
makes sense.
s.len() s.is_empty()
s.first() / s.last() / s.split_first() / s.split_last()
s.iter() / s.iter_mut()
s.chunks(n) / s.chunks_exact(n)
s.windows(n) // sliding, size-n, WindowsIter
s.split_at(i) -> (&[T], &[T])
s.split(|x| pred(x)) // iterator of sub-slices
s.partition_point(|x| pred(x))
s.binary_search(&v) // T: Ord
s.contains(&v) -> Bool // T: PartialEq
s.starts_with(&prefix) / s.ends_with(&suffix)
s.as_ptr() / s.as_mut_ptr() -> *const/mut T // unsafe bridge
List<T>.as_slice() and List<T>.as_mut_slice() give you &[T] /
&mut [T] for interop with slice-accepting functions.
Implementation summary
| Collection | Default implementation | Notes |
|---|---|---|
List<T> | growable contiguous buffer, factor = 2 | @repr(contiguous, growth = N) overrideable |
Map<K,V> | Swiss-table flat hash map | linear probing, tombstones on delete |
Set<T> | Swiss-set | same as Map |
BTreeMap<K,V> | red-black tree, B = 12 | cache-friendly node layout |
BTreeSet<T> | red-black tree | same engine |
BinaryHeap<T> | array-backed binary heap | 4-ary pending evaluation |
Deque<T> | ring buffer | rotation is pointer-update only |
See also
- base —
Iteratorprotocol and adapters. - text —
Text(semantic "collection of Unicode scalars"). - sync — locking wrappers (
Mutex<List<T>>, etc.). - Language → patterns — slice, record, rest patterns over these types.
Caches
LruCache<K, V>
let mut cache: LruCache<Text, User> = LruCache.new(1024);
let prev = cache.insert(key.clone(), user);
match cache.get(&key) { Some(u) => ..., None => ... }
Capacity-bounded hash map with LRU eviction on full. insert
returns the prior value when the key was already present
(useful for refcount bookkeeping). peek inspects without
touching LRU order; remove / clear / contains cover the
usual surface. stats() returns { size, hits, misses, evicted }.
TtlCache<K, V>
let cfg = TtlCacheConfig { capacity: 1024, default_ttl: Duration.from_secs(300) };
let mut cache: TtlCache<Text, Session> = TtlCache.new(cfg);
cache.insert(key.clone(), session); // default TTL
cache.insert_with_ttl(key2, session2, Duration.from_secs(60)); // override
cache.purge_expired(); // periodic sweep
Combines capacity-based LRU eviction with per-entry time-based
expiration. Expiry is lazy on read (hot-path cost = one
compare) — callers schedule purge_expired for idle reclamation.
TtlCacheStats surfaces hit/miss/expired/evicted counts for
scraping.
Use LruCache when only capacity matters; use TtlCache when
freshness windows also apply (session cache, JWT replay cache,
DNS cache).
Probabilistic sketches
The Bloom / HLL / Count-Min trio — bounded-memory answers to "is X there?", "how many distinct?", "how often?". HMAC-SHA256 keyed hashing with per-filter CSPRNG-sourced keys; adversarial inputs cannot skew past the theoretical error bound.
BloomFilter
let mut bf = BloomFilter.with_target(
100_000, // expected items
1000, // target FP rate × 1,000,000 (0.1%)
);
bf.insert(key);
if bf.contains(key) { ... }
let was_present = bf.check_and_set(key); // atomic check-and-set
bf.clear(); // new HMAC key generated
Kirsch-Mitzenmacher double hashing: one HMAC call → two 64-bit
halves → k probe offsets via h1 + i × h2 mod m. Typical
operation: ~150 ns end-to-end.
HyperLogLog
let mut hll = HyperLogLog.new(DEFAULT_PRECISION); // p=14 ≈ 12 KiB, 0.81% error
hll.add(b"user:alice");
hll.add(b"user:bob");
let approx_distinct: UInt64 = hll.estimate();
// Mergeable sketches across processes share a 32-byte key.
let merged = HyperLogLog.new_with_key(p, shared_key);
merged.merge(&other)?;
Flajolet et al. 2007 with small-range linear-counting correction.
Precision p ∈ [4, 16] controls memory/accuracy: p=14 is the
Redis PFCOUNT default.
CountMinSketch
let mut cms = CountMinSketch.with_target(0.001, 0.001);
// ≤ 0.1% error (ε) at 99.9% confidence (1 - δ) — 76 KiB.
cms.add(b"user:alice");
cms.add_n(b"bulk-import", 500);
let upper_bound = cms.estimate(b"user:alice"); // may over- but never under-report
Cormode & Muthukrishnan 2005. Width w = ⌈e / ε⌉, depth d = ⌈ln(1/δ)⌉. Saturating u32 cells — no wraparound even at hot items past 4 billion observations.
Reservoir<T>
Status: partial. Fill-phase API (
new,offerwhilelen < capacity,len,capacity,seen,take,reset) is conformance-tested on--interpand--aot. The replacement-phase path (offerafterlen == capacity) is currently gated on the missingcore.sys.common.random_bytesintrinsic in the VBC dispatch table — tests pinned incore-tests/collections/reservoir/regression_test.vr.
public type Reservoir<T> is {
samples: List<T>,
capacity: Int,
seen: UInt64,
};
let mut res: Reservoir<TraceId> = Reservoir.new(1000);
while let Some(trace) = stream.next() {
let _ = res.offer(trace); // Bool: true iff item retained.
}
let sample: List<TraceId> = res.take();
API:
Reservoir.new(capacity: Int) -> Reservoir<T> // capacity < 1 clamped to 1
res.offer(item: T) -> Bool // true iff item retained
res.take(self) -> List<T> // consumes
res.len() -> Int // current sample count
res.capacity() -> Int // configured cap
res.seen() -> UInt64 // total items observed
res.reset() // drop samples + zero seen
Behavioural laws (pinned by property_test.vr):
- Capacity clamping:
new(n).capacity() = max(n, 1). - Seen counter:
seen()advances exactly once peroffer. - Fill-phase retention: every
offerreturns true whileseen ≤ capacity. - Length ceiling:
len() ≤ capacity()at all times. - Reset round-trip:
reset() ⇒ len = 0 ∧ seen = 0;capacityis preserved. - Reset idempotency: repeated
reset()calls leave the same observable state. - Capacity invariance: no public API call besides construction
changes
capacity().
Vitter 1985 Algorithm R — uniform sampling from a stream of
unknown length. Every item has marginal probability
capacity / stream_length of surviving. Used by tail-sampling
tracers, streaming analytics, ML out-of-core shuffling.
Conformance suite: core-tests/collections/reservoir/ —
14 unit + 7 property + 4 integration + 4 pinned regressions.
ConsistentHashRing
let mut ring = ConsistentHashRing.new(); // 160 vnodes default (Ketama)
ring.add_node(Text.from("cache-a"));
ring.add_node(Text.from("cache-b"));
ring.add_node(Text.from("cache-c"));
let primary: Maybe<Text> = ring.node_for_key(b"session:42");
let replicas: List<Text> = ring.nodes_for_key(b"key", 3);
Ketama-compatible position derivation — virtual-node positions
are sha256(node_name + "-" + decimal(vnode_idx))[..8] as LE
u64. Wire-compatible with memcached Ketama, Redis ring_hash,
Envoy ring_hash load balancer.
Adding/removing one node moves only key_count / node_count
entries (the consistent-hashing property). nodes_for_key(key, N)
returns N distinct nodes in preference order — primary first,
then replicas.
UnionFindInt and UnionFind<T> — Disjoint-Set Union
Status: partial.
UnionFindInt(Int-keyed dense path) is conformance-tested end-to-end on both--interpand--aot. GenericUnionFind<T: Hash + Eq>is tested along the working axis (distinct-key make, find on unregistered keys, union creates keys, len tracking) and@ignored inregression_test.vrfor the broken axis (idempotent make on duplicate keys, component_size after union). Every defect that gates the remaining coverage is reproduced in the regression suite and tracked.
DSU — Tarjan 1975, with both path compression and union-by-rank, giving
amortised inverse-Ackermann (O(α(n)) ≈ O(1)) cost per operation.
Two APIs. Pick the dense one when keys are densely-indexed integers
(graph nodes 0..n-1, slot indices, etc.); pick the generic one for
strings, UUIDs, or any Hash + Eq carrier.
UnionFindInt — dense Int-keyed (array-backed)
public type UnionFindInt is {
parent: List<Int>,
rank: List<Int>,
components: Int,
size: List<Int>,
};
let mut uf = UnionFindInt.new(n);
uf.find(x: Int) -> Int // canonical root, path-compressed
uf.union(a: Int, b: Int) -> Bool // true iff this call merged distinct sets
uf.same_set(a: Int, b: Int) -> Bool
uf.component_size(x: Int) -> Int
uf.component_count() -> Int // O(1) — cached
uf.len() -> Int // total elements (== n)
uf.clear() // restore initial singleton state
Behavioural contract (laws pinned by property_test.vr):
- Reflexivity:
same_set(x, x) - Symmetry:
same_set(a, b) = same_set(b, a) - Transitivity:
same_set(a, b) ∧ same_set(b, c) ⇒ same_set(a, c) - Union introduces equivalence:
union(a, b) ⇒ same_set(a, b) - Union return-value partition:
union(a, b) = trueiffaandbwere in distinct sets at call entry. - Idempotent union: a second
union(a, b)is a no-op; state andcomponent_count()are unchanged. - Component-count invariant: each successful union decrements the count by exactly 1; failed unions leave it unchanged.
- Component-size sum:
Σ component_size(root_i) = nover distinct roots. - Find idempotency:
find(find(x)) = find(x). same_setdefinition:same_set(a, b) = (find(a) == find(b)).- Clear restores singletons: post-
clear(), every element is its own root andcomponent_count() = n.
Use cases (industry-standard DSU territory):
- Kruskal MST — sort edges by weight, union endpoints, accept the
edge iff the union returned
true. - CRDT add-wins set reconciliation — per-key DSU over causal-set representatives.
- Static-analyser equivalence classes — variable aliasing, escape analysis, type-variable bands.
- Image-segmentation connected-component labelling — 4/8-connected pixels via two raster passes.
- Cycle detection in incremental graph construction —
same_set(u, v)before adding edge(u, v).
UnionFind<T: Hash + Eq> — generic, map-backed
public type UnionFind<T: Hash + Eq> is {
parent: Map<T, T>,
rank: Map<T, Int>,
size: Map<T, Int>,
components: Int,
};
let mut uf: UnionFind<Text> = UnionFind.new();
uf.make(x: T) // ensure x is known; idempotent
uf.find(x: &T) -> T // canonical root, path-compressed
uf.union(a: T, b: T) -> Bool // true iff merge happened
uf.same_set(a: &T, b: &T) -> Bool
uf.component_size(x: &T) -> Int
uf.component_count() -> Int
uf.len() -> Int // distinct keys ever introduced
Same algebraic laws as UnionFindInt. The "make is idempotent on
duplicate keys" axis is currently blocked by upstream
Map.contains_key(&K) defect (see Open defects).
Length protocol
Both types implement Length, so the free-function form works:
assert_eq(uf.len(), len(&uf));
Performance
find: amortisedO(α(n))≈O(1). α is the inverse Ackermann function — bounded by 4 for any conceivable input size.union: amortisedO(α(n)).component_count:O(1). The counter is maintained in-line, not recomputed.component_size:O(α(n))(one find).clear(UnionFindInt only):O(n)on the dense backing arrays.
Conformance suite
core-tests/collections/union_find/ — 52 non-ignored tests, 11 pinned
regressions. Run with verum test --interp --filter test_uf_ and
verum test --interp --filter property_int_.
Open defects in collections
Tracked across the conformance suite under
core-tests/collections/<module>/regression_test.vr. Each entry below
links to the regression-pinned reproducer.
| # | Defect | Surface | Status |
|---|---|---|---|
| 1 | Map.get(K) -> V returns zero-value on miss instead of Maybe<V> | core/collections/map.vr:457; ~666 call sites | tracked, fix is a cross-cutting migration |
| 2 | Map.contains_key(&K) silently returns false (type mismatch) | core/collections/map.vr:614; reachable from union_find.vr:183 | tracked |
| 3 | Map.get_optional / Map.get_key_value lenient-skipped at runtime | core/collections/map.vr:504, 579 | tracked, requires compiler-side investigation |
| 4 | Text.from_utf8_unchecked heap-allocated Text has zero-length as_bytes() despite correct len field | core/text/text.vr:439; surfaces in every Map<Text, V> populated via Text.from(...) | tracked, requires interpreter-side RefSliceRaw fix |
| 5 | Text.eq(&self, &Text) method dispatch returns false for byte-identical literal Texts | core/text/text.vr:3370; method-resolution lands on the wrong impl | tracked |
| 6 | core.sys.common.random_bytes intrinsic missing from VBC dispatch table | reachable from core/collections/reservoir.vr:148; gates Reservoir.offer replacement phase, plus any core.base.random.* use site (Bloom, HyperLogLog HMAC keys) | tracked |
These defects are not specific to the UnionFind types — they are
foundational stdlib / language-level gaps that cascade into every
module that touches Map<Text, V> or Map<&K, V>. Closing any of them
unlocks coverage in multiple downstream modules at once.