
CBGR Internals

This page is for compiler developers and anyone curious about the mechanics. For the user-facing view, see Language → CBGR.

Data structures

AllocationHeader

32 bytes, prepended to every CBGR-tracked allocation, 32-byte aligned (half of a 64-byte cache line on x86_64 and most aarch64 cores, so two headers share a line and a header never straddles one).

// stdlib/mem/header.vr, @repr(C, align(32))
type AllocationHeader is {
    uint32_t size;         // 4 B — payload size
    uint32_t alignment;    // 4 B — payload alignment
    uint32_t generation;   // 4 B — CBGR counter, bumped on free
    uint16_t epoch;        // 2 B — wraparound-safety counter
    uint16_t capabilities; // 2 B — 8 capability flags
    uint32_t type_id;      // 4 B — runtime type info
    uint32_t flags;        // 4 B — drop impl, pinned, etc.
    uint32_t reserved[2];  // 8 B — reserved for future use
};

generation, epoch and capabilities occupy one naturally aligned 64-bit word (header bytes 8–15), so load_generation_epoch_combined() can read all three with a single atomic Acquire load on the fast path. Free uses a Release store on the same word, so reader visibility is guaranteed without a full fence.
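The combined load can be sketched with C11 atomics. GenEpoch and store_generation_epoch are illustrative stand-ins for the stdlib's internals, not its actual API; memcpy keeps the field reinterpretation endian-agnostic:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

// Mirror of header bytes 8..15: generation, epoch and capabilities
// share one naturally aligned 64-bit word.
typedef struct {
    uint32_t generation;
    uint16_t epoch;
    uint16_t capabilities;
} GenEpoch;

// Free's side of the protocol: publish the whole word with Release ordering.
static void store_generation_epoch(_Atomic uint64_t* word, GenEpoch g) {
    uint64_t raw = 0;
    memcpy(&raw, &g, sizeof raw);
    atomic_store_explicit(word, raw, memory_order_release);
}

// Deref fast path: one Acquire load observes all three fields
// consistently, pairing with the Release store above.
static GenEpoch load_generation_epoch_combined(const _Atomic uint64_t* word) {
    uint64_t raw = atomic_load_explicit(word, memory_order_acquire);
    GenEpoch g;
    memcpy(&g, &raw, sizeof g);
    return g;
}
```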

ThinRef<T>

16 bytes. Used for sized T.

type ThinRef is<T> {
    T* ptr;              // 8 B
    uint32_t generation; // 4 B
    uint32_t epoch_caps; // 4 B — 16 bits epoch + 16 bits capabilities
};

FatRef<T>

32 bytes. Used when T is unsized (slices, dyn) or for interior references that need an offset into a larger allocation.

type FatRef is<T> {
    T* ptr;              // 8 B
    uint32_t generation; // 4 B
    uint32_t epoch_caps; // 4 B — epoch:16 | caps:16
    uint64_t metadata;   // 8 B — slice length, dyn vtable, …
    uint32_t offset;     // 4 B — non-zero for interior refs
    uint32_t reserved;   // 4 B — alignment + room to grow
};
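For an interior reference, the header is recovered by first backing up over the offset to the payload base. A hedged sketch in C, assuming offset counts bytes from the start of the payload; header_of is an illustrative helper, not stdlib API:

```c
#include <stdint.h>

// Hypothetical 32-byte header mirror; only its size matters here.
typedef struct { uint8_t bytes[32]; } AllocationHeader;

typedef struct {
    void*    ptr;        // points at the referenced element, not the base
    uint32_t generation;
    uint32_t epoch_caps;
    uint64_t metadata;
    uint32_t offset;     // bytes from the payload base to ptr
    uint32_t reserved;
} FatRef;

// Back up over the interior offset to the payload base, then over the
// header itself, to reach the AllocationHeader for the check sequence.
static AllocationHeader* header_of(FatRef r) {
    return (AllocationHeader*)((uint8_t*)r.ptr - r.offset
                               - sizeof(AllocationHeader));
}
```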

Check sequence

T* deref(ThinRef<T> r) {
    AllocationHeader* h =
        (AllocationHeader*)((uint8_t*)r.ptr - sizeof(AllocationHeader));
    if (__builtin_expect(h->generation != r.generation, 0)) {
        handle_use_after_free(&r, h);
    }
    return r.ptr;
}

On x86_64 this compiles to (schematically, with the reference at [rdi]):

mov  rax, [rdi]       ; load ref.ptr
mov  ecx, [rax - 24]  ; load header.generation (offset 8 into the 32 B header)
cmp  ecx, [rdi + 8]   ; compare with ref.generation
jne  .uaf_handler
ret

Measured: 55 cycles, ~13.8 ns on an M3 Max.

Allocator

Based on Mimalloc's segment architecture:

  • Segment: 32 MiB virtual region, lazily committed.
  • Page: subdivision of a segment; each page holds one size class.
  • Size class: 73 classes, spaced at 12.5% intervals, from 8 bytes to 4 MiB.
  • LocalHeap: per-thread cache; lock-free fast path.
  • Free list: per-page, reversed on full page to maintain LIFO for cache friendliness.
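The 12.5% spacing can be sketched as mimalloc-style rounding: eight linear steps per power-of-two range, with an 8-byte floor. This is an illustration of the rounding idea under those assumptions, not the shipped allocator's exact class table:

```c
#include <stddef.h>

// Round a request up to its size class: step = 1/8 of the enclosing
// power-of-two range (~12.5% spacing), never finer than 8 bytes.
static size_t size_class_round(size_t size) {
    if (size <= 8) return 8;
    size_t s = size - 1;
    size_t pow2 = 1;
    while ((pow2 << 1) <= s) pow2 <<= 1;        // largest power of two <= size-1
    size_t step = pow2 / 8 < 8 ? 8 : pow2 / 8;  // 1/8 of the range, >= 8 B
    return (size + step - 1) / step * step;     // round up to class boundary
}
```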

Freeing an object:

  1. Atomically increments header.generation.
  2. Pushes the slot onto the page's free list.
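A hedged sketch of that free path with C11 atomics; Header, FreeSlot and cbgr_free are illustrative names, not the stdlib's:

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct FreeSlot { struct FreeSlot* next; } FreeSlot;

// Minimal stand-in for AllocationHeader: only the generation word is
// needed to show the free path; padding keeps the 32-byte size.
typedef struct {
    _Atomic uint32_t generation;
    uint8_t pad[28];
} Header;

// Step 1's release increment pairs with the acquire load in deref: a
// racing reader either sees the old generation (payload still valid)
// or the bumped one (and takes the use-after-free handler).
static void cbgr_free(Header* h, FreeSlot** page_free_list) {
    atomic_fetch_add_explicit(&h->generation, 1, memory_order_release);
    FreeSlot* slot = (FreeSlot*)(h + 1);  // payload starts after the header
    slot->next = *page_free_list;         // LIFO push onto the page's list
    *page_free_list = slot;
}
```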

Concurrent readers are protected by hazard pointers (HazardGuard): a reader installs its reference's target into a thread-local slot before the CBGR check; a freer checks all slots before recycling the memory.
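The handshake can be sketched as a fixed slot table; the slot layout and names are assumptions, and a real HazardGuard would also re-validate the reference after publishing (to close the race where the freer scanned just before the publish):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 64

// One hazard slot per thread.
static _Atomic(void*) hazard_slot[MAX_THREADS];

// Reader: publish the target before the CBGR check / payload access.
// SeqCst keeps the publish ordered against the freer's scan.
static void hazard_publish(int tid, void* target) {
    atomic_store_explicit(&hazard_slot[tid], target, memory_order_seq_cst);
}

static void hazard_retire(int tid) {
    atomic_store_explicit(&hazard_slot[tid], NULL, memory_order_release);
}

// Freer: memory may only be recycled once no slot names it.
static bool hazard_protected(void* target) {
    for (int i = 0; i < MAX_THREADS; i++)
        if (atomic_load_explicit(&hazard_slot[i], memory_order_acquire) == target)
            return true;
    return false;
}
```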

Generation wraparound

Generations are 32-bit → wraparound at 4.29 billion allocations per object. At 1 ns per alloc, that's ~4.3 seconds per object — a real concern for tight loops that rapidly alloc/free.

Solution: epochs. Each thread advances an epoch counter periodically (every 1 ms, or on signal); the header and each reference store its low 16 bits (the epoch fields above are uint16_t). References record the epoch at creation. When the allocator rolls a generation over, it also advances the page's epoch; any reference carrying an older epoch fails the check.

A stale reference must now collide on both fields: the 32-bit generation has to wrap back to the exact value the reference holds while the 16-bit epoch also matches. That is a 48-bit combined space advanced on two independent schedules, which makes accidental revival implausible for any practical workload.
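The resulting validity predicate is just a two-field comparison; even a fully wrapped generation fails unless the epoch also matches (illustrative code, not the stdlib's):

```c
#include <stdbool.h>
#include <stdint.h>

// A reference is live only if BOTH the generation and the epoch it
// carries still match the header. A wrapped generation alone is not
// enough to resurrect a dangling reference.
static bool ref_valid(uint32_t hdr_gen, uint16_t hdr_epoch,
                      uint32_t ref_gen, uint16_t ref_epoch) {
    return hdr_gen == ref_gen && hdr_epoch == ref_epoch;
}
```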

Capability bits

The low 8 bits of the capabilities field (mirrored in the low 8 bits of the caps half of ThinRef.epoch_caps) encode a monotonically attenuating capability set:

Bit  Name           Meaning
0    CAP_READ       reads permitted (always set for a live reference)
1    CAP_WRITE      writes permitted
2    CAP_EXECUTE    the target is callable (function pointers)
3    CAP_DELEGATE   the reference can be handed off to another context
4    CAP_REVOKE     the holder can revoke outstanding copies
5    CAP_BORROWED   this is a borrow, not an owner
6    CAP_MUTABLE    &mut semantics (exclusive access)
7    CAP_NO_ESCAPE  optimisation hint: cannot escape the function

Monotonic attenuation: capabilities can only be removed as a reference is passed around — a Database.readonly() transformation clears CAP_WRITE once and the result can never regain it. The compiler enforces this at every conversion point.

A method dispatch for Database.write(...) emits a capability check:

assert((r.capabilities & CAP_WRITE) != 0);

The check is one AND plus one branch: ~1 ns, and effectively free once the branch predictor has learned it.
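Attenuation itself is plain bit-masking, which is why a capability can be dropped but never re-granted. A minimal sketch with only two of the eight flags shown:

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    CAP_READ  = 1u << 0,
    CAP_WRITE = 1u << 1,
    // ... remaining six flags elided ...
};

// Attenuation is a pure AND: bits can be cleared, never set, so a
// readonly view can never be upgraded back to a writable one.
static uint16_t attenuate(uint16_t caps, uint16_t keep_mask) {
    return caps & keep_mask;
}

// The check emitted at a Database.write(...) dispatch site.
static bool can_write(uint16_t caps) {
    return (caps & CAP_WRITE) != 0;
}
```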

Compile-time analysis suite

The tier decisions that let the compiler emit &checked T or &unsafe T instead of the Tier 0 &T come from a battery of analyses in verum_cbgr:

Module                     Role
tier_analysis.rs           unified API — composes the results below into per-reference tier decisions
escape_analysis.rs         forward dataflow; four states: NoEscape, MayEscape, Escapes, Unknown
escape_categories.rs       refines Escapes with provenance data for SBGL decisions
ownership_analysis.rs      tracks move/borrow/consume per reference
concurrency_analysis.rs    data-race detection driven by Send/Sync
lifetime_analysis.rs       classic borrow checking
nll_analysis.rs            non-lexical lifetimes (fine-grained region inference)
polonius_analysis.rs       Polonius-style origin tracking for hard cases
dominance_analysis.rs      dominator-based promotion
points_to_analysis.rs      Andersen-style points-to graph
smt_alias_verification.rs  SMT-backed alias proofs when the others are inconclusive

The pipeline walks references through each analysis, producing a TierAnalysisResult that records decisions per RefId along with statistics used for the optimiser's tier distribution report.

Promotion rule

Only references classified as NoEscape are eligible for SBGL (stack-backed generation-less) promotion. MayEscape and Escapes stay at Tier 0. When the purely syntactic analyses are inconclusive, smt_alias_verification.rs asks the SMT solver whether two references can alias — proofs beyond "no" fall back to Tier 0 as well.
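Under that reading, the promotion rule reduces to a small predicate. decide_tier and the enums are illustrative names, and the SMT interaction is simplified to a single "proved no alias" boolean:

```c
#include <stdbool.h>

typedef enum { NO_ESCAPE, MAY_ESCAPE, ESCAPES, UNKNOWN } EscapeState;
typedef enum { TIER0_CHECKED, TIER1_SBGL } Tier;

// NoEscape promotes to SBGL; an inconclusive (Unknown) result may still
// promote when the SMT solver definitively proves the reference cannot
// alias; everything else stays at the checked tier.
static Tier decide_tier(EscapeState s, bool smt_proved_no_alias) {
    if (s == NO_ESCAPE) return TIER1_SBGL;
    if (s == UNKNOWN && smt_proved_no_alias) return TIER1_SBGL;
    return TIER0_CHECKED;  // MayEscape, Escapes, or an inconclusive proof
}
```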

Typical promotion rate on idiomatic code: 60–95 % of &T occurrences land at Tier 1.

Performance numbers (M3 Max)

Operation                        Cycles   Nanoseconds
Unchecked deref                  2        0.5
&checked T deref                 2        0.5
&T deref (hot header)            55       13.8
&T deref (cold header)           220      55
Allocation (cached size class)   40       10
Free + gen++                     80       20
Realloc (in place)               60       15
Realloc (moving)                 depends

VBC tier opcodes

Each tier is a distinct VBC instruction, so the tier decision is visible all the way down to the bytecode — the interpreter and the AOT lowering pipeline read the same opcode.

Opcode  Mnemonic    Meaning
0x70    Ref         Tier 0 immutable borrow
0x71    RefMut      Tier 0 mutable borrow
0x72    Deref       deref with CBGR validation
0x73    DerefMut    mutable deref with validation
0x74    ChkRef      explicit validation check (guard)
0x75    RefChecked  Tier 1 — compiler-proven safe, 0 ns deref
0x76    RefUnsafe   Tier 2 — unsafe, 0 ns deref
0x77    DropRef     drop a reference (bookkeeping only)

Tier behaviour across execution modes

Mode                 Tier 0 deref                                          Tier 1 deref               Tier 2 deref
Interpreter          full check                                            full check (safety first)  full check (safety first)
AOT (LLVM, debug)    full check                                            direct load                direct load
AOT (LLVM, release)  full check, elided where escape analysis proves safe  direct load                direct load

The interpreter intentionally validates every deref regardless of tier: it is the safe-by-default executor. Tier 1 and Tier 2 deliver their 0 ns benefit under AOT, where the check is proven away statically. Verum does not ship a JIT tier; the AOT pipeline lowers VBC through LLVM (for CPU) and MLIR (for GPU), and the interpreter handles every non-AOT execution path. A speculative baseline/JIT tier was evaluated and removed because it did not pull its weight, given how fast the interpreter already is and how effectively the AOT pipeline elides tiered checks.
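The mode/tier table above reduces to a small predicate, ignoring the release-mode elision of Tier 0 checks (names are illustrative):

```c
#include <stdbool.h>

typedef enum { MODE_INTERP, MODE_AOT_DEBUG, MODE_AOT_RELEASE } ExecMode;

// The interpreter checks every tier (safe-by-default executor); under
// AOT only Tier 0 keeps its check. Release-mode elision of provably
// safe Tier 0 checks is not modelled here.
static bool emits_check(ExecMode mode, int tier) {
    if (mode == MODE_INTERP) return true;
    return tier == 0;
}
```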

MLIR lowering

The verum.cbgr dialect mirrors the VBC opcodes:

verum.cbgr.borrow / verum.cbgr.borrow_mut
verum.cbgr.deref / verum.cbgr.deref_mut
verum.cbgr.store
verum.cbgr.drop

Lowering strategy is one of:

  • CbgrValidated — Tier 0, emits a call into stdlib/mem's validation function.
  • DirectLoad — Tier 1, lowers to llvm.load directly.
  • UncheckedLoad — Tier 2, lowers to llvm.load with no metadata decorations.

Threads and concurrency

CBGR is designed to be lock-free on the fast path:

  • Per-thread local heap avoids contention.
  • Generations are atomic loads (no fences needed on x86_64 TSO).
  • Hazard pointers replace epoch-based reclamation for finer-grained safety.

On weakly-ordered architectures (aarch64), the acquire load on deref and the release store on free provide the necessary ordering without resorting to full fences.

See also