
CBGR Internals

This page is for compiler developers and anyone curious about the mechanics. For the user-facing view, see Language → CBGR.

Data structures

AllocationHeader

32 bytes, prepended to every CBGR-tracked allocation, 32-byte aligned (half of a 64-byte cache line on x86_64 and most aarch64 cores, so two headers share a line and a header never straddles one).

// stdlib/mem/header.vr, @repr(C, align(32))
type AllocationHeader is {
    uint32_t size;         // 4 B — payload size
    uint32_t alignment;    // 4 B — payload alignment
    uint32_t generation;   // 4 B — CBGR counter, bumped on free
    uint16_t epoch;        // 2 B — wraparound-safety counter
    uint16_t capabilities; // 2 B — 8 capability flags
    uint32_t type_id;      // 4 B — runtime type info
    uint32_t flags;        // 4 B — drop impl, pinned, etc.
    uint32_t reserved[2];  // 8 B — reserved for future use
};

generation, epoch and capabilities occupy one naturally aligned 64-bit word (header bytes 8–15), so load_generation_epoch_combined() can read all three with a single atomic Acquire load on the fast path. Free uses a Release store on the same word, so reader visibility is guaranteed without a full fence.
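The combined load can be sketched with C11 atomics. GenEpoch and store_generation_epoch are illustrative stand-ins for the stdlib's internals, not its actual API; memcpy keeps the field reinterpretation endian-agnostic:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

// Mirror of header bytes 8..15: generation, epoch and capabilities
// share one naturally aligned 64-bit word.
typedef struct {
    uint32_t generation;
    uint16_t epoch;
    uint16_t capabilities;
} GenEpoch;

// Free's side of the protocol: publish the whole word with Release ordering.
static void store_generation_epoch(_Atomic uint64_t* word, GenEpoch g) {
    uint64_t raw = 0;
    memcpy(&raw, &g, sizeof raw);
    atomic_store_explicit(word, raw, memory_order_release);
}

// Deref fast path: one Acquire load observes all three fields
// consistently, pairing with the Release store above.
static GenEpoch load_generation_epoch_combined(const _Atomic uint64_t* word) {
    uint64_t raw = atomic_load_explicit(word, memory_order_acquire);
    GenEpoch g;
    memcpy(&g, &raw, sizeof g);
    return g;
}
```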

ThinRef<T>

16 bytes. Used for sized T.

type ThinRef is<T> {
    T* ptr;              // 8 B
    uint32_t generation; // 4 B
    uint32_t epoch_caps; // 4 B — 16 bits epoch + 16 bits capabilities
};

FatRef<T>

32 bytes. Used when T is unsized (slices, dyn) or for interior references that need an offset into a larger allocation.

type FatRef is<T> {
    T* ptr;              // 8 B
    uint32_t generation; // 4 B
    uint32_t epoch_caps; // 4 B — epoch:16 | caps:16
    uint64_t metadata;   // 8 B — slice length, dyn vtable, …
    uint32_t offset;     // 4 B — non-zero for interior refs
    uint32_t reserved;   // 4 B — alignment + room to grow
};
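For an interior reference, the header is recovered by first backing up over the offset to the payload base. A hedged sketch in C, assuming offset counts bytes from the start of the payload; header_of is an illustrative helper, not stdlib API:

```c
#include <stdint.h>

// Hypothetical 32-byte header mirror; only its size matters here.
typedef struct { uint8_t bytes[32]; } AllocationHeader;

typedef struct {
    void*    ptr;        // points at the referenced element, not the base
    uint32_t generation;
    uint32_t epoch_caps;
    uint64_t metadata;
    uint32_t offset;     // bytes from the payload base to ptr
    uint32_t reserved;
} FatRef;

// Back up over the interior offset to the payload base, then over the
// header itself, to reach the AllocationHeader for the check sequence.
static AllocationHeader* header_of(FatRef r) {
    return (AllocationHeader*)((uint8_t*)r.ptr - r.offset
                               - sizeof(AllocationHeader));
}
```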

Check sequence

T* deref(ThinRef<T> r) {
    AllocationHeader* h =
        (AllocationHeader*)((uint8_t*)r.ptr - sizeof(AllocationHeader));
    if (__builtin_expect(h->generation != r.generation, 0)) {
        handle_use_after_free(&r, h);
    }
    return r.ptr;
}

On x86_64 this compiles to (schematically, with the reference at [rdi]):

mov  rax, [rdi]       ; load ref.ptr
mov  ecx, [rax - 24]  ; load header.generation (offset 8 into the 32 B header)
cmp  ecx, [rdi + 8]   ; compare with ref.generation
jne  .uaf_handler
ret

Measured: 55 cycles, ~13.8 ns on an M3 Max.

Allocator

Based on Mimalloc's segment architecture:

  • Segment: 32 MiB virtual region, lazily committed.
  • Page: subdivision of a segment; each page holds one size class.
  • Size class: 73 classes, spaced at 12.5% intervals, from 8 bytes to 4 MiB.
  • LocalHeap: per-thread cache; lock-free fast path.
  • Free list: per-page, reversed on full page to maintain LIFO for cache friendliness.
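The 12.5% spacing can be sketched as mimalloc-style rounding: eight linear steps per power-of-two range, with an 8-byte floor. This is an illustration of the rounding idea under those assumptions, not the shipped allocator's exact class table:

```c
#include <stddef.h>

// Round a request up to its size class: step = 1/8 of the enclosing
// power-of-two range (~12.5% spacing), never finer than 8 bytes.
static size_t size_class_round(size_t size) {
    if (size <= 8) return 8;
    size_t s = size - 1;
    size_t pow2 = 1;
    while ((pow2 << 1) <= s) pow2 <<= 1;        // largest power of two <= size-1
    size_t step = pow2 / 8 < 8 ? 8 : pow2 / 8;  // 1/8 of the range, >= 8 B
    return (size + step - 1) / step * step;     // round up to class boundary
}
```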

Freeing an object:

  1. Atomically increments header.generation.
  2. Pushes the slot onto the page's free list.
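A hedged sketch of that free path with C11 atomics; Header, FreeSlot and cbgr_free are illustrative names, not the stdlib's:

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct FreeSlot { struct FreeSlot* next; } FreeSlot;

// Minimal stand-in for AllocationHeader: only the generation word is
// needed to show the free path; padding keeps the 32-byte size.
typedef struct {
    _Atomic uint32_t generation;
    uint8_t pad[28];
} Header;

// Step 1's release increment pairs with the acquire load in deref: a
// racing reader either sees the old generation (payload still valid)
// or the bumped one (and takes the use-after-free handler).
static void cbgr_free(Header* h, FreeSlot** page_free_list) {
    atomic_fetch_add_explicit(&h->generation, 1, memory_order_release);
    FreeSlot* slot = (FreeSlot*)(h + 1);  // payload starts after the header
    slot->next = *page_free_list;         // LIFO push onto the page's list
    *page_free_list = slot;
}
```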

Concurrent readers are protected by hazard pointers (HazardGuard): a reader installs its reference's target into a thread-local slot before the CBGR check; a freer checks all slots before recycling the memory.
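The handshake can be sketched as a fixed slot table; the slot layout and names are assumptions, and a real HazardGuard would also re-validate the reference after publishing (to close the race where the freer scanned just before the publish):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 64

// One hazard slot per thread.
static _Atomic(void*) hazard_slot[MAX_THREADS];

// Reader: publish the target before the CBGR check / payload access.
// SeqCst keeps the publish ordered against the freer's scan.
static void hazard_publish(int tid, void* target) {
    atomic_store_explicit(&hazard_slot[tid], target, memory_order_seq_cst);
}

static void hazard_retire(int tid) {
    atomic_store_explicit(&hazard_slot[tid], NULL, memory_order_release);
}

// Freer: memory may only be recycled once no slot names it.
static bool hazard_protected(void* target) {
    for (int i = 0; i < MAX_THREADS; i++)
        if (atomic_load_explicit(&hazard_slot[i], memory_order_acquire) == target)
            return true;
    return false;
}
```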

Generation wraparound

Generations are 32-bit → wraparound at 4.29 billion allocations per object. At 1 ns per alloc, that's ~4.3 seconds per object — a real concern for tight loops that rapidly alloc/free.

Solution: epochs. Each thread advances an epoch counter periodically (every 1 ms, or on signal); the header and each reference store its low 16 bits (the epoch fields above are uint16_t). References record the epoch at creation. When the allocator rolls a generation over, it also advances the page's epoch; any reference carrying an older epoch fails the check.

A stale reference must now collide on both fields: the 32-bit generation has to wrap back to the exact value the reference holds while the 16-bit epoch also matches. That is a 48-bit combined space advanced on two independent schedules, which makes accidental revival implausible for any practical workload.
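The resulting validity predicate is just a two-field comparison; even a fully wrapped generation fails unless the epoch also matches (illustrative code, not the stdlib's):

```c
#include <stdbool.h>
#include <stdint.h>

// A reference is live only if BOTH the generation and the epoch it
// carries still match the header. A wrapped generation alone is not
// enough to resurrect a dangling reference.
static bool ref_valid(uint32_t hdr_gen, uint16_t hdr_epoch,
                      uint32_t ref_gen, uint16_t ref_epoch) {
    return hdr_gen == ref_gen && hdr_epoch == ref_epoch;
}
```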

Capability bits

The low 8 bits of the capabilities field (mirrored in the low 8 bits of the caps half of ThinRef.epoch_caps) encode a monotonically attenuating capability set:

Bit  Name           Meaning
0    CAP_READ       reads permitted (always set for a live reference)
1    CAP_WRITE      writes permitted
2    CAP_EXECUTE    the target is callable (function pointers)
3    CAP_DELEGATE   the reference can be handed off to another context
4    CAP_REVOKE     the holder can revoke outstanding copies
5    CAP_BORROWED   this is a borrow, not an owner
6    CAP_MUTABLE    &mut semantics (exclusive access)
7    CAP_NO_ESCAPE  optimisation hint: cannot escape the function

Monotonic attenuation: capabilities can only be removed as a reference is passed around — a Database.readonly() transformation clears CAP_WRITE once and the result can never regain it. The compiler enforces this at every conversion point.

A method dispatch for Database.write(...) emits a capability check:

assert((r.capabilities & CAP_WRITE) != 0);

The check is one AND plus one branch: ~1 ns, and effectively free once the branch predictor has learned it.
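Attenuation itself is plain bit-masking, which is why a capability can be dropped but never re-granted. A minimal sketch with only two of the eight flags shown:

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    CAP_READ  = 1u << 0,
    CAP_WRITE = 1u << 1,
    // ... remaining six flags elided ...
};

// Attenuation is a pure AND: bits can be cleared, never set, so a
// readonly view can never be upgraded back to a writable one.
static uint16_t attenuate(uint16_t caps, uint16_t keep_mask) {
    return caps & keep_mask;
}

// The check emitted at a Database.write(...) dispatch site.
static bool can_write(uint16_t caps) {
    return (caps & CAP_WRITE) != 0;
}
```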

Compile-time analysis suite

The tier decisions that let the compiler emit &checked T or &unsafe T instead of the Tier 0 &T come from a battery of analyses in verum_cbgr:

Module                     Role
tier_analysis.rs           unified API — composes the results below into per-reference tier decisions
escape_analysis.rs         forward dataflow; four states: NoEscape, MayEscape, Escapes, Unknown
escape_categories.rs       refines Escapes with provenance data for SBGL decisions
ownership_analysis.rs      tracks move/borrow/consume per reference
concurrency_analysis.rs    data-race detection driven by Send/Sync
lifetime_analysis.rs       classic borrow checking
nll_analysis.rs            non-lexical lifetimes (fine-grained region inference)
polonius_analysis.rs       Polonius-style origin tracking for hard cases
dominance_analysis.rs      dominator-based promotion
points_to_analysis.rs      Andersen-style points-to graph
smt_alias_verification.rs  SMT-backed alias proofs when the others are inconclusive

The pipeline walks references through each analysis, producing a TierAnalysisResult that records decisions per RefId along with statistics used for the optimiser's tier distribution report.

Promotion rule

Only references classified as NoEscape are eligible for SBGL (stack-backed generation-less) promotion. MayEscape and Escapes stay at Tier 0. When the purely syntactic analyses are inconclusive, smt_alias_verification.rs asks the SMT solver whether two references can alias — proofs beyond "no" fall back to Tier 0 as well.
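Under that reading, the promotion rule reduces to a small predicate. decide_tier and the enums are illustrative names, and the SMT interaction is simplified to a single "proved no alias" boolean:

```c
#include <stdbool.h>

typedef enum { NO_ESCAPE, MAY_ESCAPE, ESCAPES, UNKNOWN } EscapeState;
typedef enum { TIER0_CHECKED, TIER1_SBGL } Tier;

// NoEscape promotes to SBGL; an inconclusive (Unknown) result may still
// promote when the SMT solver definitively proves the reference cannot
// alias; everything else stays at the checked tier.
static Tier decide_tier(EscapeState s, bool smt_proved_no_alias) {
    if (s == NO_ESCAPE) return TIER1_SBGL;
    if (s == UNKNOWN && smt_proved_no_alias) return TIER1_SBGL;
    return TIER0_CHECKED;  // MayEscape, Escapes, or an inconclusive proof
}
```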

Typical promotion rate on idiomatic code: 60–95 % of &T occurrences land at Tier 1.

Performance numbers (M3 Max)

Operation                        Cycles   Nanoseconds
Unchecked deref                  2        0.5
&checked T deref                 2        0.5
&T deref (hot header)            55       13.8
&T deref (cold header)           220      55
Allocation (cached size class)   40       10
Free + gen++                     80       20
Realloc (in place)               60       15
Realloc (moving)                 depends

VBC tier opcodes

Each tier is a distinct VBC instruction, so the tier decision is visible all the way down to the bytecode — the interpreter and the AOT lowering pipeline read the same opcode.

Opcode  Mnemonic    Meaning
0x70    Ref         Tier 0 immutable borrow
0x71    RefMut      Tier 0 mutable borrow
0x72    Deref       deref with CBGR validation
0x73    DerefMut    mutable deref with validation
0x74    ChkRef      explicit validation check (guard)
0x75    RefChecked  Tier 1 — compiler-proven safe, 0 ns deref
0x76    RefUnsafe   Tier 2 — unsafe, 0 ns deref
0x77    DropRef     drop a reference (bookkeeping only)

Tier behaviour across execution modes

Mode                 Tier 0 deref                                          Tier 1 deref               Tier 2 deref
Interpreter          full check                                            full check (safety first)  full check (safety first)
AOT (LLVM, debug)    full check                                            direct load                direct load
AOT (LLVM, release)  full check, elided where escape analysis proves safe  direct load                direct load

The interpreter intentionally validates every deref regardless of tier: it is the safe-by-default executor. Tier 1 and Tier 2 deliver their 0 ns benefit under AOT, where the check is proven away statically. Verum does not ship a JIT tier; the AOT pipeline lowers VBC through LLVM (for CPU) and MLIR (for GPU), and the interpreter handles every non-AOT execution path. A speculative baseline/JIT tier was evaluated and removed because it did not pull its weight, given how fast the interpreter already is and how effectively the AOT pipeline elides tiered checks.
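The mode/tier table above reduces to a small predicate, ignoring the release-mode elision of Tier 0 checks (names are illustrative):

```c
#include <stdbool.h>

typedef enum { MODE_INTERP, MODE_AOT_DEBUG, MODE_AOT_RELEASE } ExecMode;

// The interpreter checks every tier (safe-by-default executor); under
// AOT only Tier 0 keeps its check. Release-mode elision of provably
// safe Tier 0 checks is not modelled here.
static bool emits_check(ExecMode mode, int tier) {
    if (mode == MODE_INTERP) return true;
    return tier == 0;
}
```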

MLIR lowering

The verum.cbgr dialect mirrors the VBC opcodes:

verum.cbgr.borrow / verum.cbgr.borrow_mut
verum.cbgr.deref / verum.cbgr.deref_mut
verum.cbgr.store
verum.cbgr.drop

Lowering strategy is one of:

  • CbgrValidated — Tier 0, emits a call into stdlib/mem's validation function.
  • DirectLoad — Tier 1, lowers to llvm.load directly.
  • UncheckedLoad — Tier 2, lowers to llvm.load with no metadata decorations.

Threads and concurrency

CBGR is designed to be lock-free on the fast path:

  • Per-thread local heap avoids contention.
  • Generations are atomic loads (no fences needed on x86_64 TSO).
  • Hazard pointers replace epoch-based reclamation for finer-grained safety.

On weakly-ordered architectures (aarch64), the acquire load on deref and the release store on free provide the necessary ordering without resorting to full fences.

See also