Skip to main content

CBGR Internals

This page is for compiler developers and anyone curious about the mechanics. For the user-facing view, see Language → CBGR.

Data structures

AllocationHeader

32 bytes, prepended to every CBGR-tracked allocation, 32-byte aligned (one full cache line on x86_64 / aarch64).

// stdlib/mem/header.vr, @repr(C, align(32))
type AllocationHeader is {
uint32_t size; // 4 B — payload size
uint32_t alignment; // 4 B — payload alignment
uint32_t generation; // 4 B — CBGR counter, bumped on free
uint16_t epoch; // 2 B — wraparound-safety counter
uint16_t capabilities; // 2 B — 8 capability flags
uint32_t type_id; // 4 B — runtime type info
uint32_t flags; // 4 B — drop impl, pinned, etc.
uint32_t reserved[2]; // 8 B — reserved for future use
};

generation + epoch are laid out so they can be loaded in a single 64-bit atomic — load_generation_epoch_combined() does exactly that on the fast path with Acquire ordering. Free uses Release on the same word so reader visibility is guaranteed without a full fence.

ThinRef<T>

16 bytes. Used for sized T.

type ThinRef is<T> {
T* ptr;
uint32_t generation;
uint32_t epoch_caps; // 16 bits epoch + 16 bits capabilities
};

FatRef<T>

32 bytes. Used when T is unsized (slices, dyn) or for interior references that need an offset into a larger allocation.

type FatRef is<T> {
T* ptr; // 8 B
uint32_t generation; // 4 B
uint32_t epoch_caps; // 4 B — epoch:16 | caps:16
uint64_t metadata; // 8 B — slice length, dyn vtable, …
uint32_t offset; // 4 B — non-zero for interior refs
uint32_t reserved; // 4 B — alignment + room to grow
};

Check sequence

T* deref(ThinRef<T> r) {
Header* h = (Header*)((uint8_t*)r.ptr - sizeof(Header));
if (__builtin_expect(h->generation != r.generation, 0)) {
handle_use_after_free(&r, h);
}
return r.ptr;
}

On x86_64 this compiles to:

mov rax, [rdi] ; load pointer
mov ecx, [rax - 16] ; load header.generation
cmp ecx, [rdi + 8] ; compare with ref.generation
jne .uaf_handler
ret

Measured: 55 cycles, ~13.8 ns on an M3 Max.

Allocator

Based on Mimalloc's segment architecture:

  • Segment: 32 MiB virtual region, lazily committed.
  • Page: subdivision of a segment; each page holds one size class.
  • Size class: 73 classes, spaced at 12.5% intervals, from 8 bytes to 4 MiB.
  • LocalHeap: per-thread cache; lock-free fast path.
  • Free list: per-page, reversed on full page to maintain LIFO for cache friendliness.

Freeing an object:

  1. Atomically increments header.generation.
  2. Pushes the slot onto the page's free list.

Concurrent readers are protected by hazard pointers (HazardGuard): a reader installs its reference's target into a thread-local slot before the CBGR check; a freer checks all slots before recycling the memory.

Generation wraparound

Generations are 32-bit → wraparound at 4.29 billion allocations per object. At 1 ns per alloc, that's ~4.3 seconds per object — a real concern for tight loops that rapidly alloc/free.

Solution: epochs. Each thread advances a 32-bit epoch counter periodically (every 1 ms, or on signal). References carry the epoch at creation. When the allocator rolls a generation, it also advances the page's epoch; any reference with an older epoch fails the check.

With 32-bit epochs rolled at 1 kHz per thread, wraparound is effectively infinite for any practical workload.

Capability bits

The low 8 bits of the capabilities field (and the mirrored 8 bits in ThinRef.epoch_caps) encode a monotonically attenuating capability set:

BitNameMeaning
0CAP_READreads permitted (always set for a live reference)
1CAP_WRITEwrites permitted
2CAP_EXECUTEthe target is callable (function pointers)
3CAP_DELEGATEthe reference can be handed off to another context
4CAP_REVOKEthe holder can revoke outstanding copies
5CAP_BORROWEDthis is a borrow, not an owner
6CAP_MUTABLE&mut semantics (exclusive access)
7CAP_NO_ESCAPEoptimisation hint: cannot escape the function

Monotonic attenuation: capabilities can only be removed as a reference is passed around — a Database.readonly() transformation clears CAP_WRITE once and the result can never regain it. The compiler enforces this at every conversion point.

A method dispatch for Database.write(...) emits a capability check:

assert((r.capabilities & CAP_WRITE) != 0);

The check is one AND + one branch — ~1 ns, dominated by the branch predictor.

Compile-time analysis suite

The tier decisions that let the compiler emit &checked T or &unsafe T instead of the Tier 0 &T come from a battery of analyses in verum_cbgr:

ModuleRole
tier_analysis.rsunified API — composes the results below into per-reference tier decisions
escape_analysis.rsforward dataflow; four states: NoEscape, MayEscape, Escapes, Unknown
escape_categories.rsrefines Escapes with provenance data for SBGL decisions
ownership_analysis.rstracks move/borrow/consume per reference
concurrency_analysis.rsdata-race detection driven by Send/Sync
lifetime_analysis.rsclassic borrow checking
nll_analysis.rsnon-lexical lifetimes (fine-grained region inference)
polonius_analysis.rsPolonius-style origin tracking for hard cases
dominance_analysis.rsdominator-based promotion
points_to_analysis.rsAndersen-style points-to graph
smt_alias_verification.rsSMT-backed alias proofs when the others are inconclusive

The pipeline walks references through each analysis, producing a TierAnalysisResult that records decisions per RefId along with statistics used for the optimiser's tier distribution report.

Promotion rule

Only references classified as NoEscape are eligible for SBGL (stack-backed generation-less) promotion. MayEscape and Escapes stay at Tier 0. When the purely syntactic analyses are inconclusive, smt_alias_verification.rs asks the SMT solver whether two references can alias — proofs beyond "no" fall back to Tier 0 as well.

Typical promotion rate on idiomatic code: 60–95 % of &T occurrences land at Tier 1.

Performance numbers (M3 Max)

OperationCyclesNanoseconds
Unchecked deref20.5
&checked T deref20.5
&T deref (hot header)5513.8
&T deref (cold header)22055
Allocation (cached size class)4010
Free + gen++8020
Realloc (in place)6015
Realloc (moving)depends

VBC tier opcodes

Each tier is a distinct VBC instruction, so the tier decision is visible all the way down to the bytecode — the interpreter and the AOT lowering pipeline read the same opcode.

OpcodeMnemonicMeaning
0x70RefTier 0 immutable borrow
0x71RefMutTier 0 mutable borrow
0x72Derefderef with CBGR validation
0x73DerefMutmutable deref with validation
0x74ChkRefexplicit validation check (guard)
0x75RefCheckedTier 1 — compiler-proven safe, 0 ns deref
0x76RefUnsafeTier 2 — unsafe, 0 ns deref
0x77DropRefdrop a reference (bookkeeping only)

Tier behaviour across execution modes

ModeTier 0 derefTier 1 derefTier 2 deref
Interpreterfull checkfull check (safety first)full check (safety first)
AOT (LLVM, debug)full checkdirect loaddirect load
AOT (LLVM, release)full check, elided where escape analysis proves safedirect loaddirect load

The interpreter intentionally validates every deref regardless of tier — it is the safe-by-default executor. Tier 1 and Tier 2 give their 0 ns benefit under AOT, where the check is proven away statically. Verum does not ship a JIT tier: the AOT pipeline lowers VBC through LLVM (for CPU) and MLIR (for GPU), and the interpreter handles every non-AOT execution path. A speculative baseline/JIT tier was evaluated and removed as not pulling its weight given how fast the interpreter already is and how effectively the AOT pipeline elides tiered checks.

MLIR lowering

The verum.cbgr dialect mirrors the VBC opcodes:

verum.cbgr.borrow / verum.cbgr.borrow_mut
verum.cbgr.deref / verum.cbgr.deref_mut
verum.cbgr.store
verum.cbgr.drop

Lowering strategy is one of:

  • CbgrValidated — Tier 0, emits a call into stdlib/mem's validation function.
  • DirectLoad — Tier 1, lowers to llvm.load directly.
  • UncheckedLoad — Tier 2, lowers to llvm.load with no metadata decorations.

Threads and concurrency

CBGR is designed to be lock-free on the fast path:

  • Per-thread local heap avoids contention.
  • Generations are atomic loads (no fences needed on x86_64 TSO).
  • Hazard pointers replace epoch-based reclamation for finer-grained safety.

On weakly-ordered architectures (aarch64), an acquire fence on deref and a release fence on free provide the necessary ordering.

Analysis configuration knobs

CBGR ships a uniform configuration surface across every static analysis pass. Every gate listed below is honoured by the consumer code and pinned by a unit test inside the corresponding analyser module — the user-facing contract is "set the field, observe the difference".

The configs share a uniform shape:

  • Each *Config carries production-suitable defaults.
  • Where the same config has multiple stances (aggressive vs conservative vs disabled), factory constructors expose them.
  • Where a result struct exists, it mirrors the config booleans through *() accessors so diagnostic builders never need to re-read the analyser's internal state.
  • min_confidence-style threshold fields filter every warning class identically (one filter pass per detector, applied centrally).
  • max_* cap fields with default 0 mean "unlimited"; any positive value bounds the analysis cost at the price of potentially missing entries in the truncated tail (documented inline in the analyser source).

LifetimeAnalysisConfig

FieldDefaultWhat it gates
infer_lifetimestruePhase 1 (lifetime creation per reference). When false, the analyzer skips Phase 1 entirely; downstream phases see an empty ref_lifetimes map. Used by embedders importing lifetime IDs from an upstream IR.
check_outlivestruePhase 4 (constraint solving). When false, the constraint vector is still surfaced (callers driving their own solver consume it) but violations stays empty and stats.constraints_solved == 0.
detailed_diagnosticstrueSurfaced via LifetimeAnalysisResult.detailed_diagnostics(). Diagnostic builders consult it to choose between verbose (span / region / chain detail) and compact (one-line summary) rendering.
max_iterations1000Upper bound for the liveness fixed-point loop. Prevents pathological CFGs from running unbounded analysis time.

NllConfig

FieldDefaultWhat it gates
two_phase_borrowstrueSurfaced via NllAnalysisResult.two_phase_borrows_enabled(). Diagnostic builders consult this when explaining a single-phase borrow rejection — when false, the message can suggest enabling two-phase as the fix; when true, the rejection is structural.
polonius_modefalseSurfaced via NllAnalysisResult.polonius_mode(). Tags the result for downstream Polonius origin-tracking consumption.
max_iterations1000Upper bound for both the liveness and constraint-solver fixed-point loops.
detailed_diagnosticstrueSurfaced via NllAnalysisResult.detailed_diagnostics() for verbose vs compact rendering.

OwnershipAnalysisConfig

FieldDefaultWhat it gates
track_stackfalseWhen false, AllocationKind::Stack allocations are dropped from the analysis (stack lifetime is structural, not heap-style).
track_temporariesfalseSame shape as track_stack for AllocationKind::Temporary (expression-result allocations). Embedders that need to track them opt in.
detect_leakstrueLeak-detection pass. When false, the leak warning list is empty regardless of analysis state.
min_confidence0.5Filters every warning class (double-free, use-after-free, leak) whose confidence is below the threshold.
max_blocks0 (unlimited)Caps the CFG block walk in extract_allocation_sites. Trade-off: bounded analysis cost at the price of potentially missing allocations in the truncated tail.

PoloniusConfig

FieldDefaultWhat it gates
max_iterations1000Upper bound for the Datalog-style fixed-point loop in compute_output_facts.
location_sensitivetrueSurfaced via PoloniusAnalysisResult.is_location_sensitive(). Diagnostic builders consult this when explaining a loan rejection — when false, the message can suggest enabling location-sensitive analysis as the fix; when true, the rejection is structural.
check_movestrueSurfaced via PoloniusAnalysisResult.check_moves_enabled(). Downstream consumers that wrap the analyser with MoveTracker integration check this flag before consulting the tracker — false suppresses the move-error class from the diagnostic surface.

EscapeAnalysisConfig

FieldDefaultWhat it gates
enable_interproceduraltrueCross-function escape analysis. Disabling falls back to per-function conservative results.
max_iterations100Upper bound for the interprocedural fixed-point loop.

AbstractionConfig (path-sensitive predicate abstraction)

FieldDefaultWhat it gates
max_abstraction_level4Caps the abstraction level requested by abstract_predicate. Higher → more aggressive (less precise).
path_threshold50merge_similar_paths skips work when path count is below this.
use_smt_equivalencetrueGates check_equivalence_smt. When false, the method returns the conservative "not provably equivalent" answer without invoking the SMT layer (stats counter still increments). Surfaced via smt_equivalence_enabled().
use_subsumptiontrueGates check_subsumption_smt symmetrically. Surfaced via subsumption_enabled().
use_wideningtrueMaster switch over abstract_level3. When false, level-3 falls back to level-2 (no widening) — preserves precision at the cost of slower convergence on deeply-nested loops. Surfaced via widening_enabled().
widening_threshold3When widening is enabled: widen after this many iterations on the same predicate hash.
max_cache_size10000Caps the abstraction / equivalence / subsumption caches to bound memory growth on long runs.
incremental_mergingtrueSurfaced via incremental_merging_enabled(). External enumerators that produce path lists consult this to decide whether to fold paths into the merger as they're produced (incremental) or accumulate the full list and merge once at the end.

PromotionConfig

FieldDefaultWhat it gates
enable_promotiontrueMaster switch. When false, every reference falls to KeepManagedConservative regardless of analysis.
extra_conservativefalseAdds the async/await-crossing + exception-path safety checks to the promotion verdict. aggressive() clears it; conservative() sets it.
allow_heap_promotionfalseWhen false (default), a reference with is_stack_allocated == false falls back to KeepManagedConservative even when dominance + escape analysis approves. The safe default — heap-rooted promotions need additional liveness analysis the analyzer hasn't produced for that reference. PromotionConfig::aggressive() flips it to true.
confidence_threshold0.95Reserved for the future confidence-tagged decision path. Currently informational.

Convenience constructors:

ConstructorEffect
default()Enabled with the safe heap default (no heap promotion).
aggressive()Enabled, allows heap promotion, lower confidence threshold (0.80).
conservative()Enabled with extra_conservative set and a tight confidence threshold (0.99).
disabled()Master switch off — every reference stays managed.

ConcurrencyAnalysisConfig

FieldDefaultWhat it gates
detect_data_racestrueRace-detection pass. false returns an empty data-race warning list.
detect_deadlockstrueLock-cycle detection. false returns an empty deadlock warning list.
check_thread_safetytrueSend/Sync surface scan.
min_confidence0.5Filters every warning whose confidence is below the threshold.
max_accesses0 (unlimited)Caps the analyzer's memory-access list. Once the cap is reached, extract_accesses returns early.

EnhancedContextConfig (context-sensitive analysis)

FieldDefaultWhat it gates
flow_sensitivefalseSurfaced via ContextSensitiveAnalyzer.flow_sensitive_enabled(). Downstream analyses that maintain per-flow state consult this to decide whether to install or skip the flow-state map.
adaptive_depthfalseWhen true, with_config installs an AdaptiveDepthPolicy::new(default_depth, max_depth) so the dependent slot is in play. Surfaced via adaptive_depth_enabled().
compressionfalseWhen true, with_config installs a fresh ContextCompressor if none is set; pre-existing custom compressors are preserved. Surfaced via compression_enabled().
default_depth3Default analysis depth used when adaptive_depth == false. Surfaced via default_depth().
max_depth10Hard upper bound on context depth (safety check). Surfaced via max_depth().

The convenience constructor EnhancedContextConfig::all_enabled() flips the three booleans to true and keeps the depth defaults (3 / 10). Calling with_config(EnhancedContextConfig::all_enabled()) on a fresh ContextSensitiveAnalyzer::new() now produces the same effective shape as the bespoke with_all_enhancements() constructor — depth_policy + compressor are auto-installed.

ParallelConfig (context-sensitive parallel analysis)

FieldDefaultWhat it gates
threshold10should_parallelize returns true only when the input size meets this. Below the threshold, analyze_parallel runs sequentially.
max_threads0 (rayon default)When > 0, analyze_parallel installs a bespoke rayon::ThreadPool capped to that many workers for the duration of the call. Pool-build failures fall back to the global pool.
work_stealingtrueDocumented stance. Rayon's underlying scheduler is always work-stealing; the accessor work_stealing_enabled() exists for diagnostics that surface the declared stance.

ValueTrackingConfig (concrete-value tracking for escape refinement)

FieldDefaultWhat it gates
enable_constant_propagationtrueConcrete-value writes in propagate_constant, the constant fast-path inside propagate_binop, and the merge-into-concrete path in propagate_phi. When false, no concrete state is recorded; downstream constant folding sees nothing.
enable_range_analysistrueRange-refinement branch inside propagate_binop. When false, the range computation is skipped entirely.
enable_symbolic_executiontrueSymbolic-mirror writes in propagate_constant, propagate_binop, and propagate_phi. When false, no SymbolicValue is constructed.
max_iterations100Worklist iteration cap in track_concrete_values_with_config. Bounds analysis cost on pathological CFGs; once exceeded, propagation stops with whatever block states have been recorded so far (best-effort partial result).

The three domain gates are independent — opting out of constant tracking keeps symbolic and range tracking active (and vice-versa) so callers can build a precise per-analysis cost / precision profile. The configuration is passed when constructing a ValuePropagator or driving the enhanced escape analyser; both honour the same ValueTrackingConfig shape and propagate the gates uniformly through their worklist.

DiagnosticsConfig

FieldDefaultWhat it gates
file_name<source>Span source filename used in rendered diagnostics.
include_tier_reasonstrueAttach the Tier 0 reason (escape kind, dominance status, etc.) as a note.
confidence_threshold0.7Suppress warnings whose confidence is below the threshold.
include_helptrueAttach the canonical fix-it suggestion.
include_doc_urlstrueAttach the verum-lang.org/errors/<code> URL.

Wire-up tests (proof of effect)

Every gate listed above has a corresponding unit test inside its analyser module that asserts the field's documented effect:

ModuleWire-up tests
lifetime_analysisinfer_lifetimes_false_skips_phase_1, check_outlives_false_returns_empty_violation_list, detailed_diagnostics_flows_to_result
nll_analysisnll_config_flows_to_result (8-cell truth table over the three booleans)
ownership_analysismin_confidence_filters_low_confidence_warnings, max_blocks_caps_extraction_walk
concurrency_analysismin_confidence_filters_low_confidence_warnings, max_accesses_caps_extraction
polonius_analysispolonius_config_flows_to_result (4-cell truth table over the two booleans)
predicate_abstractionconfig_accessors_mirror_construction_values (16-cell truth table over the four booleans, in tests/predicate_abstraction_integration_tests.rs)
context_enhancementsenhanced_context_config_drives_dependencies, parallel_config_max_threads_round_trips_through_accessor
promotion_decisionallow_heap_promotion_default_vs_aggressive

Run the inventory locally with:

cargo test -p verum_cbgr --lib analysis

See also