CBGR Internals
This page is for compiler developers and anyone curious about the mechanics. For the user-facing view, see Language → CBGR.
Data structures
AllocationHeader
32 bytes, prepended to every CBGR-tracked allocation, 32-byte aligned (one full cache line on x86_64 / aarch64).
// stdlib/mem/header.vr, @repr(C, align(32))
type AllocationHeader is {
uint32_t size; // 4 B — payload size
uint32_t alignment; // 4 B — payload alignment
uint32_t generation; // 4 B — CBGR counter, bumped on free
uint16_t epoch; // 2 B — wraparound-safety counter
uint16_t capabilities; // 2 B — 8 capability flags
uint32_t type_id; // 4 B — runtime type info
uint32_t flags; // 4 B — drop impl, pinned, etc.
uint32_t reserved[2]; // 8 B — reserved for future use
};
generation + epoch are laid out so they can be loaded in a single
64-bit atomic — load_generation_epoch_combined() does exactly that
on the fast path with Acquire ordering. Free uses Release on the
same word so reader visibility is guaranteed without a full fence.
ThinRef<T>
16 bytes. Used for sized T.
type ThinRef is<T> {
T* ptr;
uint32_t generation;
uint32_t epoch_caps; // 16 bits epoch + 16 bits capabilities
};
FatRef<T>
32 bytes. Used when T is unsized (slices, dyn) or for interior
references that need an offset into a larger allocation.
type FatRef is<T> {
T* ptr; // 8 B
uint32_t generation; // 4 B
uint32_t epoch_caps; // 4 B — epoch:16 | caps:16
uint64_t metadata; // 8 B — slice length, dyn vtable, …
uint32_t offset; // 4 B — non-zero for interior refs
uint32_t reserved; // 4 B — alignment + room to grow
};
Check sequence
T* deref(ThinRef<T> r) {
Header* h = (Header*)((uint8_t*)r.ptr - sizeof(Header));
if (__builtin_expect(h->generation != r.generation, 0)) {
handle_use_after_free(&r, h);
}
return r.ptr;
}
On x86_64 this compiles to:
mov rax, [rdi] ; load pointer
mov ecx, [rax - 16] ; load header.generation
cmp ecx, [rdi + 8] ; compare with ref.generation
jne .uaf_handler
ret
Measured: 55 cycles, ~13.8 ns on an M3 Max.
Allocator
Based on Mimalloc's segment architecture:
- Segment: 32 MiB virtual region, lazily committed.
- Page: subdivision of a segment; each page holds one size class.
- Size class: 73 classes, spaced at 12.5% intervals, from 8 bytes to 4 MiB.
- LocalHeap: per-thread cache; lock-free fast path.
- Free list: per-page, reversed on full page to maintain LIFO for cache friendliness.
Freeing an object:
- Atomically increments
header.generation. - Pushes the slot onto the page's free list.
Concurrent readers are protected by hazard pointers (HazardGuard):
a reader installs its reference's target into a thread-local slot
before the CBGR check; a freer checks all slots before recycling the
memory.
Generation wraparound
Generations are 32-bit → wraparound at 4.29 billion allocations per object. At 1 ns per alloc, that's ~4.3 seconds per object — a real concern for tight loops that rapidly alloc/free.
Solution: epochs. Each thread advances a 32-bit epoch counter periodically (every 1 ms, or on signal). References carry the epoch at creation. When the allocator rolls a generation, it also advances the page's epoch; any reference with an older epoch fails the check.
With 32-bit epochs rolled at 1 kHz per thread, wraparound is effectively infinite for any practical workload.
Capability bits
The low 8 bits of the capabilities field (and the mirrored 8 bits in
ThinRef.epoch_caps) encode a monotonically attenuating capability
set:
| Bit | Name | Meaning |
|---|---|---|
| 0 | CAP_READ | reads permitted (always set for a live reference) |
| 1 | CAP_WRITE | writes permitted |
| 2 | CAP_EXECUTE | the target is callable (function pointers) |
| 3 | CAP_DELEGATE | the reference can be handed off to another context |
| 4 | CAP_REVOKE | the holder can revoke outstanding copies |
| 5 | CAP_BORROWED | this is a borrow, not an owner |
| 6 | CAP_MUTABLE | &mut semantics (exclusive access) |
| 7 | CAP_NO_ESCAPE | optimisation hint: cannot escape the function |
Monotonic attenuation: capabilities can only be removed as a
reference is passed around — a Database.readonly() transformation
clears CAP_WRITE once and the result can never regain it. The
compiler enforces this at every conversion point.
A method dispatch for Database.write(...) emits a capability check:
assert((r.capabilities & CAP_WRITE) != 0);
The check is one AND + one branch — ~1 ns, dominated by the branch predictor.
Compile-time analysis suite
The tier decisions that let the compiler emit &checked T or
&unsafe T instead of the Tier 0 &T come from a battery of
analyses in verum_cbgr:
| Module | Role |
|---|---|
tier_analysis.rs | unified API — composes the results below into per-reference tier decisions |
escape_analysis.rs | forward dataflow; four states: NoEscape, MayEscape, Escapes, Unknown |
escape_categories.rs | refines Escapes with provenance data for SBGL decisions |
ownership_analysis.rs | tracks move/borrow/consume per reference |
concurrency_analysis.rs | data-race detection driven by Send/Sync |
lifetime_analysis.rs | classic borrow checking |
nll_analysis.rs | non-lexical lifetimes (fine-grained region inference) |
polonius_analysis.rs | Polonius-style origin tracking for hard cases |
dominance_analysis.rs | dominator-based promotion |
points_to_analysis.rs | Andersen-style points-to graph |
smt_alias_verification.rs | SMT-backed alias proofs when the others are inconclusive |
The pipeline walks references through each analysis, producing a
TierAnalysisResult that records decisions per RefId along with
statistics used for the optimiser's tier distribution report.
Promotion rule
Only references classified as NoEscape are eligible for SBGL
(stack-backed generation-less) promotion. MayEscape and Escapes
stay at Tier 0. When the purely syntactic analyses are inconclusive,
smt_alias_verification.rs asks the SMT solver whether two references can alias
— proofs beyond "no" fall back to Tier 0 as well.
Typical promotion rate on idiomatic code: 60–95 % of &T occurrences
land at Tier 1.
Performance numbers (M3 Max)
| Operation | Cycles | Nanoseconds |
|---|---|---|
| Unchecked deref | 2 | 0.5 |
&checked T deref | 2 | 0.5 |
&T deref (hot header) | 55 | 13.8 |
&T deref (cold header) | 220 | 55 |
| Allocation (cached size class) | 40 | 10 |
| Free + gen++ | 80 | 20 |
| Realloc (in place) | 60 | 15 |
| Realloc (moving) | depends | – |
VBC tier opcodes
Each tier is a distinct VBC instruction, so the tier decision is visible all the way down to the bytecode — the interpreter and the AOT lowering pipeline read the same opcode.
| Opcode | Mnemonic | Meaning |
|---|---|---|
| 0x70 | Ref | Tier 0 immutable borrow |
| 0x71 | RefMut | Tier 0 mutable borrow |
| 0x72 | Deref | deref with CBGR validation |
| 0x73 | DerefMut | mutable deref with validation |
| 0x74 | ChkRef | explicit validation check (guard) |
| 0x75 | RefChecked | Tier 1 — compiler-proven safe, 0 ns deref |
| 0x76 | RefUnsafe | Tier 2 — unsafe, 0 ns deref |
| 0x77 | DropRef | drop a reference (bookkeeping only) |
Tier behaviour across execution modes
| Mode | Tier 0 deref | Tier 1 deref | Tier 2 deref |
|---|---|---|---|
| Interpreter | full check | full check (safety first) | full check (safety first) |
| AOT (LLVM, debug) | full check | direct load | direct load |
| AOT (LLVM, release) | full check, elided where escape analysis proves safe | direct load | direct load |
The interpreter intentionally validates every deref regardless of tier — it is the safe-by-default executor. Tier 1 and Tier 2 give their 0 ns benefit under AOT, where the check is proven away statically. Verum does not ship a JIT tier: the AOT pipeline lowers VBC through LLVM (for CPU) and MLIR (for GPU), and the interpreter handles every non-AOT execution path. A speculative baseline/JIT tier was evaluated and removed as not pulling its weight given how fast the interpreter already is and how effectively the AOT pipeline elides tiered checks.
MLIR lowering
The verum.cbgr dialect mirrors the VBC opcodes:
verum.cbgr.borrow / verum.cbgr.borrow_mut
verum.cbgr.deref / verum.cbgr.deref_mut
verum.cbgr.store
verum.cbgr.drop
Lowering strategy is one of:
CbgrValidated— Tier 0, emits a call intostdlib/mem's validation function.DirectLoad— Tier 1, lowers tollvm.loaddirectly.UncheckedLoad— Tier 2, lowers tollvm.loadwith no metadata decorations.
Threads and concurrency
CBGR is designed to be lock-free on the fast path:
- Per-thread local heap avoids contention.
- Generations are atomic loads (no fences needed on x86_64 TSO).
- Hazard pointers replace epoch-based reclamation for finer-grained safety.
On weakly-ordered architectures (aarch64), an acquire fence on deref and a release fence on free provide the necessary ordering.
Analysis configuration knobs
CBGR ships a uniform configuration surface across every static analysis pass. Every gate listed below is honoured by the consumer code and pinned by a unit test inside the corresponding analyser module — the user-facing contract is "set the field, observe the difference".
The configs share a uniform shape:
- Each
*Configcarries production-suitable defaults. - Where the same config has multiple stances (aggressive vs conservative vs disabled), factory constructors expose them.
- Where a result struct exists, it mirrors the config booleans
through
*()accessors so diagnostic builders never need to re-read the analyser's internal state. min_confidence-style threshold fields filter every warning class identically (one filter pass per detector, applied centrally).max_*cap fields with default0mean "unlimited"; any positive value bounds the analysis cost at the price of potentially missing entries in the truncated tail (documented inline in the analyser source).
LifetimeAnalysisConfig
| Field | Default | What it gates |
|---|---|---|
infer_lifetimes | true | Phase 1 (lifetime creation per reference). When false, the analyzer skips Phase 1 entirely; downstream phases see an empty ref_lifetimes map. Used by embedders importing lifetime IDs from an upstream IR. |
check_outlives | true | Phase 4 (constraint solving). When false, the constraint vector is still surfaced (callers driving their own solver consume it) but violations stays empty and stats.constraints_solved == 0. |
detailed_diagnostics | true | Surfaced via LifetimeAnalysisResult.detailed_diagnostics(). Diagnostic builders consult it to choose between verbose (span / region / chain detail) and compact (one-line summary) rendering. |
max_iterations | 1000 | Upper bound for the liveness fixed-point loop. Prevents pathological CFGs from running unbounded analysis time. |
NllConfig
| Field | Default | What it gates |
|---|---|---|
two_phase_borrows | true | Surfaced via NllAnalysisResult.two_phase_borrows_enabled(). Diagnostic builders consult this when explaining a single-phase borrow rejection — when false, the message can suggest enabling two-phase as the fix; when true, the rejection is structural. |
polonius_mode | false | Surfaced via NllAnalysisResult.polonius_mode(). Tags the result for downstream Polonius origin-tracking consumption. |
max_iterations | 1000 | Upper bound for both the liveness and constraint-solver fixed-point loops. |
detailed_diagnostics | true | Surfaced via NllAnalysisResult.detailed_diagnostics() for verbose vs compact rendering. |
OwnershipAnalysisConfig
| Field | Default | What it gates |
|---|---|---|
track_stack | false | When false, AllocationKind::Stack allocations are dropped from the analysis (stack lifetime is structural, not heap-style). |
track_temporaries | false | Same shape as track_stack for AllocationKind::Temporary (expression-result allocations). Embedders that need to track them opt in. |
detect_leaks | true | Leak-detection pass. When false, the leak warning list is empty regardless of analysis state. |
min_confidence | 0.5 | Filters every warning class (double-free, use-after-free, leak) whose confidence is below the threshold. |
max_blocks | 0 (unlimited) | Caps the CFG block walk in extract_allocation_sites. Trade-off: bounded analysis cost at the price of potentially missing allocations in the truncated tail. |
PoloniusConfig
| Field | Default | What it gates |
|---|---|---|
max_iterations | 1000 | Upper bound for the Datalog-style fixed-point loop in compute_output_facts. |
location_sensitive | true | Surfaced via PoloniusAnalysisResult.is_location_sensitive(). Diagnostic builders consult this when explaining a loan rejection — when false, the message can suggest enabling location-sensitive analysis as the fix; when true, the rejection is structural. |
check_moves | true | Surfaced via PoloniusAnalysisResult.check_moves_enabled(). Downstream consumers that wrap the analyser with MoveTracker integration check this flag before consulting the tracker — false suppresses the move-error class from the diagnostic surface. |
EscapeAnalysisConfig
| Field | Default | What it gates |
|---|---|---|
enable_interprocedural | true | Cross-function escape analysis. Disabling falls back to per-function conservative results. |
max_iterations | 100 | Upper bound for the interprocedural fixed-point loop. |
AbstractionConfig (path-sensitive predicate abstraction)
| Field | Default | What it gates |
|---|---|---|
max_abstraction_level | 4 | Caps the abstraction level requested by abstract_predicate. Higher → more aggressive (less precise). |
path_threshold | 50 | merge_similar_paths skips work when path count is below this. |
use_smt_equivalence | true | Gates check_equivalence_smt. When false, the method returns the conservative "not provably equivalent" answer without invoking the SMT layer (stats counter still increments). Surfaced via smt_equivalence_enabled(). |
use_subsumption | true | Gates check_subsumption_smt symmetrically. Surfaced via subsumption_enabled(). |
use_widening | true | Master switch over abstract_level3. When false, level-3 falls back to level-2 (no widening) — preserves precision at the cost of slower convergence on deeply-nested loops. Surfaced via widening_enabled(). |
widening_threshold | 3 | When widening is enabled: widen after this many iterations on the same predicate hash. |
max_cache_size | 10000 | Caps the abstraction / equivalence / subsumption caches to bound memory growth on long runs. |
incremental_merging | true | Surfaced via incremental_merging_enabled(). External enumerators that produce path lists consult this to decide whether to fold paths into the merger as they're produced (incremental) or accumulate the full list and merge once at the end. |
PromotionConfig
| Field | Default | What it gates |
|---|---|---|
enable_promotion | true | Master switch. When false, every reference falls to KeepManagedConservative regardless of analysis. |
extra_conservative | false | Adds the async/await-crossing + exception-path safety checks to the promotion verdict. aggressive() clears it; conservative() sets it. |
allow_heap_promotion | false | When false (default), a reference with is_stack_allocated == false falls back to KeepManagedConservative even when dominance + escape analysis approves. The safe default — heap-rooted promotions need additional liveness analysis the analyzer hasn't produced for that reference. PromotionConfig::aggressive() flips it to true. |
confidence_threshold | 0.95 | Reserved for the future confidence-tagged decision path. Currently informational. |
Convenience constructors:
| Constructor | Effect |
|---|---|
default() | Enabled with the safe heap default (no heap promotion). |
aggressive() | Enabled, allows heap promotion, lower confidence threshold (0.80). |
conservative() | Enabled with extra_conservative set and a tight confidence threshold (0.99). |
disabled() | Master switch off — every reference stays managed. |
ConcurrencyAnalysisConfig
| Field | Default | What it gates |
|---|---|---|
detect_data_races | true | Race-detection pass. false returns an empty data-race warning list. |
detect_deadlocks | true | Lock-cycle detection. false returns an empty deadlock warning list. |
check_thread_safety | true | Send/Sync surface scan. |
min_confidence | 0.5 | Filters every warning whose confidence is below the threshold. |
max_accesses | 0 (unlimited) | Caps the analyzer's memory-access list. Once the cap is reached, extract_accesses returns early. |
EnhancedContextConfig (context-sensitive analysis)
| Field | Default | What it gates |
|---|---|---|
flow_sensitive | false | Surfaced via ContextSensitiveAnalyzer.flow_sensitive_enabled(). Downstream analyses that maintain per-flow state consult this to decide whether to install or skip the flow-state map. |
adaptive_depth | false | When true, with_config installs an AdaptiveDepthPolicy::new(default_depth, max_depth) so the dependent slot is in play. Surfaced via adaptive_depth_enabled(). |
compression | false | When true, with_config installs a fresh ContextCompressor if none is set; pre-existing custom compressors are preserved. Surfaced via compression_enabled(). |
default_depth | 3 | Default analysis depth used when adaptive_depth == false. Surfaced via default_depth(). |
max_depth | 10 | Hard upper bound on context depth (safety check). Surfaced via max_depth(). |
The convenience constructor EnhancedContextConfig::all_enabled()
flips the three booleans to true and keeps the depth defaults
(3 / 10). Calling with_config(EnhancedContextConfig::all_enabled())
on a fresh ContextSensitiveAnalyzer::new() now produces the
same effective shape as the bespoke with_all_enhancements()
constructor — depth_policy + compressor are auto-installed.
ParallelConfig (context-sensitive parallel analysis)
| Field | Default | What it gates |
|---|---|---|
threshold | 10 | should_parallelize returns true only when the input size meets this. Below the threshold, analyze_parallel runs sequentially. |
max_threads | 0 (rayon default) | When > 0, analyze_parallel installs a bespoke rayon::ThreadPool capped to that many workers for the duration of the call. Pool-build failures fall back to the global pool. |
work_stealing | true | Documented stance. Rayon's underlying scheduler is always work-stealing; the accessor work_stealing_enabled() exists for diagnostics that surface the declared stance. |
ValueTrackingConfig (concrete-value tracking for escape refinement)
| Field | Default | What it gates |
|---|---|---|
enable_constant_propagation | true | Concrete-value writes in propagate_constant, the constant fast-path inside propagate_binop, and the merge-into-concrete path in propagate_phi. When false, no concrete state is recorded; downstream constant folding sees nothing. |
enable_range_analysis | true | Range-refinement branch inside propagate_binop. When false, the range computation is skipped entirely. |
enable_symbolic_execution | true | Symbolic-mirror writes in propagate_constant, propagate_binop, and propagate_phi. When false, no SymbolicValue is constructed. |
max_iterations | 100 | Worklist iteration cap in track_concrete_values_with_config. Bounds analysis cost on pathological CFGs; once exceeded, propagation stops with whatever block states have been recorded so far (best-effort partial result). |
The three domain gates are independent — opting out of constant
tracking keeps symbolic and range tracking active (and vice-versa)
so callers can build a precise per-analysis cost / precision
profile. The configuration is passed when constructing a
ValuePropagator or driving the enhanced escape analyser; both
honour the same ValueTrackingConfig shape and propagate the
gates uniformly through their worklist.
DiagnosticsConfig
| Field | Default | What it gates |
|---|---|---|
file_name | <source> | Span source filename used in rendered diagnostics. |
include_tier_reasons | true | Attach the Tier 0 reason (escape kind, dominance status, etc.) as a note. |
confidence_threshold | 0.7 | Suppress warnings whose confidence is below the threshold. |
include_help | true | Attach the canonical fix-it suggestion. |
include_doc_urls | true | Attach the verum-lang.org/errors/<code> URL. |
Wire-up tests (proof of effect)
Every gate listed above has a corresponding unit test inside its analyser module that asserts the field's documented effect:
| Module | Wire-up tests |
|---|---|
lifetime_analysis | infer_lifetimes_false_skips_phase_1, check_outlives_false_returns_empty_violation_list, detailed_diagnostics_flows_to_result |
nll_analysis | nll_config_flows_to_result (8-cell truth table over the three booleans) |
ownership_analysis | min_confidence_filters_low_confidence_warnings, max_blocks_caps_extraction_walk |
concurrency_analysis | min_confidence_filters_low_confidence_warnings, max_accesses_caps_extraction |
polonius_analysis | polonius_config_flows_to_result (4-cell truth table over the two booleans) |
predicate_abstraction | config_accessors_mirror_construction_values (16-cell truth table over the four booleans, in tests/predicate_abstraction_integration_tests.rs) |
context_enhancements | enhanced_context_config_drives_dependencies, parallel_config_max_threads_round_trips_through_accessor |
promotion_decision | allow_heap_promotion_default_vs_aggressive |
Run the inventory locally with:
cargo test -p verum_cbgr --lib analysis
See also
- Stdlib → mem — user-facing API.
- Language → CBGR — programmer's model.
- Runtime tiers — CBGR across interpreter / AOT.