
CBGR — Capability-Based Generational References

CBGR is Verum's default memory-safety mechanism for &T references. It detects use-after-free and double-free at runtime with roughly 15 nanoseconds of overhead per dereference — faster than a malloc, slower than a register access.

This page explains the idea. For data-structure details, see CBGR internals.

The problem

Manual memory management crashes when you dereference a pointer after its object has been freed. Garbage collection solves this by refusing to free until no one can reach the object; the cost is latency spikes and loss of control.

Verum wants the control and the safety. CBGR is the compromise.

The idea in one paragraph

Every heap allocation carries a small header with a generation counter. Every reference includes a copy of the generation it was issued against. When you dereference, the runtime compares the two. If they match, the object is still the one you got a reference to, and the access proceeds. If they differ, the object has been freed (or revoked) and the access is rejected.
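Below is a minimal sketch of the same idea in Rust. It uses an index-based arena in place of raw heap pointers and headers; `Arena`, `Handle` and `Slot` are hypothetical names, not Verum runtime types. Each handle keeps the generation it was issued against. Freeing bumps the slot's generation, and every access compares the two values.

```rust
// Hypothetical sketch: handles carry the generation they were issued against.
#[derive(Clone, Copy, Debug)]
struct Handle {
    index: usize,
    generation: u32, // copy of the slot's generation at issue time
}

struct Slot<T> {
    generation: u32,  // bumped every time the slot is freed
    value: Option<T>, // None while the slot is free
}

struct Arena<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>, // indices available for reuse
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new(), free: Vec::new() }
    }

    fn alloc(&mut self, value: T) -> Handle {
        if let Some(index) = self.free.pop() {
            // Reuse a freed slot; its generation was already bumped on free.
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            Handle { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Handle { index: self.slots.len() - 1, generation: 0 }
        }
    }

    /// The CBGR-style check: the access proceeds only if the generations match.
    fn get(&self, h: Handle) -> Option<&T> {
        let slot = self.slots.get(h.index)?;
        if slot.generation != h.generation {
            return None; // stale: the object was freed (and possibly reused)
        }
        slot.value.as_ref()
    }

    /// Freeing bumps the generation, so every outstanding handle becomes stale.
    fn free(&mut self, h: Handle) -> Option<T> {
        let slot = self.slots.get_mut(h.index)?;
        if slot.generation != h.generation {
            return None; // double free through a stale handle is rejected
        }
        let value = slot.value.take();
        slot.generation = slot.generation.wrapping_add(1);
        self.free.push(h.index);
        value
    }
}
```

The slot's generation changes on every free. A second free through the same stale handle is therefore caught as well, which covers the double-free case mentioned at the top of the page.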

What a reference looks like

ThinRef<T> — 16 bytes:

Offset  Size  Field         Purpose
0       8 B   pointer       object address
8       4 B   generation    issued-against counter
12      4 B   epoch / caps  scope epoch + capability bit vector

For unsized types, slices, trait objects, and interior references the runtime uses FatRef<T> — 32 bytes total. It layers three fields on top of a ThinRef: an 8-byte metadata word (length for slices, vtable pointer for dyn), a 4-byte offset (for interior references), and a 4-byte reserved field for alignment and future use.
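The two layouts can be made concrete with a hypothetical Rust mirror using `#[repr(C)]`. This assumes a 64-bit target. Field names follow the table and the paragraph above. The raw pointer is stored as `*const u8` so that the 16-byte size holds even when `T` is unsized; the pointer metadata lives in `FatRef` instead.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical mirror of ThinRef<T>: 16 bytes on a 64-bit target.
#[allow(dead_code)]
#[repr(C)]
struct ThinRef<T: ?Sized> {
    pointer: *const u8,             // offset 0, 8 B: object address
    generation: u32,                // offset 8, 4 B: issued-against counter
    epoch_caps: u32,                // offset 12, 4 B: scope epoch + capability bits
    _target: PhantomData<*const T>, // zero-sized: records the referent type only
}

// Hypothetical mirror of FatRef<T>: a ThinRef plus 16 more bytes, 32 in total.
#[allow(dead_code)]
#[repr(C)]
struct FatRef<T: ?Sized> {
    thin: ThinRef<T>, // offset 0, 16 B
    metadata: u64,    // offset 16, 8 B: slice length or vtable pointer
    offset: u32,      // offset 24, 4 B: interior-reference offset
    reserved: u32,    // offset 28, 4 B: alignment / future use
}

// Compile-time layout checks (64-bit target assumed).
const _: () = assert!(size_of::<ThinRef<[u8]>>() == 16);
const _: () = assert!(size_of::<FatRef<dyn Fn()>>() == 32);
```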

What a header looks like

AllocationHeader (32 bytes, 32-byte aligned) sits immediately before the object payload:

Offset  Type  Field         Purpose
0       u32   size          payload size in bytes
4       u32   align         payload alignment
8       u32   generation    bumped on free/revoke
12      u16   epoch         scope epoch
14      u16   capabilities  capability bits
16      u32   type_id       runtime type identifier
20      u32   flags         mark/pin/frozen bits
24      u64   reserved      reserved for future use
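The header layout can be checked mechanically. The hypothetical Rust mirror below assumes Rust 1.77 or later for `offset_of!`. Its compile-time assertions check three things: the total size, the 32-byte alignment, and that generation, epoch and capabilities occupy bytes 8 to 15. The last property lets those three fields be read as one aligned 64-bit word.

```rust
use std::mem::{align_of, offset_of, size_of};

// Hypothetical Rust mirror of AllocationHeader; field names follow the table.
#[allow(dead_code)]
#[repr(C, align(32))]
struct AllocationHeader {
    size: u32,         // 0: payload size in bytes
    align: u32,        // 4: payload alignment
    generation: u32,   // 8: bumped on free/revoke
    epoch: u16,        // 12: scope epoch
    capabilities: u16, // 14: capability bits
    type_id: u32,      // 16: runtime type identifier
    flags: u32,        // 20: mark/pin/frozen bits
    reserved: u64,     // 24: reserved for future use
}

// Compile-time checks that the struct matches the table.
const _: () = {
    assert!(size_of::<AllocationHeader>() == 32);
    assert!(align_of::<AllocationHeader>() == 32);
    // generation, epoch and capabilities span bytes 8..16: one aligned 64-bit word.
    assert!(offset_of!(AllocationHeader, generation) == 8);
    assert!(offset_of!(AllocationHeader, capabilities) == 14);
    assert!(offset_of!(AllocationHeader, reserved) == 24);
};
```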

generation (u32) and epoch (u16) are laid out so they fit into a single 64-bit atomic load on the fast path; freeing an object Release-increments that word, and every reader does an Acquire load before comparing. See architecture → CBGR internals for the exact bit layout.
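Here is a sketch of that fast-path word in Rust. It assumes, hypothetically, that header bytes 8 to 15 are read as one little-endian `u64`: the generation in the low 32 bits, the epoch in the next 16, and the capabilities in the top 16. `GenWord`, `bump` and `is_live` are illustrative names. One design choice differs from a plain `fetch_add`: the increment uses `fetch_update`, so a wrapping generation cannot carry into the epoch bits.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical view of header bytes 8..16 as one u64:
// bits 0..32 = generation, bits 32..48 = epoch, bits 48..64 = capabilities.
struct GenWord(AtomicU64);

const GEN_MASK: u64 = 0xFFFF_FFFF;

impl GenWord {
    fn new(generation: u32, epoch: u16, caps: u16) -> Self {
        let word = (generation as u64) | ((epoch as u64) << 32) | ((caps as u64) << 48);
        GenWord(AtomicU64::new(word))
    }

    /// Free/revoke path: Release-increment the generation. Only the low 32 bits
    /// change, so a wrapping generation never carries into the epoch field.
    fn bump(&self) {
        let _ = self.0.fetch_update(Ordering::Release, Ordering::Relaxed, |w| {
            let next = (w as u32).wrapping_add(1) as u64;
            Some((w & !GEN_MASK) | next)
        });
    }

    /// Reader fast path: one Acquire load, then compare generation and epoch.
    fn is_live(&self, generation: u32, epoch: u16) -> bool {
        let w = self.0.load(Ordering::Acquire);
        (w & GEN_MASK) as u32 == generation && (w >> 32) as u16 == epoch
    }
}
```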

The check

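fn deref<T>(r: ThinRef<T>) -> &T {
    let hdr = header_of(r.pointer);     // header sits immediately before the payload
    // Acquire load; pairs with the Release increment performed on free/revoke
    if hdr.generation != r.generation {
        panic_use_after_free();         // freed or revoked: reject the access
    }
    unsafe { &*r.pointer }              // generations match: the object is still live
}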

Three loads, one compare, one conditional branch. On typical hardware, this measures ~15 ns.

Why not just bounds-check?

Bounds checking prevents out-of-range indexing; it does nothing about stale pointers after free. Conversely, CBGR prevents stale-pointer access but does not itself bounds-check indices. The two are orthogonal safety mechanisms, and Verum uses both.

Generation wraparound

Generations are 32-bit. If a single allocation slot were freed and reused once per nanosecond, its generation would wrap after 2^32 ns, about 4.3 seconds. A wrapped value could then match an old reference that still points at the slot. To prevent this, the allocator also tracks epochs: a thread-local epoch counter advances periodically, and references are invalidated across epoch boundaries. The runtime handles this automatically.
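The wraparound figure is 2^32 divided by the rate at which one slot's generation is bumped. A small hypothetical Rust helper makes the arithmetic explicit:

```rust
/// Seconds until a 32-bit generation counter wraps, given how many times per
/// second a single slot is freed (each free bumps its generation by one).
fn wraparound_seconds(bumps_per_second: f64) -> f64 {
    (1u64 << 32) as f64 / bumps_per_second
}
```

At a more plausible one million frees per second of the same slot, the same formula gives about 4,295 s, roughly 72 minutes. Either way the window is finite, which is why the epoch mechanism is needed.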

Capability bits

The epoch / caps word of a reference is partitioned between the epoch identity and eight capability bits, drawn from a fixed set with monotonic attenuation (capabilities can only be removed as the reference is passed around):

Bit  Name       Meaning
0    READ       reads permitted (set for every live reference)
1    WRITE      writes permitted
2    EXECUTE    the target is callable
3    DELEGATE   can be handed to another context
4    REVOKE     the holder can revoke outstanding copies
5    BORROWED   this is a borrow, not an owner
6    MUTABLE    &mut semantics (exclusive access)
7    NO_ESCAPE  optimisation hint: cannot escape

This is how a capability-restricted type such as Database with [READ] is represented at runtime. The reference carries READ but has WRITE cleared, so a call to Database.write(...) fails a capability check that costs one AND plus one branch (~1 ns). Narrowing the set (db.readonly()) is always allowed; re-expanding it is rejected by the compiler.
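Below is a minimal Rust sketch of the capability byte, using the bit assignments in the table above. `Caps`, `allows` and `attenuate` are hypothetical names. The runtime check is a single AND and compare. The only way to derive a new set is to AND bits away, so attenuation is monotonic by construction. Verum rejects re-expansion at compile time; this sketch simply provides no widening operation.

```rust
/// Capability bits from the table (bit index = position in the byte).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Caps(u8);

#[allow(dead_code)]
impl Caps {
    const READ: Caps = Caps(1 << 0);
    const WRITE: Caps = Caps(1 << 1);
    const EXECUTE: Caps = Caps(1 << 2);
    const DELEGATE: Caps = Caps(1 << 3);
    const REVOKE: Caps = Caps(1 << 4);
    const BORROWED: Caps = Caps(1 << 5);
    const MUTABLE: Caps = Caps(1 << 6);
    const NO_ESCAPE: Caps = Caps(1 << 7);

    /// The runtime check: one AND, one compare and branch.
    fn allows(self, needed: Caps) -> bool {
        (self.0 & needed.0) == needed.0
    }

    /// Attenuation: the result can only have fewer bits set, never more.
    fn attenuate(self, keep: Caps) -> Caps {
        Caps(self.0 & keep.0)
    }
}

/// Hypothetical guarded write: fails when WRITE has been attenuated away.
fn write_through(caps: Caps) -> Result<(), &'static str> {
    if !caps.allows(Caps::WRITE) {
        return Err("capability violation: WRITE not granted");
    }
    Ok(())
}
```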

When the check is elided

The compiler emits the full ~15 ns check for &T. For &checked T it emits nothing, because escape analysis (one of eleven compile-time analyses in verum_cbgr) has proved the check unnecessary. The proof is witnessed in the compilation artefacts, and you can inspect which references were promoted with:

$ verum analyze --escape ./src/main.vr
function     total  tier0  tier1  tier2  promoted
process         42      3     39      0  39/42 (92.9%)
tight_loop       8      0      8      0  8/8 (100%)

Or dump the full analysis suite with verum analyze --all. On idiomatic code the typical promotion rate is 60–95 %.

Tiered execution

In the VBC interpreter, CBGR checks are executed by the interpreter itself. In the LLVM AOT backend, they are lowered to native instructions, and LLVM's optimiser frequently collapses them when several checks sit next to each other or inside a tight loop. In GPU kernels, CBGR is disabled by construction: kernels operate on a separate memory arena whose accesses are checked statically.

Performance numbers

Reported on an Apple M3 Max, release build with LTO:

Operation                         Cycles  Nanoseconds
Unchecked pointer deref                2          0.5
&checked T deref                       2          0.5
&T CBGR check + deref                 55         13.8
&T check + cache miss on header      220         55
Free + increment generation           80         20
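The two columns are related by the clock frequency: nanoseconds = cycles / GHz. Every row is consistent with a clock of about 4 GHz; for example, 55 / 4 = 13.75, printed rounded as 13.8. The helper below is only an arithmetic check on the table, not part of any Verum tooling.

```rust
/// Rows of the table: (operation, cycles, nanoseconds as printed).
const ROWS: [(&str, f64, f64); 5] = [
    ("Unchecked pointer deref", 2.0, 0.5),
    ("&checked T deref", 2.0, 0.5),
    ("&T CBGR check + deref", 55.0, 13.8),
    ("&T check + cache miss on header", 220.0, 55.0),
    ("Free + increment generation", 80.0, 20.0),
];

/// Nanoseconds = cycles / clock frequency in GHz.
fn cycles_to_ns(cycles: f64, clock_ghz: f64) -> f64 {
    cycles / clock_ghz
}

/// True if every printed value equals cycles / clock, allowing for the
/// 0.05 ns rounding of the table's one-decimal figures.
fn table_consistent_with(clock_ghz: f64) -> bool {
    ROWS.iter()
        .all(|&(_, cycles, ns)| (cycles_to_ns(cycles, clock_ghz) - ns).abs() <= 0.05 + 1e-9)
}
```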

The cache-miss row is the worst case. The header is designed to share a cache line with the object, so in typical access patterns it is already hot.

Mental model

Think of CBGR as trading a small constant-factor overhead on every reference dereference for the elimination of an entire class of CVEs: use-after-free and double-free become checked runtime failures instead of exploitable memory corruption. For most code, 15 ns is invisible. For hot loops, escape analysis elides the check. Where it cannot, you can state explicitly that you want &checked T and let the compiler tell you what needs refactoring.

See also