CBGR — Capability-Based Generational References

CBGR is Verum's default memory-safety mechanism for &T references. It detects use-after-free and double-free at runtime, with a design budget of roughly 15 nanoseconds of overhead per dereference: faster than a malloc, slower than a register access.

This page explains the idea. For data-structure details, see CBGR internals.

The problem

Under manual memory management, dereferencing a pointer after its object has been freed is undefined behaviour: the program may crash, or it may silently read or corrupt unrelated data. Garbage collection solves this by refusing to free an object until nothing can reach it; the cost is latency spikes and loss of control.

Verum wants the control and the safety. CBGR is the compromise.

The idea in one paragraph

Every heap allocation carries a small header with a generation counter. Every reference includes a copy of the generation it was issued against. When you dereference, the runtime compares the two. If they match, the object is still the one you got a reference to, and the access proceeds. If they differ, the object has been freed (or revoked) and the access is rejected.
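The paragraph above can be modelled in a few lines of Rust. This is a minimal sketch, not Verum's runtime: a slot arena stands in for the heap, an index stands in for the pointer, and the names `Arena`, `Slot`, `Handle`, and `Stale` are hypothetical.

```rust
// Illustrative model only; none of these names are Verum APIs.

#[derive(Debug, PartialEq)]
pub struct Stale; // returned when the generations differ

struct Slot<T> {
    generation: u32,  // plays the role of the header's generation counter
    value: Option<T>, // None once the object has been freed
}

#[derive(Clone, Copy, Debug)]
pub struct Handle {
    index: usize,    // plays the role of the pointer
    generation: u32, // generation the handle was issued against
}

pub struct Arena<T> {
    slots: Vec<Slot<T>>,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Arena { slots: Vec::new() }
    }

    // Allocate: reuse a freed slot if one exists, otherwise push a new one.
    pub fn alloc(&mut self, value: T) -> Handle {
        if let Some(index) = self.slots.iter().position(|s| s.value.is_none()) {
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            return Handle { index, generation: slot.generation };
        }
        self.slots.push(Slot { generation: 0, value: Some(value) });
        Handle { index: self.slots.len() - 1, generation: 0 }
    }

    // Free: drop the value and bump the generation so old handles mismatch.
    pub fn free(&mut self, h: Handle) -> Result<(), Stale> {
        let slot = &mut self.slots[h.index];
        if slot.generation != h.generation {
            return Err(Stale); // stale handle, which also covers double free
        }
        slot.value = None;
        slot.generation = slot.generation.wrapping_add(1);
        Ok(())
    }

    // Dereference: compare generations first, then hand out the value.
    pub fn get(&self, h: Handle) -> Result<&T, Stale> {
        let slot = &self.slots[h.index];
        if slot.generation != h.generation {
            return Err(Stale);
        }
        slot.value.as_ref().ok_or(Stale)
    }
}
```

After `free`, any handle issued before it carries the old generation, so both `get` and a second `free` return `Err(Stale)`, even if the slot has since been reused for a new object.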

What a reference looks like

ThinRef<T> — 16 bytes:

Offset	Size	Field	Purpose
0	8 B	`pointer`	object address
8	4 B	`generation`	issued-against counter
12	4 B	`epoch / caps`	scope epoch + capability bit vector

For unsized types, slices, trait objects, and interior references the runtime uses FatRef<T> — 32 bytes total. It layers three fields on top of a ThinRef: an 8-byte metadata word (length for slices, vtable pointer for dyn), a 4-byte offset (for interior references), and a 4-byte reserved field for alignment and future use.
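As a rough illustration of the two layouts, the Rust structs below mirror the fields described above. The field names come from this page; the Rust types, the type-erased pointer, and the nesting of `ThinRef` inside `FatRef` are assumptions for a 64-bit target, not Verum's actual definitions.

```rust
use std::mem::size_of;

// Sketch only: the pointer is type-erased to keep it one machine word.
#[repr(C)]
pub struct ThinRef {
    pub pointer: *const u8, // offset 0, 8 bytes: object address
    pub generation: u32,    // offset 8, 4 bytes: issued-against counter
    pub epoch_caps: u32,    // offset 12, 4 bytes: scope epoch + capability bits
}

#[repr(C)]
pub struct FatRef {
    pub thin: ThinRef, // 16 bytes, as above
    pub metadata: u64, // slice length, or vtable pointer for dyn
    pub offset: u32,   // offset for interior references
    pub reserved: u32, // alignment and future use
}

// Compile-time size checks; only meaningful where pointers are 8 bytes.
#[cfg(target_pointer_width = "64")]
const _: () = assert!(size_of::<ThinRef>() == 16);
#[cfg(target_pointer_width = "64")]
const _: () = assert!(size_of::<FatRef>() == 32);
```

With `#[repr(C)]` the fields are laid out in declaration order, so the sizes add up without padding: 8 + 4 + 4 = 16 bytes for the thin form, plus 8 + 4 + 4 for the extra fields of the fat form.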

What a header looks like

AllocationHeader — 32 bytes, cache-line (32-byte) aligned, placed immediately before the object payload:

Offset	Type	Field	Purpose
0	`u32`	`size`	payload size in bytes
4	`u32`	`align`	payload alignment
8	`u32`	`generation`	bumped on free/revoke
12	`u16`	`epoch`	scope epoch
14	`u16`	`capabilities`	capability bits
16	`u32`	`type_id`	runtime type identifier
20	`u32`	`flags`	mark/pin/frozen bits
24	`u64`	`reserved`	reserved for future use
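The table can be cross-checked with a Rust struct that uses the same field order and widths. This is a sketch under the assumption that a `#[repr(C, align(32))]` struct is a fair stand-in for the header; the helper `field_offsets` is hypothetical and exists only to print the computed offsets.

```rust
// Field names and widths follow the table; the definition itself is assumed.
#[repr(C, align(32))]
#[derive(Default)]
pub struct AllocationHeader {
    pub size: u32,         // payload size in bytes
    pub align: u32,        // payload alignment
    pub generation: u32,   // bumped on free/revoke
    pub epoch: u16,        // scope epoch
    pub capabilities: u16, // capability bits
    pub type_id: u32,      // runtime type identifier
    pub flags: u32,        // mark/pin/frozen bits
    pub reserved: u64,     // reserved for future use
}

// Byte offset of every field, measured from the start of one header value.
pub fn field_offsets() -> [usize; 8] {
    let h = AllocationHeader::default();
    let base = &h as *const AllocationHeader as usize;
    [
        &h.size as *const u32 as usize - base,
        &h.align as *const u32 as usize - base,
        &h.generation as *const u32 as usize - base,
        &h.epoch as *const u16 as usize - base,
        &h.capabilities as *const u16 as usize - base,
        &h.type_id as *const u32 as usize - base,
        &h.flags as *const u32 as usize - base,
        &h.reserved as *const u64 as usize - base,
    ]
}
```

The computed offsets match the table, and the struct occupies exactly 32 bytes with 32-byte alignment, so no padding is needed between or after the fields.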

generation (u32) and epoch (u16) are laid out so they fit into a single 64-bit atomic load on the fast path; freeing an object Release-increments that word, and every reader does an Acquire load before comparing. See architecture → CBGR internals for the exact bit layout.
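The Release/Acquire pairing can be sketched with a single `AtomicU64`. The packing below is an assumption (a little-endian view of header bytes 8 to 15, with the generation in the low 32 bits); the exact bit layout is documented elsewhere, and the function names are hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Assumed packing: bits 0..32 generation, 32..48 epoch, 48..64 capabilities.
pub fn pack(generation: u32, epoch: u16, capabilities: u16) -> u64 {
    (generation as u64) | ((epoch as u64) << 32) | ((capabilities as u64) << 48)
}

pub fn generation_of(word: u64) -> u32 {
    word as u32 // truncation keeps the low 32 bits
}

pub fn epoch_of(word: u64) -> u16 {
    (word >> 32) as u16
}

// Free path: Release-increment the word so the new generation is published.
// In this sketch an overflow of the low 32 bits would carry into the epoch bits.
pub fn invalidate(word: &AtomicU64) {
    word.fetch_add(1, Ordering::Release);
}

// Reader path: Acquire-load the word, then compare generations.
pub fn is_live(word: &AtomicU64, ref_generation: u32) -> bool {
    generation_of(word.load(Ordering::Acquire)) == ref_generation
}
```

The Acquire load on the reader side pairs with the Release increment on the free side: a reader that observes the bumped generation also observes everything the freeing thread wrote before the bump.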

The check

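fn deref(r: ThinRef<T>) -> &T {
    // Locate the AllocationHeader stored immediately before the payload.
    let hdr = header_of(r.pointer);
    // A mismatch means the object was freed or revoked after `r` was issued.
    if hdr.generation != r.generation {
        panic_use_after_free();
    }
    // Generations match: the payload is still the object `r` refers to.
    unsafe { &*r.pointer }
}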

Three loads, one compare, one conditional branch. On the production_targets bench (x86_64, release build) this measures ~0.93 ns — well under the ≤ 15 ns design target set in docs/detailed/cbgr-implementation.

Why not just bounds-check?

Bounds checking prevents out-of-range indexing; it does nothing about stale pointers after free. Conversely, CBGR prevents stale-pointer access but does not itself bound-check indices. They are orthogonal safety mechanisms — and Verum uses both.

Generation wraparound

Generations are 32-bit. If a single allocation slot were freed and reused once per nanosecond, its counter would wrap after 2^32 ns, about 4.3 seconds. To prevent reuse of a generation while old references still point at it, the allocator uses epochs: a thread-local epoch counter advances periodically, and references are invalidated across epoch boundaries. This is handled automatically by the runtime.
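The 4.3-second figure is plain arithmetic on the counter width. The sketch below works it out and shows the wrap itself; the function names are hypothetical, and the one-bump-per-nanosecond rate is the same deliberately extreme assumption used above.

```rust
// Number of distinct values a 32-bit generation counter can hold.
pub const GENERATIONS: u64 = 1 << 32;

// Seconds until the counter wraps if it is bumped once every `ns_per_bump` ns.
pub fn seconds_to_wrap(ns_per_bump: u64) -> f64 {
    (GENERATIONS * ns_per_bump) as f64 / 1e9
}

// After exactly 2^32 increments, a u32 counter is back at its starting value,
// which is why a generation alone cannot distinguish very old references.
pub fn wraps_to_start(start: u32) -> bool {
    start.wrapping_add(u32::MAX).wrapping_add(1) == start
}
```

`seconds_to_wrap(1)` evaluates to about 4.29 seconds. At a slower rate of one bump per microsecond, the same counter lasts about 4295 seconds, roughly 72 minutes.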

Capability bits

The epoch / caps word of a reference is partitioned between the epoch identity and eight capability bits, drawn from a fixed set with monotonic attenuation (capabilities can only be removed as the reference is passed around):

Bit	Name	Meaning
0	`READ`	reads permitted (set for every live reference)
1	`WRITE`	writes permitted
2	`EXECUTE`	the target is callable
3	`DELEGATE`	can be handed to another context
4	`REVOKE`	the holder can revoke outstanding copies
5	`BORROWED`	this is a borrow, not an owner
6	`MUTABLE`	`&mut` semantics (exclusive access)
7	`NO_ESCAPE`	optimisation hint — cannot escape

This is how a type such as Database with [READ] is represented at runtime: the Database reference has WRITE cleared, and a call to Database.write(...) fails a capability check that is one AND plus one branch (~1 ns). Reducing the set (db.readonly()) is always allowed; re-expanding it is rejected by the compiler.
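A minimal sketch of the bit-level mechanics, using the bit positions from the table. The `Caps` wrapper and its methods are hypothetical, and `readonly` here clears only WRITE, which is an assumption about what db.readonly() does.

```rust
// Bit positions follow the table above.
pub const READ: u8 = 1 << 0;
pub const WRITE: u8 = 1 << 1;
pub const EXECUTE: u8 = 1 << 2;
pub const DELEGATE: u8 = 1 << 3;
pub const REVOKE: u8 = 1 << 4;
pub const BORROWED: u8 = 1 << 5;
pub const MUTABLE: u8 = 1 << 6;
pub const NO_ESCAPE: u8 = 1 << 7;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Caps(u8);

impl Caps {
    pub fn new(bits: u8) -> Self {
        Caps(bits)
    }

    // Monotonic attenuation: AND can clear bits but never set them,
    // so the result is always a subset of the original capabilities.
    pub fn attenuate(self, keep: u8) -> Caps {
        Caps(self.0 & keep)
    }

    // The capability check: one AND, one compare.
    pub fn allows(self, required: u8) -> bool {
        (self.0 & required) == required
    }

    // Hypothetical counterpart of db.readonly(): drop the WRITE bit.
    pub fn readonly(self) -> Caps {
        self.attenuate(!WRITE)
    }
}
```

Because `attenuate` only masks, asking for WRITE again on a read-only value returns the same read-only set. In this sketch re-expansion simply has no effect at runtime, whereas Verum rejects it at compile time.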

When the check is elided

The compiler emits the full CBGR check (~0.93 ns measured) for &T. It emits nothing for &checked T — escape analysis (one of eleven compile-time analyses in verum_cbgr) has proved the check unnecessary. The proof is witnessed in the compilation artefacts; you can inspect which references got promoted with:

$ verum analyze --escape ./src/main.vr
function     total   tier0   tier1   tier2   promoted
process        42       3      39       0          39/42 (92.9%)
tight_loop      8       0       8       0           8/8  (100%)

Or dump the full analysis suite with verum analyze --all. On idiomatic code the typical promotion rate is 60–95 %.

Tiered execution

In the VBC interpreter, CBGR checks run in software. In the LLVM AOT backend, they are lowered to native instructions and frequently collapsed by LLVM's optimiser when adjacent to each other or inside tight loops. In GPU kernels, CBGR is disabled by construction (kernels operate on a separate memory arena with statically checked accesses).

Performance numbers

Reported on an Apple M3 Max, release build with LTO:

Operation	Cycles	Nanoseconds
Unchecked pointer deref	2	0.5
`&checked T` deref	2	0.5
`&T` CBGR check + deref	55	13.8
`&T` check + cache miss on header	220	55
Free + increment generation	80	20

The "cache miss" line is worst case — the header is designed to share a cache line with the object, so in typical access patterns it's already hot.

Mental model

Think of CBGR as trading a small constant-factor overhead on every reference dereference for the complete elimination of an entire class of CVEs. For most code, 15 ns is invisible. For hot loops, escape analysis elides the check. For code where it cannot, you can be explicit about wanting &checked T and let the compiler tell you what needs refactoring.
