VBC — Verum Bytecode

VBC is Verum's unified intermediate representation. Every program lowers to VBC; from there, the interpreter runs it directly (Tier 0) or the AOT backend lowers it to LLVM IR (Tier 1). The same module format also carries GPU kernels lowered through MLIR to PTX / HSACO / SPIR-V / Metal.

Design goals

Single IR — VBC is both the bytecode and the lowering target; there is no separate MIR in the main pipeline.
Typed — every value and register has a type the verifier can check.
CBGR-aware — generation / epoch / capability checks are explicit opcodes (Ref, Deref, ChkRef, RefChecked, RefUnsafe, DropRef), so tier decisions survive all the way to the backend.
Cubical-aware — path types and transports have opcodes that the codegen erases to identity in release builds.
Portable — serialisable for distribution (.cog archives carry VBC + metadata + optional proof certificates).

Opcode map

The VBC instruction set comprises roughly 350 primary opcodes plus several extended opcode tables for arithmetic, tensor operations, cubical primitives, and CBGR tier lowering.

Range	Family	Representative opcodes
0x00 – 0x0F	Load / move / convert	`Mov`, `LoadK`, `LoadI`, `LoadF`, `LoadTrue`, `LoadSmallI`, `Nop`, `CvtIF`, `CvtFI`, `CvtIC`
0x10 – 0x1F	Integer arithmetic	`AddI`, `SubI`, `MulI`, `DivI`, `ModI`, `NegI`, `AbsI`, `PowI`, `Inc`, `Dec`, `CvtToI`
0x20 – 0x2F	Float arithmetic & extended	`AddF`, `SubF`, `MulF`, `DivF`, `NegF`, `PowF`, `MathExtended` (0x29), `SimdExtended` (0x2A), `CharExtended` (0x2B)
0x30 – 0x3F	Bitwise + generic arith	`Band`, `Bor`, `Bxor`, `Bnot`, `Shl`, `Shr`, `Ushr`, `AddG`, `SubG`, `MulG`, `DivG`
0x40 – 0x4F	Compare	`EqI`/`NeI`/`Lt`/`Le`/`Gt`/`Ge` (int + float), `EqG`, `CmpG`, `EqRef`, `CmpExtended` (0x4F)
0x50 – 0x5F	Control flow	`Jmp`, `JmpIf`, `JmpNot`, `JmpEq`/`Ne`/`Lt`/`Le`/`Gt`/`Ge`, `Ret`, `RetV`, `Call`, `TailCall`, `CallM`, `CallClosure`, `CallR`
0x60 – 0x6F	Objects / collections	`New`, `NewG`, `GetF`, `SetF`, `GetE`, `SetE`, `Len`, `NewArray`, `NewList`, `ListPush`, `ListPop`, `NewMap`, `MapGet`, `MapSet`, `MapContains`, `Clone`
0x70 – 0x77	CBGR references	`Ref`, `RefMut`, `Deref`, `DerefMut`, `ChkRef`, `RefChecked`, `RefUnsafe`, `DropRef`
0x78 – 0x7F	CBGR / text extensions	`CbgrExtended` (0x78), `TextExtended` (0x79)
0x80 – 0x8F	Generic dispatch	`CallG`, `CallV`, `CallC`, `SizeOfG`, `AlignOfG`, `Instantiate`, `MakeVariant`, `SetVariantData`, `GetVariantData`, `GetTag`, `NewClosure`, `GetVariantDataRef`, `TypeOf`
0x90 – 0x9F	Pattern matching & logic	`IsVar`, `AsVar`, `Unpack`, `Pack`, `Switch`, `MatchGuard`, `And`, `Or`, `Xor`, `Not`
0xA0 – 0xAF	Async / structured concurrency	`Spawn`, `Await`, `Yield`, `Select`, `Join`, `FutureReady`, `FutureGet`, `AsyncNext`, `NurseryInit`/`Spawn`/`Await`/`Cancel`/`Config`/`Error`
0xB0 – 0xBF	Context / capability / meta	`CtxGet`, `CtxProvide`, `CtxEnd`, `PushContext`, `PopContext`, `Attenuate`, `HasCapability`, `RequireCapability`, `MetaEval`, `MetaQuote`, `MetaSplice`, `MetaReflect`, `FfiExtended` (0xBC), `ArithExtended` (0xBD), `LogExtended` (0xBE), `MemExtended` (0xBF)
0xC0 – 0xCF	Iteration / text / set	`IterNew`, `IterNext`, `GenCreate`, `GenNext`, `GenHasNext`, `ToString`, `Concat`, `NewSet`, `SetInsert`, `SetContains`, `SetRemove`, `CharToStr`, `NewRange`, `NewDeque`, `Push`, `Pop`
0xD0 – 0xDF	Exceptions / contracts / cubical / channels	`Throw`, `TryBegin`, `TryEnd`, `GetException`, `Spec`, `Guard`, `Assert`, `Panic`, `Unreachable`, `DebugPrint`, `Requires`, `Ensures`, `Invariant`, `NewChannel`, `CubicalExtended` (0xDE), `DebugDF` (0xDF)
0xE0 – 0xEF	Syscalls / atomics / autodiff	`SyscallLinux`, `Mmap`, `Munmap`, `AtomicLoad`, `AtomicStore`, `AtomicCas`, `AtomicFence`, `IoSubmit`, `IoPoll`, `TlsGet`, `TlsSet`, `GradBegin`, `GradEnd`, `GradCheckpoint`, `GradAccumulate`, `GradStop`
0xF0 – 0xFF	Tensor / GPU / ML	`TensorNew`, `TensorBinop`, `TensorUnop`, `TensorMatmul`, `TensorReduce`, `TensorReshape`, `TensorTranspose`, `TensorSlice`, `GpuExtended` (0xF8), `GpuSync`, `GpuMemcpy`, `GpuAlloc`, `TensorExtended` (0xFC), `MlExtended` (0xFD), `TensorFull` (0xFE), `TensorFromSlice` (0xFF)

Extended opcode tables

Several primary opcodes are prefixes into second-byte tables:

CubicalExtended (0xDE) — path / interval / univalence. PathRefl, PathLambda, PathApp, PathSym, PathTrans, PathAp, Transport, Hcomp, IntervalMeet, IntervalJoin, IntervalRev, Ua, UaInv, EquivFwd, EquivBwd.
SimdExtended (0x2A) — 67 SIMD ops including VecSplat, VecLoad, VecAdd, VecFma, VecShuffle, VecReduce, gather / scatter. Variant count pinned by the simd_meta_count_pinned_at_ sixty_seven drift-test on SimdSubOpcode::ALL.
TensorExtended (0xFC) — attention, softmax, layernorm, batchnorm, einsum, conv, the flash-attention intrinsic.
GpuExtended (0xF8) — kernel launch, sync, memcpy, stream management, device allocation.
MathExtended (0x29), LogExtended (0xBE), FfiExtended (0xBC), MemExtended (0xBF), ArithExtended (0xBD), CharExtended (0x2B), TextExtended (0x79), CmpExtended (0x4F), MlExtended (0xFD) — per-family extensions.
Extended (0x1F) — general-purpose extension byte for first-class instructions that don't fit any family namespace. Wire format [0x1F] [sub_op:u8] [operands...]. Sub-op 0x00 is reserved as a forward-compat anchor; encoders must not emit it, decoders accept it as a no-op so future extensions land cleanly without breaking older interpreters. Carved out of a previously-reserved opcode slot during an opcode-space audit.

Extended-operand length prefix

Nine of the per-family extended opcodes (ArithExtended, FfiExtended, MathExtended, SimdExtended, CharExtended, CbgrExtended, TextExtended, MemExtended, LogExtended) carry their operands behind a varint length prefix so a sequential decoder can skip an unknown sub-op without parsing the inside:

[opcode:u8] [sub_op:u8] [varint(len)] [bytes:len]

The encoder writes the operand byte-count as a varint immediately after sub_op; the decoder reads the length and consumes exactly that many bytes via the decode_extended_operands helper. Interpreter handle_*_extended dispatchers skip the varint after the read_u8(sub_op) step and proceed with the structural decode they did before.

TensorExtended, GpuExtended, and the general-purpose Extended carrier do not carry the prefix — they have structural per-sub_op decoders that already know how far the operand block extends, so the prefix would be redundant.

The prefix is what lets the linker, disassembler, and validator walk a bytecode stream without an exhaustive sub-op table. Adding a new extended sub-op no longer requires touching every sequential decoder.

Slice opcodes (`CbgrExtended` 0x00–0x0A)

Slices in VBC are FatRef values that carry (ptr, len, elem_size) metadata in the reserved field (0 = NaN-boxed Value, 1/2/4/8 = raw integer stride). The corresponding sub-ops:

Sub-op	Code	Format	Notes
`RefSlice`	0x00	`dst, src, start, len`	Sub-ref from an array/list; infers stride from the source ObjectHeader.
`RefInterior`	0x01	`dst, base, field_offset:u32`	Struct interior ref.
`RefArrayElement`	0x02	`dst, base, index`	Array element ref.
`RefTrait`	0x03	`dst, src, vtable_id:u32`	Trait-object fat pointer.
`Unslice`	0x04	`dst, slice_ref`	Extract the raw pointer.
`SliceLen`	0x05	`dst, slice_ref`	Reads `metadata` from the FatRef.
`SliceGet`	0x06	`dst, slice_ref, index`	Stride-aware (respects `reserved`).
`SliceGetUnchecked`	0x07	`dst, slice_ref, index`	Same, no bounds check.
`SliceSubslice`	0x08	`dst, src, start, end`	Stride-aware subslicing.
`SliceSplitAt`	0x09	`dst1, dst2, src, mid`	Two FatRefs `[0..mid)` / `[mid..len)`.
`RefSliceRaw`	0x0A	`dst, ptr, len`	Build a FatRef from a raw pointer (no ObjectHeader inference); used for `slice_from_raw_parts`, byte buffers, and the middle of heap strings.

TextSubOpcode::AsBytes (0x34, under TextExtended 0x79) is the dispatch target for text.as_bytes() regardless of whether the value is a NaN-boxed small string or a heap-allocated Text. It materialises a byte-slice FatRef with elem_size=1, copying the six inline bytes of a small string into a fresh heap buffer so the returned reference has a stable address.

Adding up primary + extended tables: just over 350 opcodes.

Module format

A VBC module is a self-describing archive:

header:
  magic:        "VBC\0"
  version:      (major, minor, patch)
  flags:        bitfield
type_table:     list of interned types
const_table:    list of interned constants
function_table: list of function definitions
  each function:
    name, signature, local types, bytecode, debug info
proof_certificates:  optional — for @verify(certified)
metadata:            module-level attributes

Modules compress with LZ4. Deserialisation is zero-copy where possible (mmap + fixup).

Module-load trust boundary

Loading a .vbc module crosses a trust boundary: the bytes might have come from this process's own compiler (trusted) or from disk, network, or a shared cog archive (untrusted). The runtime exposes two load tiers so each call site declares its trust assumption explicitly.

Lenient load (trusted source)

The lenient tier does structural decode + the V-LLSI "interpretable?" flag check, nothing else. Use it when the bytecode comes from the compiler that's running right now — for example, after verum run builds a program in-process and hands the freshly-emitted bytecode straight to the interpreter. Running the validator on bytecode whose provenance is already trusted would just be wasted work.

Validated load (untrusted source)

The validated tier runs four passes before the module is allowed to execute:

Structural decode — same as the lenient tier.
Content-hash verification. The header carries a blake3 fingerprint over the bytes after the header. At load time the runtime recomputes the fingerprint and rejects any mismatch. This catches single-bit tampering on the disk artifact — somebody edited a .vbc file in place and forgot to re-stamp the hash. The check runs before decompression, since the hash is over the raw on-wire bytes. For a compressed module that's the compressed payload, exactly what the serializer hashed, so tampering is caught without paying the decompression cost first.
Dependency-hash verification. An independent fingerprint of the cog-distribution dependency graph (a blake3 over the concatenated dependency hashes). Lets a build cache, reproducibility checker, or cog resolver compare two modules' dep trees in O(8) without walking the full dependency table.
Per-instruction bytecode validation. Walks every function's bytecode and rejects:
- Out-of-range function references in Call / TailCall / CallG / NewClosure.
- Call-arity mismatches — every call site's argument count must match the target's declared parameter count.
- Register references past the function's declared register file size.
- Branch offsets (Jmp / JmpIf / JmpNot / JmpCmp / Switch / TryBegin) that fall outside the function's bytecode region, OR that land mid-instruction in another instruction's operand stream.
- Out-of-range constant-pool, string-table, or type-table references.
- Decoder failures mid-stream (a function's bytecode-byte count too small for the encoded instruction sequence).

When the validator finds multiple defects, the diagnostic is rendered as a numbered list — full forensic detail rather than a count-only summary.

Use the validated tier whenever bytecode comes from anywhere other than the compiler in this process: archives loaded from disk, modules pulled from a cog registry, files passed on the command line, network-loaded bytecode, IPC-shared modules. The cost is O(N) in total bytes (hash) + O(M) in instruction count, paid once at load — the runtime hot path stays gate-free.

Why the architecture matters

The same architectural anti-pattern lived in three places: the bytecode validator, the content hash, and the dependency hash. Each was a security-critical defense implemented as a public field or function with zero production callers — inert until the load-time wiring acted on it. The lesson: a public field carrying a security-critical value with no code path that ACTS on the value is a TODO regardless of how the field is named or documented.

Encoding-layer defenses

The four-pass validated load above operates on already-decoded bytes. Two further defenses live one layer down — at the byte decoder itself:

Varint canonicality (10-byte boundary)

A 10-byte unsigned varint encodes a u64. At byte 9 (shift = 63), only bit 0 of the 7-bit payload is meaningful — bits 1..6 of that byte represent positions 64..69 of the conceptual u70, which are NOT representable in u64. A naive result |= ((byte & 0x7F) as u64) << 63 would silently drop those bits via the platform's shift-out-of-range semantics, accepting adversarial encodings that smuggle invalid u64 values past the type boundary.

The decoder rejects any 10th-byte value with bits 1..6 set (byte & 0x7E != 0 at shift == 63) and returns VarIntOverflow. The legitimate boundary value u64::MAX (encoded as [0xFF×9, 0x01]) is still accepted — only invalid bit positions are rejected. Both decoder surfaces (slice-based decode_varint and Reader-based read_varint) carry the same defense, and the signed-varint path (decode_signed_varint) inherits it via decode_varint.

The same canonicality class is enforced at the language level by core/protobuf/wire.vr::read_varint (Google reference behaviour); the two implementations share the audit recipe: any decoder where multiple distinct inputs decode to the same value defeats round-trip equality and admits forgery.

Hostile-size allocation guards

Several VBC interpreter dispatch handlers (CbgrAlloc in ffi_extended.rs; GpuAlloc, MallocManaged, GpuMemAlloc, Free in gpu.rs) construct an allocation Layout from operand sizes carried in the bytecode. Adversarial bytecode can request sizes near isize::MAX; Layout::from_size_align correctly returns an error for such sizes, but the surrounding code used to either:

Panic the interpreter via .unwrap() on a chained fallback (DoS surface), or
Silently downgrade to a Layout::new::<u8>() 1-byte layout while the caller still believed they got size bytes — a heap overflow waiting to happen on the first write past byte 0, and on the deallocation path a dealloc(ptr, 1-byte-layout) for an N-byte allocation = std::alloc UB.

The fundamental fix:

Allocation paths return a null pointer on layout failure (the standard malloc-fail contract; the caller's Err(e) => ... arm fires).
Deallocation path leaks the buffer on layout failure rather than calling dealloc with a wrong layout. The architectural invariant (allocated_buffers only contains sizes that successfully constructed an alloc layout) makes the failure branch unreachable in practice; leak is the safe-by- default fallback for the impossible case.

The class lesson: in a trust-boundary handler, unwrap_or(default) is especially dangerous when the default is functionally smaller than the requested resource — it lies to callers about the resource and produces a worse failure mode than the true error case.

Archive memory-amplification bounds

A .vbc archive on disk is a directory of module entries; the header carries module_count, and each index entry declares name_len, dep_count, and data_size. Without bounds these attacker-controlled sizes turn a 32-byte hostile header into a terabyte allocation request — a memory-amplification denial-of- service that brings the loader down before the file's actual contents are even read.

The deserializer enforces architectural upper bounds before any allocation:

MAX_MODULES_PER_ARCHIVE = 65 536 — no real-world Verum archive shipped through cog publish approaches this. An archive claiming u32::MAX modules is rejected at the header gate.
MAX_MODULE_NAME_BYTES = 16 KB — module names are dotted paths; 16 KB is roughly 1 000 segments.
MAX_DEPS_PER_MODULE = 4 096 — a single module depending on more than 4 K others is implausible.
MAX_MODULE_DATA_BYTES = 1 GB — single-module bytecode beyond 1 GB is structurally suspect and refused.

Each rejection error message names the offending field so triage is immediate.

Module-table memory-amplification bounds

The per-module deserializer carries the same class of defense one layer down: a hostile module header (loaded directly, not through an archive) declares four *_count fields — type_table_count, function_table_count, constant_pool_count, specialization_table_count — each a u32 used to drive a Vec::with_capacity(count as usize) allocation before the deserializer reads a single entry.

A 64-byte hostile module header could request 500 GB-2 TB across the four tables. The deserializer enforces MAX_*_ENTRIES = 1 048 576 per table, refused before any allocation. Real-world modules carry at most a few thousand entries in any of these tables; 1 M is comfortably above any plausible module while staying far below the wraparound cliff.

A typed TableTooLarge { field, count, max } error names the offending field for triage.

Descriptor-level memory-amplification bounds

One layer further down: each type / function / specialization descriptor carries varint-encoded counts (type_params_count, fields_count, variants_count, protocols_count, methods_count, …) that drive per-descriptor allocations. A varint can encode u64::MAX, which casts to usize::MAX on 64-bit platforms — with_capacity(usize::MAX) aborts the process in most Rust allocators.

Tight architectural bounds (per real-world descriptor surface seen in the embedded stdlib):

MAX_TYPE_PARAMS_PER_DESCRIPTOR = 64 — matches the ast_to_type recursion cap that already gates the front-end, so a type the front-end accepted will always fit.
MAX_FIELDS_PER_DESCRIPTOR = 4 096
MAX_VARIANTS_PER_DESCRIPTOR = 4 096
MAX_PROTOCOLS_PER_DESCRIPTOR = 256
MAX_METHODS_PER_PROTOCOL_IMPL = 4 096

The same recursion descends into per-type-param / per-variant / per-type-ref counts, each of which has a tighter bound:

MAX_BOUNDS_PER_TYPE_PARAM = 64 — protocol bounds on a type parameter (fn f<T: P + Q>).
MAX_FIELDS_PER_VARIANT = 1 024 — per-variant struct fields (Some { a, b, c }).
MAX_TYPE_REF_INSTANTIATION_ARGS = 64 — generic instantiation arity (List<Int, String, …>).
MAX_FN_TYPE_REF_PARAMS = 256 — function-type signature parameter count.
MAX_FN_TYPE_REF_CONTEXTS = 32 — function-type context list (using [Logger, Database, …]).

Together with the outer-descriptor bounds these close every varint-driven Vec / SmallVec allocation in the VBC deserializer.

Bytecode-section bounds

The bytecode section's compressed branches read uncompressed_size: u32 from the on-wire header and pass it to the decompressor, which Vec::with_capacity-allocates that many bytes before reading any compressed input. A hostile section claiming u32::MAX triggered ~4 GB of allocation. Bound:

MAX_DECOMPRESSED_BYTECODE_BYTES = 1 GB — real Verum modules are kilobytes; the embedded stdlib (every core/*.vr compiled) totals ~14 MB.

A separate reliability fix removed an arithmetic underflow (section_size as usize - 1) that on a section_size == 0 input wrapped to usize::MAX and drove a multi-EB slice attempt. The bytecode reader now rejects zero-size sections at entry and uses checked_add throughout.

Implementation surface (Rust)

For compiler / runtime hackers: the trust boundary is exposed as two parallel constructor families on the verum_vbc crate (lenient → validated): deserialize_module / deserialize_module_validated, VbcArchive::load_module / load_module_validated, and Interpreter::try_new / try_new_validated. Validation failure surfaces as InterpreterError::ValidationFailed { module_name, reason }; the reason carries the rendered defect list.

Codegen-time validation

Beyond load-time validation, the codegen pipeline runs the same structural validator against every freshly-built module before returning it to the caller. The gate is CodegenConfig.validate, defaulting to true so production builds always get the safety net:

Field	Default	What it does
`validate`	`true`	After `build_module()` constructs the final `VbcModule` from accumulated codegen state, `finalize_module` calls `validate::validate_module(&module)` (strict mode). Validator failures surface as `CodegenError::internal` with the module name and the underlying diagnostic embedded in the message. The check is the codegen counterpart to the deserializer's load-time defenses — when codegen has a bug, the failure surfaces here at the codegen call site instead of as an interpreter panic / serializer error several layers later.

Toggle off via config.validate = false only when the caller has already validated upstream (e.g., a hot-path incremental rebuild that re-uses a previously-validated state). The validator runs in O(module-size) and is well within the codegen budget for typical modules.

Interpreter

The bytecode interpreter (verum_vbc::interpreter) dispatches via a table split across 37 handler files in interpreter/dispatch_table/handlers/, grouped by family:

cbgr.rs + cbgr_helpers.rs — Ref / Deref / tier checks.
cubical.rs — 0xDE sub-ops (erased at AOT codegen; implemented for REPL / interpreter).
tensor.rs + tensor_extended.rs — tensor opcodes (0xF0–0xF7, 0xFC, 0xFE, 0xFF).
method_dispatch.rs — protocol method resolution.
async_nursery.rs — 0xA0–0xAF structured concurrency.
context.rs — 0xB0–0xB7 capability / context stack.
meta.rs — 0xB8–0xBB reflection / quoting.
gpu.rs — 0xF8–0xFB GPU dispatch (in the interpreter, these go to a CPU simulator).
…and 29 more per-family handlers for arithmetic, control flow, collections, strings, iterators, autodiff, exceptions, etc.

Execution state lives in interpreter::state — a register file, a stack, the CBGR heap, and the context stack.

AOT lowering

verum_codegen::llvm::vbc_lowering walks VBC and emits equivalent LLVM IR. Tier-aware CBGR lowering selects between CbgrValidated, DirectLoad, and UncheckedLoad strategies based on the reference's opcode (Ref vs RefChecked vs RefUnsafe). Cubical opcodes lower to identity / passthrough (proof erasure).

GPU lowering

@device(GPU) functions route through verum_codegen::mlir instead of LLVM — TensorMatmul, TensorExtended::Attention, etc. lower to verum.tensor → linalg → gpu dialect → PTX / HSACO / SPIR-V / Metal. See codegen → MLIR backend.

Proof-carrying code

When @verify(certified) is set, VBC includes proof certificates:

certificate:
  obligation:   SMT obligation hash
  proof_term:   serialised proof (size varies)
  verifier:     "backend-a" | "backend-b" | "portfolio" | "manual"
  signed_by:    build identity

Downstream consumers can trust the certificate and skip verification, re-check the proof offline, or re-verify with a fresh solver run.

Inspection

Generate a human-readable VBC dump as part of a build:

verum build --emit-vbc

The dump lands in target/{debug,release}/<name>.vbc.txt alongside the usual artefacts. For per-phase breakdown, add --timings.

Design goals​

Opcode map​

Extended opcode tables​

Extended-operand length prefix​

Slice opcodes (CbgrExtended 0x00–0x0A)​

Module format​

Module-load trust boundary​

Lenient load (trusted source)​

Validated load (untrusted source)​

Why the architecture matters​

Encoding-layer defenses​

Varint canonicality (10-byte boundary)​

Hostile-size allocation guards​

Archive memory-amplification bounds​

Module-table memory-amplification bounds​

Descriptor-level memory-amplification bounds​

Bytecode-section bounds​

Implementation surface (Rust)​

Codegen-time validation​

Interpreter​

AOT lowering​

GPU lowering​

Proof-carrying code​

Inspection​

See also​