Skip to main content

VBC — Verum Bytecode

VBC is Verum's unified intermediate representation. Every program lowers to VBC; from there, the interpreter runs it directly (Tier 0) or the AOT backend lowers it to LLVM IR (Tier 1). The same module format also carries GPU kernels lowered through MLIR to PTX / HSACO / SPIR-V / Metal.

Design goals

  1. Single IR — VBC is both the bytecode and the lowering target; there is no separate MIR in the main pipeline.
  2. Typed — every value and register has a type the verifier can check.
  3. CBGR-aware — generation / epoch / capability checks are explicit opcodes (Ref, Deref, ChkRef, RefChecked, RefUnsafe, DropRef), so tier decisions survive all the way to the backend.
  4. Cubical-aware — path types and transports have opcodes that the codegen erases to identity in release builds.
  5. Portable — serialisable for distribution (.cog archives carry VBC + metadata + optional proof certificates).

Opcode map

The VBC instruction set comprises roughly 350 primary opcodes plus several extended opcode tables for arithmetic, tensor operations, cubical primitives, and CBGR tier lowering.

RangeFamilyRepresentative opcodes
0x00 – 0x0FLoad / move / convertMov, LoadK, LoadI, LoadF, LoadTrue, LoadSmallI, Nop, CvtIF, CvtFI, CvtIC
0x10 – 0x1FInteger arithmeticAddI, SubI, MulI, DivI, ModI, NegI, AbsI, PowI, Inc, Dec, CvtToI
0x20 – 0x2FFloat arithmetic & extendedAddF, SubF, MulF, DivF, NegF, PowF, MathExtended (0x29), SimdExtended (0x2A), CharExtended (0x2B)
0x30 – 0x3FBitwise + generic arithBand, Bor, Bxor, Bnot, Shl, Shr, Ushr, AddG, SubG, MulG, DivG
0x40 – 0x4FCompareEqI/NeI/Lt/Le/Gt/Ge (int + float), EqG, CmpG, EqRef, CmpExtended (0x4F)
0x50 – 0x5FControl flowJmp, JmpIf, JmpNot, JmpEq/Ne/Lt/Le/Gt/Ge, Ret, RetV, Call, TailCall, CallM, CallClosure, CallR
0x60 – 0x6FObjects / collectionsNew, NewG, GetF, SetF, GetE, SetE, Len, NewArray, NewList, ListPush, ListPop, NewMap, MapGet, MapSet, MapContains, Clone
0x70 – 0x77CBGR referencesRef, RefMut, Deref, DerefMut, ChkRef, RefChecked, RefUnsafe, DropRef
0x78 – 0x7FCBGR / text extensionsCbgrExtended (0x78), TextExtended (0x79)
0x80 – 0x8FGeneric dispatchCallG, CallV, CallC, SizeOfG, AlignOfG, Instantiate, MakeVariant, SetVariantData, GetVariantData, GetTag, NewClosure, GetVariantDataRef, TypeOf
0x90 – 0x9FPattern matching & logicIsVar, AsVar, Unpack, Pack, Switch, MatchGuard, And, Or, Xor, Not
0xA0 – 0xAFAsync / structured concurrencySpawn, Await, Yield, Select, Join, FutureReady, FutureGet, AsyncNext, NurseryInit/Spawn/Await/Cancel/Config/Error
0xB0 – 0xBFContext / capability / metaCtxGet, CtxProvide, CtxEnd, PushContext, PopContext, Attenuate, HasCapability, RequireCapability, MetaEval, MetaQuote, MetaSplice, MetaReflect, FfiExtended (0xBC), ArithExtended (0xBD), LogExtended (0xBE), MemExtended (0xBF)
0xC0 – 0xCFIteration / text / setIterNew, IterNext, GenCreate, GenNext, GenHasNext, ToString, Concat, NewSet, SetInsert, SetContains, SetRemove, CharToStr, NewRange, NewDeque, Push, Pop
0xD0 – 0xDFExceptions / contracts / cubical / channelsThrow, TryBegin, TryEnd, GetException, Spec, Guard, Assert, Panic, Unreachable, DebugPrint, Requires, Ensures, Invariant, NewChannel, CubicalExtended (0xDE), DebugDF (0xDF)
0xE0 – 0xEFSyscalls / atomics / autodiffSyscallLinux, Mmap, Munmap, AtomicLoad, AtomicStore, AtomicCas, AtomicFence, IoSubmit, IoPoll, TlsGet, TlsSet, GradBegin, GradEnd, GradCheckpoint, GradAccumulate, GradStop
0xF0 – 0xFFTensor / GPU / MLTensorNew, TensorBinop, TensorUnop, TensorMatmul, TensorReduce, TensorReshape, TensorTranspose, TensorSlice, GpuExtended (0xF8), GpuSync, GpuMemcpy, GpuAlloc, TensorExtended (0xFC), MlExtended (0xFD), TensorFull (0xFE), TensorFromSlice (0xFF)

Extended opcode tables

Several primary opcodes are prefixes into second-byte tables:

  • CubicalExtended (0xDE) — path / interval / univalence. PathRefl, PathLambda, PathApp, PathSym, PathTrans, PathAp, Transport, Hcomp, IntervalMeet, IntervalJoin, IntervalRev, Ua, UaInv, EquivFwd, EquivBwd.
  • SimdExtended (0x2A) — 67 SIMD ops including VecSplat, VecLoad, VecAdd, VecFma, VecShuffle, VecReduce, gather / scatter. Variant count pinned by the simd_meta_count_pinned_at_ sixty_seven drift-test on SimdSubOpcode::ALL.
  • TensorExtended (0xFC) — attention, softmax, layernorm, batchnorm, einsum, conv, the flash-attention intrinsic.
  • GpuExtended (0xF8) — kernel launch, sync, memcpy, stream management, device allocation.
  • MathExtended (0x29), LogExtended (0xBE), FfiExtended (0xBC), MemExtended (0xBF), ArithExtended (0xBD), CharExtended (0x2B), TextExtended (0x79), CmpExtended (0x4F), MlExtended (0xFD) — per-family extensions.
  • Extended (0x1F) — general-purpose extension byte for first-class instructions that don't fit any family namespace. Wire format [0x1F] [sub_op:u8] [operands...]. Sub-op 0x00 is reserved as a forward-compat anchor; encoders must not emit it, decoders accept it as a no-op so future extensions land cleanly without breaking older interpreters. Carved out of a previously-reserved opcode slot during an opcode-space audit.

Extended-operand length prefix

Nine of the per-family extended opcodes (ArithExtended, FfiExtended, MathExtended, SimdExtended, CharExtended, CbgrExtended, TextExtended, MemExtended, LogExtended) carry their operands behind a varint length prefix so a sequential decoder can skip an unknown sub-op without parsing the inside:

[opcode:u8] [sub_op:u8] [varint(len)] [bytes:len]

The encoder writes the operand byte-count as a varint immediately after sub_op; the decoder reads the length and consumes exactly that many bytes via the decode_extended_operands helper. Interpreter handle_*_extended dispatchers skip the varint after the read_u8(sub_op) step and proceed with the structural decode they did before.

TensorExtended, GpuExtended, and the general-purpose Extended carrier do not carry the prefix — they have structural per-sub_op decoders that already know how far the operand block extends, so the prefix would be redundant.

The prefix is what lets the linker, disassembler, and validator walk a bytecode stream without an exhaustive sub-op table. Adding a new extended sub-op no longer requires touching every sequential decoder.

Slice opcodes (CbgrExtended 0x00–0x0A)

Slices in VBC are FatRef values that carry (ptr, len, elem_size) metadata in the reserved field (0 = NaN-boxed Value, 1/2/4/8 = raw integer stride). The corresponding sub-ops:

Sub-opCodeFormatNotes
RefSlice0x00dst, src, start, lenSub-ref from an array/list; infers stride from the source ObjectHeader.
RefInterior0x01dst, base, field_offset:u32Struct interior ref.
RefArrayElement0x02dst, base, indexArray element ref.
RefTrait0x03dst, src, vtable_id:u32Trait-object fat pointer.
Unslice0x04dst, slice_refExtract the raw pointer.
SliceLen0x05dst, slice_refReads metadata from the FatRef.
SliceGet0x06dst, slice_ref, indexStride-aware (respects reserved).
SliceGetUnchecked0x07dst, slice_ref, indexSame, no bounds check.
SliceSubslice0x08dst, src, start, endStride-aware subslicing.
SliceSplitAt0x09dst1, dst2, src, midTwo FatRefs [0..mid) / [mid..len).
RefSliceRaw0x0Adst, ptr, lenBuild a FatRef from a raw pointer (no ObjectHeader inference); used for slice_from_raw_parts, byte buffers, and the middle of heap strings.

TextSubOpcode::AsBytes (0x34, under TextExtended 0x79) is the dispatch target for text.as_bytes() regardless of whether the value is a NaN-boxed small string or a heap-allocated Text. It materialises a byte-slice FatRef with elem_size=1, copying the six inline bytes of a small string into a fresh heap buffer so the returned reference has a stable address.

Adding up primary + extended tables: just over 350 opcodes.

Module format

A VBC module is a self-describing archive:

header:
magic: "VBC\0"
version: (major, minor, patch)
flags: bitfield
type_table: list of interned types
const_table: list of interned constants
function_table: list of function definitions
each function:
name, signature, local types, bytecode, debug info
proof_certificates: optional — for @verify(certified)
metadata: module-level attributes

Modules compress with LZ4. Deserialisation is zero-copy where possible (mmap + fixup).

Module-load trust boundary

Loading a .vbc module crosses a trust boundary: the bytes might have come from this process's own compiler (trusted) or from disk, network, or a shared cog archive (untrusted). The runtime exposes two load tiers so each call site declares its trust assumption explicitly.

Lenient load (trusted source)

The lenient tier does structural decode + the V-LLSI "interpretable?" flag check, nothing else. Use it when the bytecode comes from the compiler that's running right now — for example, after verum run builds a program in-process and hands the freshly-emitted bytecode straight to the interpreter. Running the validator on bytecode whose provenance is already trusted would just be wasted work.

Validated load (untrusted source)

The validated tier runs four passes before the module is allowed to execute:

  1. Structural decode — same as the lenient tier.
  2. Content-hash verification. The header carries a blake3 fingerprint over the bytes after the header. At load time the runtime recomputes the fingerprint and rejects any mismatch. This catches single-bit tampering on the disk artifact — somebody edited a .vbc file in place and forgot to re-stamp the hash. The check runs before decompression, since the hash is over the raw on-wire bytes. For a compressed module that's the compressed payload, exactly what the serializer hashed, so tampering is caught without paying the decompression cost first.
  3. Dependency-hash verification. An independent fingerprint of the cog-distribution dependency graph (a blake3 over the concatenated dependency hashes). Lets a build cache, reproducibility checker, or cog resolver compare two modules' dep trees in O(8) without walking the full dependency table.
  4. Per-instruction bytecode validation. Walks every function's bytecode and rejects:
    • Out-of-range function references in Call / TailCall / CallG / NewClosure.
    • Call-arity mismatches — every call site's argument count must match the target's declared parameter count.
    • Register references past the function's declared register file size.
    • Branch offsets (Jmp / JmpIf / JmpNot / JmpCmp / Switch / TryBegin) that fall outside the function's bytecode region, OR that land mid-instruction in another instruction's operand stream.
    • Out-of-range constant-pool, string-table, or type-table references.
    • Decoder failures mid-stream (a function's bytecode-byte count too small for the encoded instruction sequence).

When the validator finds multiple defects, the diagnostic is rendered as a numbered list — full forensic detail rather than a count-only summary.

Use the validated tier whenever bytecode comes from anywhere other than the compiler in this process: archives loaded from disk, modules pulled from a cog registry, files passed on the command line, network-loaded bytecode, IPC-shared modules. The cost is O(N) in total bytes (hash) + O(M) in instruction count, paid once at load — the runtime hot path stays gate-free.

Why the architecture matters

The same architectural anti-pattern lived in three places: the bytecode validator, the content hash, and the dependency hash. Each was a security-critical defense implemented as a public field or function with zero production callers — inert until the load-time wiring acted on it. The lesson: a public field carrying a security-critical value with no code path that ACTS on the value is a TODO regardless of how the field is named or documented.

Encoding-layer defenses

The four-pass validated load above operates on already-decoded bytes. Two further defenses live one layer down — at the byte decoder itself:

Varint canonicality (10-byte boundary)

A 10-byte unsigned varint encodes a u64. At byte 9 (shift = 63), only bit 0 of the 7-bit payload is meaningful — bits 1..6 of that byte represent positions 64..69 of the conceptual u70, which are NOT representable in u64. A naive result |= ((byte & 0x7F) as u64) << 63 would silently drop those bits via the platform's shift-out-of-range semantics, accepting adversarial encodings that smuggle invalid u64 values past the type boundary.

The decoder rejects any 10th-byte value with bits 1..6 set (byte & 0x7E != 0 at shift == 63) and returns VarIntOverflow. The legitimate boundary value u64::MAX (encoded as [0xFF×9, 0x01]) is still accepted — only invalid bit positions are rejected. Both decoder surfaces (slice-based decode_varint and Reader-based read_varint) carry the same defense, and the signed-varint path (decode_signed_varint) inherits it via decode_varint.

The same canonicality class is enforced at the language level by core/protobuf/wire.vr::read_varint (Google reference behaviour); the two implementations share the audit recipe: any decoder where multiple distinct inputs decode to the same value defeats round-trip equality and admits forgery.

Hostile-size allocation guards

Several VBC interpreter dispatch handlers (CbgrAlloc in ffi_extended.rs; GpuAlloc, MallocManaged, GpuMemAlloc, Free in gpu.rs) construct an allocation Layout from operand sizes carried in the bytecode. Adversarial bytecode can request sizes near isize::MAX; Layout::from_size_align correctly returns an error for such sizes, but the surrounding code used to either:

  • Panic the interpreter via .unwrap() on a chained fallback (DoS surface), or
  • Silently downgrade to a Layout::new::<u8>() 1-byte layout while the caller still believed they got size bytes — a heap overflow waiting to happen on the first write past byte 0, and on the deallocation path a dealloc(ptr, 1-byte-layout) for an N-byte allocation = std::alloc UB.

The fundamental fix:

  • Allocation paths return a null pointer on layout failure (the standard malloc-fail contract; the caller's Err(e) => ... arm fires).
  • Deallocation path leaks the buffer on layout failure rather than calling dealloc with a wrong layout. The architectural invariant (allocated_buffers only contains sizes that successfully constructed an alloc layout) makes the failure branch unreachable in practice; leak is the safe-by- default fallback for the impossible case.

The class lesson: in a trust-boundary handler, unwrap_or(default) is especially dangerous when the default is functionally smaller than the requested resource — it lies to callers about the resource and produces a worse failure mode than the true error case.

Archive memory-amplification bounds

A .vbc archive on disk is a directory of module entries; the header carries module_count, and each index entry declares name_len, dep_count, and data_size. Without bounds these attacker-controlled sizes turn a 32-byte hostile header into a terabyte allocation request — a memory-amplification denial-of- service that brings the loader down before the file's actual contents are even read.

The deserializer enforces architectural upper bounds before any allocation:

  • MAX_MODULES_PER_ARCHIVE = 65 536 — no real-world Verum archive shipped through cog publish approaches this. An archive claiming u32::MAX modules is rejected at the header gate.
  • MAX_MODULE_NAME_BYTES = 16 KB — module names are dotted paths; 16 KB is roughly 1 000 segments.
  • MAX_DEPS_PER_MODULE = 4 096 — a single module depending on more than 4 K others is implausible.
  • MAX_MODULE_DATA_BYTES = 1 GB — single-module bytecode beyond 1 GB is structurally suspect and refused.

Each rejection error message names the offending field so triage is immediate.

Module-table memory-amplification bounds

The per-module deserializer carries the same class of defense one layer down: a hostile module header (loaded directly, not through an archive) declares four *_count fields — type_table_count, function_table_count, constant_pool_count, specialization_table_count — each a u32 used to drive a Vec::with_capacity(count as usize) allocation before the deserializer reads a single entry.

A 64-byte hostile module header could request 500 GB-2 TB across the four tables. The deserializer enforces MAX_*_ENTRIES = 1 048 576 per table, refused before any allocation. Real-world modules carry at most a few thousand entries in any of these tables; 1 M is comfortably above any plausible module while staying far below the wraparound cliff.

A typed TableTooLarge { field, count, max } error names the offending field for triage.

Descriptor-level memory-amplification bounds

One layer further down: each type / function / specialization descriptor carries varint-encoded counts (type_params_count, fields_count, variants_count, protocols_count, methods_count, …) that drive per-descriptor allocations. A varint can encode u64::MAX, which casts to usize::MAX on 64-bit platforms — with_capacity(usize::MAX) aborts the process in most Rust allocators.

Tight architectural bounds (per real-world descriptor surface seen in the embedded stdlib):

  • MAX_TYPE_PARAMS_PER_DESCRIPTOR = 64 — matches the ast_to_type recursion cap that already gates the front-end, so a type the front-end accepted will always fit.
  • MAX_FIELDS_PER_DESCRIPTOR = 4 096
  • MAX_VARIANTS_PER_DESCRIPTOR = 4 096
  • MAX_PROTOCOLS_PER_DESCRIPTOR = 256
  • MAX_METHODS_PER_PROTOCOL_IMPL = 4 096

The same recursion descends into per-type-param / per-variant / per-type-ref counts, each of which has a tighter bound:

  • MAX_BOUNDS_PER_TYPE_PARAM = 64 — protocol bounds on a type parameter (fn f<T: P + Q>).
  • MAX_FIELDS_PER_VARIANT = 1 024 — per-variant struct fields (Some { a, b, c }).
  • MAX_TYPE_REF_INSTANTIATION_ARGS = 64 — generic instantiation arity (List<Int, String, …>).
  • MAX_FN_TYPE_REF_PARAMS = 256 — function-type signature parameter count.
  • MAX_FN_TYPE_REF_CONTEXTS = 32 — function-type context list (using [Logger, Database, …]).

Together with the outer-descriptor bounds these close every varint-driven Vec / SmallVec allocation in the VBC deserializer.

Bytecode-section bounds

The bytecode section's compressed branches read uncompressed_size: u32 from the on-wire header and pass it to the decompressor, which Vec::with_capacity-allocates that many bytes before reading any compressed input. A hostile section claiming u32::MAX triggered ~4 GB of allocation. Bound:

  • MAX_DECOMPRESSED_BYTECODE_BYTES = 1 GB — real Verum modules are kilobytes; the embedded stdlib (every core/*.vr compiled) totals ~14 MB.

A separate reliability fix removed an arithmetic underflow (section_size as usize - 1) that on a section_size == 0 input wrapped to usize::MAX and drove a multi-EB slice attempt. The bytecode reader now rejects zero-size sections at entry and uses checked_add throughout.

Implementation surface (Rust)

For compiler / runtime hackers: the trust boundary is exposed as two parallel constructor families on the verum_vbc crate (lenient → validated): deserialize_module / deserialize_module_validated, VbcArchive::load_module / load_module_validated, and Interpreter::try_new / try_new_validated. Validation failure surfaces as InterpreterError::ValidationFailed { module_name, reason }; the reason carries the rendered defect list.

Codegen-time validation

Beyond load-time validation, the codegen pipeline runs the same structural validator against every freshly-built module before returning it to the caller. The gate is CodegenConfig.validate, defaulting to true so production builds always get the safety net:

FieldDefaultWhat it does
validatetrueAfter build_module() constructs the final VbcModule from accumulated codegen state, finalize_module calls validate::validate_module(&module) (strict mode). Validator failures surface as CodegenError::internal with the module name and the underlying diagnostic embedded in the message. The check is the codegen counterpart to the deserializer's load-time defenses — when codegen has a bug, the failure surfaces here at the codegen call site instead of as an interpreter panic / serializer error several layers later.

Toggle off via config.validate = false only when the caller has already validated upstream (e.g., a hot-path incremental rebuild that re-uses a previously-validated state). The validator runs in O(module-size) and is well within the codegen budget for typical modules.

Interpreter

The bytecode interpreter (verum_vbc::interpreter) dispatches via a table split across 37 handler files in interpreter/dispatch_table/handlers/, grouped by family:

  • cbgr.rs + cbgr_helpers.rs — Ref / Deref / tier checks.
  • cubical.rs — 0xDE sub-ops (erased at AOT codegen; implemented for REPL / interpreter).
  • tensor.rs + tensor_extended.rs — tensor opcodes (0xF0–0xF7, 0xFC, 0xFE, 0xFF).
  • method_dispatch.rs — protocol method resolution.
  • async_nursery.rs — 0xA0–0xAF structured concurrency.
  • context.rs — 0xB0–0xB7 capability / context stack.
  • meta.rs — 0xB8–0xBB reflection / quoting.
  • gpu.rs — 0xF8–0xFB GPU dispatch (in the interpreter, these go to a CPU simulator).
  • …and 29 more per-family handlers for arithmetic, control flow, collections, strings, iterators, autodiff, exceptions, etc.

Execution state lives in interpreter::state — a register file, a stack, the CBGR heap, and the context stack.

AOT lowering

verum_codegen::llvm::vbc_lowering walks VBC and emits equivalent LLVM IR. Tier-aware CBGR lowering selects between CbgrValidated, DirectLoad, and UncheckedLoad strategies based on the reference's opcode (Ref vs RefChecked vs RefUnsafe). Cubical opcodes lower to identity / passthrough (proof erasure).

GPU lowering

@device(GPU) functions route through verum_codegen::mlir instead of LLVM — TensorMatmul, TensorExtended::Attention, etc. lower to verum.tensorlinalggpu dialect → PTX / HSACO / SPIR-V / Metal. See codegen → MLIR backend.

Proof-carrying code

When @verify(certified) is set, VBC includes proof certificates:

certificate:
obligation: SMT obligation hash
proof_term: serialised proof (size varies)
verifier: "backend-a" | "backend-b" | "portfolio" | "manual"
signed_by: build identity

Downstream consumers can trust the certificate and skip verification, re-check the proof offline, or re-verify with a fresh solver run.

Inspection

Generate a human-readable VBC dump as part of a build:

verum build --emit-vbc

The dump lands in target/{debug,release}/<name>.vbc.txt alongside the usual artefacts. For per-phase breakdown, add --timings.

See also