VBC — Verum Bytecode
VBC is Verum's unified intermediate representation. Every program lowers to VBC; from there, the interpreter runs it directly (Tier 0) or the AOT backend lowers it to LLVM IR (Tier 1). The same module format also carries GPU kernels lowered through MLIR to PTX / HSACO / SPIR-V / Metal.
Design goals
- Single IR — VBC is both the bytecode and the lowering target; there is no separate MIR in the main pipeline.
- Typed — every value and register has a type the verifier can check.
- CBGR-aware — generation / epoch / capability checks are
explicit opcodes (
Ref,Deref,ChkRef,RefChecked,RefUnsafe,DropRef), so tier decisions survive all the way to the backend. - Cubical-aware — path types and transports have opcodes that the codegen erases to identity in release builds.
- Portable — serialisable for distribution (
.cogarchives carry VBC + metadata + optional proof certificates).
Opcode map
The VBC instruction set comprises roughly 350 primary opcodes plus several extended opcode tables for arithmetic, tensor operations, cubical primitives, and CBGR tier lowering.
| Range | Family | Representative opcodes |
|---|---|---|
| 0x00 – 0x0F | Load / move / convert | Mov, LoadK, LoadI, LoadF, LoadTrue, LoadSmallI, Nop, CvtIF, CvtFI, CvtIC |
| 0x10 – 0x1F | Integer arithmetic | AddI, SubI, MulI, DivI, ModI, NegI, AbsI, PowI, Inc, Dec, CvtToI |
| 0x20 – 0x2F | Float arithmetic & extended | AddF, SubF, MulF, DivF, NegF, PowF, MathExtended (0x29), SimdExtended (0x2A), CharExtended (0x2B) |
| 0x30 – 0x3F | Bitwise + generic arith | Band, Bor, Bxor, Bnot, Shl, Shr, Ushr, AddG, SubG, MulG, DivG |
| 0x40 – 0x4F | Compare | EqI/NeI/Lt/Le/Gt/Ge (int + float), EqG, CmpG, EqRef, CmpExtended (0x4F) |
| 0x50 – 0x5F | Control flow | Jmp, JmpIf, JmpNot, JmpEq/Ne/Lt/Le/Gt/Ge, Ret, RetV, Call, TailCall, CallM, CallClosure, CallR |
| 0x60 – 0x6F | Objects / collections | New, NewG, GetF, SetF, GetE, SetE, Len, NewArray, NewList, ListPush, ListPop, NewMap, MapGet, MapSet, MapContains, Clone |
| 0x70 – 0x77 | CBGR references | Ref, RefMut, Deref, DerefMut, ChkRef, RefChecked, RefUnsafe, DropRef |
| 0x78 – 0x7F | CBGR / text extensions | CbgrExtended (0x78), TextExtended (0x79) |
| 0x80 – 0x8F | Generic dispatch | CallG, CallV, CallC, SizeOfG, AlignOfG, Instantiate, MakeVariant, SetVariantData, GetVariantData, GetTag, NewClosure, GetVariantDataRef, TypeOf |
| 0x90 – 0x9F | Pattern matching & logic | IsVar, AsVar, Unpack, Pack, Switch, MatchGuard, And, Or, Xor, Not |
| 0xA0 – 0xAF | Async / structured concurrency | Spawn, Await, Yield, Select, Join, FutureReady, FutureGet, AsyncNext, NurseryInit/Spawn/Await/Cancel/Config/Error |
| 0xB0 – 0xBF | Context / capability / meta | CtxGet, CtxProvide, CtxEnd, PushContext, PopContext, Attenuate, HasCapability, RequireCapability, MetaEval, MetaQuote, MetaSplice, MetaReflect, FfiExtended (0xBC), ArithExtended (0xBD), LogExtended (0xBE), MemExtended (0xBF) |
| 0xC0 – 0xCF | Iteration / text / set | IterNew, IterNext, GenCreate, GenNext, GenHasNext, ToString, Concat, NewSet, SetInsert, SetContains, SetRemove, CharToStr, NewRange, NewDeque, Push, Pop |
| 0xD0 – 0xDF | Exceptions / contracts / cubical / channels | Throw, TryBegin, TryEnd, GetException, Spec, Guard, Assert, Panic, Unreachable, DebugPrint, Requires, Ensures, Invariant, NewChannel, CubicalExtended (0xDE), DebugDF (0xDF) |
| 0xE0 – 0xEF | Syscalls / atomics / autodiff | SyscallLinux, Mmap, Munmap, AtomicLoad, AtomicStore, AtomicCas, AtomicFence, IoSubmit, IoPoll, TlsGet, TlsSet, GradBegin, GradEnd, GradCheckpoint, GradAccumulate, GradStop |
| 0xF0 – 0xFF | Tensor / GPU / ML | TensorNew, TensorBinop, TensorUnop, TensorMatmul, TensorReduce, TensorReshape, TensorTranspose, TensorSlice, GpuExtended (0xF8), GpuSync, GpuMemcpy, GpuAlloc, TensorExtended (0xFC), MlExtended (0xFD), TensorFull (0xFE), TensorFromSlice (0xFF) |
Extended opcode tables
Several primary opcodes are prefixes into second-byte tables:
CubicalExtended(0xDE) — path / interval / univalence.PathRefl,PathLambda,PathApp,PathSym,PathTrans,PathAp,Transport,Hcomp,IntervalMeet,IntervalJoin,IntervalRev,Ua,UaInv,EquivFwd,EquivBwd.SimdExtended(0x2A) — 67 SIMD ops includingVecSplat,VecLoad,VecAdd,VecFma,VecShuffle,VecReduce, gather / scatter. Variant count pinned by thesimd_meta_count_pinned_at_ sixty_sevendrift-test onSimdSubOpcode::ALL.TensorExtended(0xFC) — attention, softmax, layernorm, batchnorm, einsum, conv, the flash-attention intrinsic.GpuExtended(0xF8) — kernel launch, sync, memcpy, stream management, device allocation.MathExtended(0x29),LogExtended(0xBE),FfiExtended(0xBC),MemExtended(0xBF),ArithExtended(0xBD),CharExtended(0x2B),TextExtended(0x79),CmpExtended(0x4F),MlExtended(0xFD) — per-family extensions.Extended(0x1F) — general-purpose extension byte for first-class instructions that don't fit any family namespace. Wire format[0x1F] [sub_op:u8] [operands...]. Sub-op0x00is reserved as a forward-compat anchor; encoders must not emit it, decoders accept it as a no-op so future extensions land cleanly without breaking older interpreters. Carved out of a previously-reserved opcode slot during an opcode-space audit.
Extended-operand length prefix
Nine of the per-family extended opcodes (ArithExtended,
FfiExtended, MathExtended, SimdExtended, CharExtended,
CbgrExtended, TextExtended, MemExtended, LogExtended)
carry their operands behind a varint length prefix so a
sequential decoder can skip an unknown sub-op without parsing
the inside:
[opcode:u8] [sub_op:u8] [varint(len)] [bytes:len]
The encoder writes the operand byte-count as a varint
immediately after sub_op; the decoder reads the length and
consumes exactly that many bytes via the
decode_extended_operands helper. Interpreter
handle_*_extended dispatchers skip the varint after the
read_u8(sub_op) step and proceed with the structural decode
they did before.
TensorExtended, GpuExtended, and the general-purpose
Extended carrier do not carry the prefix — they have
structural per-sub_op decoders that already know how far the
operand block extends, so the prefix would be redundant.
The prefix is what lets the linker, disassembler, and validator walk a bytecode stream without an exhaustive sub-op table. Adding a new extended sub-op no longer requires touching every sequential decoder.
Slice opcodes (CbgrExtended 0x00–0x0A)
Slices in VBC are FatRef values that carry (ptr, len, elem_size)
metadata in the reserved field (0 = NaN-boxed Value, 1/2/4/8 = raw integer stride). The corresponding sub-ops:
| Sub-op | Code | Format | Notes |
|---|---|---|---|
RefSlice | 0x00 | dst, src, start, len | Sub-ref from an array/list; infers stride from the source ObjectHeader. |
RefInterior | 0x01 | dst, base, field_offset:u32 | Struct interior ref. |
RefArrayElement | 0x02 | dst, base, index | Array element ref. |
RefTrait | 0x03 | dst, src, vtable_id:u32 | Trait-object fat pointer. |
Unslice | 0x04 | dst, slice_ref | Extract the raw pointer. |
SliceLen | 0x05 | dst, slice_ref | Reads metadata from the FatRef. |
SliceGet | 0x06 | dst, slice_ref, index | Stride-aware (respects reserved). |
SliceGetUnchecked | 0x07 | dst, slice_ref, index | Same, no bounds check. |
SliceSubslice | 0x08 | dst, src, start, end | Stride-aware subslicing. |
SliceSplitAt | 0x09 | dst1, dst2, src, mid | Two FatRefs [0..mid) / [mid..len). |
RefSliceRaw | 0x0A | dst, ptr, len | Build a FatRef from a raw pointer (no ObjectHeader inference); used for slice_from_raw_parts, byte buffers, and the middle of heap strings. |
TextSubOpcode::AsBytes (0x34, under TextExtended 0x79) is the
dispatch target for text.as_bytes() regardless of whether the
value is a NaN-boxed small string or a heap-allocated Text. It
materialises a byte-slice FatRef with elem_size=1, copying the six
inline bytes of a small string into a fresh heap buffer so the
returned reference has a stable address.
Adding up primary + extended tables: just over 350 opcodes.
Module format
A VBC module is a self-describing archive:
header:
magic: "VBC\0"
version: (major, minor, patch)
flags: bitfield
type_table: list of interned types
const_table: list of interned constants
function_table: list of function definitions
each function:
name, signature, local types, bytecode, debug info
proof_certificates: optional — for @verify(certified)
metadata: module-level attributes
Modules compress with LZ4. Deserialisation is zero-copy where possible (mmap + fixup).
Module-load trust boundary
Loading a .vbc module crosses a trust boundary: the bytes might
have come from this process's own compiler (trusted) or from disk,
network, or a shared cog archive (untrusted). The runtime exposes
two load tiers so each call site declares its trust assumption
explicitly.
Lenient load (trusted source)
The lenient tier does structural decode + the V-LLSI
"interpretable?" flag check, nothing else. Use it when the
bytecode comes from the compiler that's running right now — for
example, after verum run builds a program in-process and hands
the freshly-emitted bytecode straight to the interpreter. Running
the validator on bytecode whose provenance is already trusted would
just be wasted work.
Validated load (untrusted source)
The validated tier runs four passes before the module is allowed to execute:
- Structural decode — same as the lenient tier.
- Content-hash verification. The header carries a blake3
fingerprint over the bytes after the header. At load time the
runtime recomputes the fingerprint and rejects any mismatch.
This catches single-bit tampering on the disk artifact —
somebody edited a
.vbcfile in place and forgot to re-stamp the hash. The check runs before decompression, since the hash is over the raw on-wire bytes. For a compressed module that's the compressed payload, exactly what the serializer hashed, so tampering is caught without paying the decompression cost first. - Dependency-hash verification. An independent fingerprint
of the cog-distribution dependency graph (a blake3 over the
concatenated dependency hashes). Lets a build cache,
reproducibility checker, or cog resolver compare two modules'
dep trees in
O(8)without walking the full dependency table. - Per-instruction bytecode validation. Walks every
function's bytecode and rejects:
- Out-of-range function references in
Call/TailCall/CallG/NewClosure. - Call-arity mismatches — every call site's argument count must match the target's declared parameter count.
- Register references past the function's declared register file size.
- Branch offsets (
Jmp/JmpIf/JmpNot/JmpCmp/Switch/TryBegin) that fall outside the function's bytecode region, OR that land mid-instruction in another instruction's operand stream. - Out-of-range constant-pool, string-table, or type-table references.
- Decoder failures mid-stream (a function's bytecode-byte count too small for the encoded instruction sequence).
- Out-of-range function references in
When the validator finds multiple defects, the diagnostic is rendered as a numbered list — full forensic detail rather than a count-only summary.
Use the validated tier whenever bytecode comes from anywhere other
than the compiler in this process: archives loaded from disk,
modules pulled from a cog registry, files passed on the command
line, network-loaded bytecode, IPC-shared modules. The cost is
O(N) in total bytes (hash) + O(M) in instruction count, paid
once at load — the runtime hot path stays gate-free.
Why the architecture matters
The same architectural anti-pattern lived in three places: the bytecode validator, the content hash, and the dependency hash. Each was a security-critical defense implemented as a public field or function with zero production callers — inert until the load-time wiring acted on it. The lesson: a public field carrying a security-critical value with no code path that ACTS on the value is a TODO regardless of how the field is named or documented.
Encoding-layer defenses
The four-pass validated load above operates on already-decoded bytes. Two further defenses live one layer down — at the byte decoder itself:
Varint canonicality (10-byte boundary)
A 10-byte unsigned varint encodes a u64. At byte 9 (shift = 63),
only bit 0 of the 7-bit payload is meaningful — bits 1..6 of
that byte represent positions 64..69 of the conceptual u70, which
are NOT representable in u64. A naive
result |= ((byte & 0x7F) as u64) << 63 would silently drop those
bits via the platform's shift-out-of-range semantics, accepting
adversarial encodings that smuggle invalid u64 values past the
type boundary.
The decoder rejects any 10th-byte value with bits 1..6 set
(byte & 0x7E != 0 at shift == 63) and returns VarIntOverflow.
The legitimate boundary value u64::MAX (encoded as
[0xFF×9, 0x01]) is still accepted — only invalid bit positions
are rejected. Both decoder surfaces (slice-based decode_varint
and Reader-based read_varint) carry the same defense, and the
signed-varint path (decode_signed_varint) inherits it via
decode_varint.
The same canonicality class is enforced at the language level by
core/protobuf/wire.vr::read_varint (Google reference behaviour);
the two implementations share the audit recipe: any decoder where
multiple distinct inputs decode to the same value defeats
round-trip equality and admits forgery.
Hostile-size allocation guards
Several VBC interpreter dispatch handlers (CbgrAlloc in
ffi_extended.rs; GpuAlloc, MallocManaged, GpuMemAlloc,
Free in gpu.rs) construct an allocation Layout from operand
sizes carried in the bytecode. Adversarial bytecode can request
sizes near isize::MAX; Layout::from_size_align correctly
returns an error for such sizes, but the surrounding code used to
either:
- Panic the interpreter via
.unwrap()on a chained fallback (DoS surface), or - Silently downgrade to a
Layout::new::<u8>()1-byte layout while the caller still believed they gotsizebytes — a heap overflow waiting to happen on the first write past byte 0, and on the deallocation path adealloc(ptr, 1-byte-layout)for anN-byte allocation = std::alloc UB.
The fundamental fix:
- Allocation paths return a null pointer on layout failure
(the standard malloc-fail contract; the caller's
Err(e) => ...arm fires). - Deallocation path leaks the buffer on layout failure
rather than calling
deallocwith a wrong layout. The architectural invariant (allocated_buffersonly contains sizes that successfully constructed an alloc layout) makes the failure branch unreachable in practice; leak is the safe-by- default fallback for the impossible case.
The class lesson: in a trust-boundary handler, unwrap_or(default)
is especially dangerous when the default is functionally smaller
than the requested resource — it lies to callers about the
resource and produces a worse failure mode than the true error
case.
Archive memory-amplification bounds
A .vbc archive on disk is a directory of module entries; the
header carries module_count, and each index entry declares
name_len, dep_count, and data_size. Without bounds these
attacker-controlled sizes turn a 32-byte hostile header into a
terabyte allocation request — a memory-amplification denial-of-
service that brings the loader down before the file's actual
contents are even read.
The deserializer enforces architectural upper bounds before any allocation:
MAX_MODULES_PER_ARCHIVE = 65 536— no real-world Verum archive shipped throughcog publishapproaches this. An archive claimingu32::MAXmodules is rejected at the header gate.MAX_MODULE_NAME_BYTES = 16 KB— module names are dotted paths; 16 KB is roughly 1 000 segments.MAX_DEPS_PER_MODULE = 4 096— a single module depending on more than 4 K others is implausible.MAX_MODULE_DATA_BYTES = 1 GB— single-module bytecode beyond 1 GB is structurally suspect and refused.
Each rejection error message names the offending field so triage is immediate.
Module-table memory-amplification bounds
The per-module deserializer carries the same class of defense
one layer down: a hostile module header (loaded directly, not
through an archive) declares four *_count fields —
type_table_count, function_table_count, constant_pool_count,
specialization_table_count — each a u32 used to drive a
Vec::with_capacity(count as usize) allocation before the
deserializer reads a single entry.
A 64-byte hostile module header could request 500 GB-2 TB across
the four tables. The deserializer enforces MAX_*_ENTRIES = 1 048 576 per table, refused before any allocation. Real-world
modules carry at most a few thousand entries in any of these
tables; 1 M is comfortably above any plausible module while
staying far below the wraparound cliff.
A typed TableTooLarge { field, count, max } error names the
offending field for triage.
Descriptor-level memory-amplification bounds
One layer further down: each type / function / specialization
descriptor carries varint-encoded counts (type_params_count,
fields_count, variants_count, protocols_count,
methods_count, …) that drive per-descriptor allocations.
A varint can encode u64::MAX, which casts to usize::MAX on
64-bit platforms — with_capacity(usize::MAX) aborts the
process in most Rust allocators.
Tight architectural bounds (per real-world descriptor surface seen in the embedded stdlib):
MAX_TYPE_PARAMS_PER_DESCRIPTOR = 64— matches theast_to_typerecursion cap that already gates the front-end, so a type the front-end accepted will always fit.MAX_FIELDS_PER_DESCRIPTOR = 4 096MAX_VARIANTS_PER_DESCRIPTOR = 4 096MAX_PROTOCOLS_PER_DESCRIPTOR = 256MAX_METHODS_PER_PROTOCOL_IMPL = 4 096
The same recursion descends into per-type-param / per-variant / per-type-ref counts, each of which has a tighter bound:
MAX_BOUNDS_PER_TYPE_PARAM = 64— protocol bounds on a type parameter (fn f<T: P + Q>).MAX_FIELDS_PER_VARIANT = 1 024— per-variant struct fields (Some { a, b, c }).MAX_TYPE_REF_INSTANTIATION_ARGS = 64— generic instantiation arity (List<Int, String, …>).MAX_FN_TYPE_REF_PARAMS = 256— function-type signature parameter count.MAX_FN_TYPE_REF_CONTEXTS = 32— function-type context list (using [Logger, Database, …]).
Together with the outer-descriptor bounds these close every
varint-driven Vec / SmallVec allocation in the VBC
deserializer.
Bytecode-section bounds
The bytecode section's compressed branches read
uncompressed_size: u32 from the on-wire header and pass it to
the decompressor, which Vec::with_capacity-allocates that many
bytes before reading any compressed input. A hostile section
claiming u32::MAX triggered ~4 GB of allocation. Bound:
MAX_DECOMPRESSED_BYTECODE_BYTES = 1 GB— real Verum modules are kilobytes; the embedded stdlib (everycore/*.vrcompiled) totals ~14 MB.
A separate reliability fix removed an arithmetic underflow
(section_size as usize - 1) that on a section_size == 0
input wrapped to usize::MAX and drove a multi-EB slice
attempt. The bytecode reader now rejects zero-size sections at
entry and uses checked_add throughout.
Implementation surface (Rust)
For compiler / runtime hackers: the trust boundary is exposed as
two parallel constructor families on the verum_vbc crate
(lenient → validated): deserialize_module / deserialize_module_validated,
VbcArchive::load_module / load_module_validated, and
Interpreter::try_new / try_new_validated. Validation failure
surfaces as InterpreterError::ValidationFailed { module_name, reason }; the reason carries the rendered defect list.
Codegen-time validation
Beyond load-time validation, the codegen pipeline runs the same
structural validator against every freshly-built module before
returning it to the caller. The gate is CodegenConfig.validate,
defaulting to true so production builds always get the safety net:
| Field | Default | What it does |
|---|---|---|
validate | true | After build_module() constructs the final VbcModule from accumulated codegen state, finalize_module calls validate::validate_module(&module) (strict mode). Validator failures surface as CodegenError::internal with the module name and the underlying diagnostic embedded in the message. The check is the codegen counterpart to the deserializer's load-time defenses — when codegen has a bug, the failure surfaces here at the codegen call site instead of as an interpreter panic / serializer error several layers later. |
Toggle off via config.validate = false only when the caller has
already validated upstream (e.g., a hot-path incremental rebuild
that re-uses a previously-validated state). The validator runs in
O(module-size) and is well within the codegen budget for typical
modules.
Interpreter
The bytecode interpreter (verum_vbc::interpreter) dispatches via a
table split across 37 handler files in
interpreter/dispatch_table/handlers/, grouped by family:
cbgr.rs+cbgr_helpers.rs— Ref / Deref / tier checks.cubical.rs— 0xDE sub-ops (erased at AOT codegen; implemented for REPL / interpreter).tensor.rs+tensor_extended.rs— tensor opcodes (0xF0–0xF7, 0xFC, 0xFE, 0xFF).method_dispatch.rs— protocol method resolution.async_nursery.rs— 0xA0–0xAF structured concurrency.context.rs— 0xB0–0xB7 capability / context stack.meta.rs— 0xB8–0xBB reflection / quoting.gpu.rs— 0xF8–0xFB GPU dispatch (in the interpreter, these go to a CPU simulator).- …and 29 more per-family handlers for arithmetic, control flow, collections, strings, iterators, autodiff, exceptions, etc.
Execution state lives in interpreter::state — a register file, a
stack, the CBGR heap, and the context stack.
AOT lowering
verum_codegen::llvm::vbc_lowering walks VBC and emits equivalent
LLVM IR. Tier-aware CBGR lowering selects between
CbgrValidated, DirectLoad, and UncheckedLoad strategies based
on the reference's opcode (Ref vs RefChecked vs RefUnsafe).
Cubical opcodes lower to identity / passthrough (proof erasure).
GPU lowering
@device(GPU) functions route through verum_codegen::mlir instead
of LLVM — TensorMatmul, TensorExtended::Attention, etc. lower to
verum.tensor → linalg → gpu dialect → PTX / HSACO / SPIR-V /
Metal. See codegen → MLIR backend.
Proof-carrying code
When @verify(certified) is set, VBC includes proof certificates:
certificate:
obligation: SMT obligation hash
proof_term: serialised proof (size varies)
verifier: "backend-a" | "backend-b" | "portfolio" | "manual"
signed_by: build identity
Downstream consumers can trust the certificate and skip verification, re-check the proof offline, or re-verify with a fresh solver run.
Inspection
Generate a human-readable VBC dump as part of a build:
verum build --emit-vbc
The dump lands in target/{debug,release}/<name>.vbc.txt alongside
the usual artefacts. For per-phase breakdown, add --timings.
See also
- Compilation pipeline — VBC is emitted in Phase 5.
- Codegen — VBC → LLVM / MLIR.
- Runtime tiers — Tier 0 interpretation vs Tier 1 AOT.
- CBGR internals — the 0x70–0x77 opcodes in detail.