Architecture Overview

Verum is a VBC-first compiler: every program lowers to Verum Bytecode, and VBC is either interpreted (Tier 0) or compiled to native code via LLVM (Tier 1). A separate MLIR path emits GPU binaries for @device(gpu) code. The compiler is a 24-crate Rust workspace (~1.36 M LOC) organised into five layers.

Reading paths

Depending on why you're here:

  • Just want to use Verum? Skip this section. Go to language/overview.
  • Curious about the internals? Read this page, then compilation pipeline, then VBC bytecode.
  • Contributing to the compiler? Read this page, then crate map, then the crate whose area you're touching.
  • Debugging a compiler issue? Find the likely crate in the crate map and follow its Key files column.
  • Writing a tool (fuzzer, linter, translator)? Read VBC bytecode — VBC is the stable intermediate.

The big picture

Key crates at a glance

Line counts reflect the current release.

Crate               Role                                                   LOC
verum_types         Type system (inference, refinement, cubical)           221 K
verum_vbc           Bytecode, interpreter, VBC codegen, monomorphization   192 K
verum_compiler      Phase orchestration, derives, linker config            161 K
verum_smt           SMT backend, capability router, portfolio executor     139 K
verum_cbgr          11-module reference-tier analysis suite                103 K
verum_fast_parser   Main recursive-descent parser                           89 K
verum_codegen       LLVM (CPU) + MLIR (GPU) backends                        81 K
verum_verification  VCGen, Hoare logic, tactic evaluator                    59 K
verum_parser        Legacy parser (partial, being phased out)               49 K
verum_ast           AST definitions                                         47 K
verum_lsp           Language server (LSP 3.17)                              33 K
verum_cli           Command-line frontend (35 commands)                     32 K

See crate map for every crate with LOC and key files.

Pipeline summary

MIR is not in the main pipeline — it exists only to serve the SMT verifier and advanced optimisation passes. Full phase detail: compilation pipeline.

What's implemented today

Production-ready

  • Bidirectional type inference with dataflow-sensitive narrowing.
  • Refinement types with SMT discharge; @verify(formal|thorough|certified).
  • Dependent types — Π, Σ, path types, computational univalence.
  • Cubical normaliser with HoTT primitives and HITs.
  • Dual-backend SMT with a capability router that classifies obligations by theory and picks the appropriate solver.
  • VBC bytecode with ~350 opcodes (primary + extended tables) and a 37-file dispatch-table interpreter.
  • LLVM AOT codegen with tier-aware CBGR lowering (Ref / RefChecked / RefUnsafe).
  • CBGR memory safety — 11 analysis modules (escape, NLL, Polonius, points-to, SMT-alias, …) feeding per-reference tier decisions.
  • Module system: 5-level visibility, coherence (orphan + overlap + specialisation), cycle-break strategy ranking, parallel loading.
  • Structured concurrency: async, await, spawn, nursery, work-stealing executor.
  • LSP 3.17 server, DAP debug server, Playbook notebook TUI, REPL.
  • 35 CLI commands covering the full project lifecycle.
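
The dispatch-table interpreter mentioned above spans 37 files in the real codebase, but the core technique is compact. Here is a minimal sketch in Rust; the opcodes, handler signatures, and instruction encoding are invented for illustration and are not the real VBC opcode set:

```rust
// Each opcode byte indexes into a fixed table of handler functions,
// so the hot loop is a table lookup plus an indirect call, not a match.
// Opcodes and encoding here are hypothetical, not real VBC.

type Handler = fn(&mut Vm, operand: i64);

struct Vm {
    stack: Vec<i64>,
}

const PUSH: u8 = 0;
const ADD: u8 = 1;
const MUL: u8 = 2;

fn op_push(vm: &mut Vm, operand: i64) {
    vm.stack.push(operand);
}

fn op_add(vm: &mut Vm, _operand: i64) {
    let b = vm.stack.pop().unwrap();
    let a = vm.stack.pop().unwrap();
    vm.stack.push(a + b);
}

fn op_mul(vm: &mut Vm, _operand: i64) {
    let b = vm.stack.pop().unwrap();
    let a = vm.stack.pop().unwrap();
    vm.stack.push(a * b);
}

// The dispatch table: opcode byte -> handler.
const TABLE: [Handler; 3] = [op_push, op_add, op_mul];

fn run(program: &[(u8, i64)]) -> i64 {
    let mut vm = Vm { stack: Vec::new() };
    for &(op, operand) in program {
        TABLE[op as usize](&mut vm, operand);
    }
    vm.stack.pop().unwrap()
}
```

Splitting handlers into many files (as verum_vbc does) keeps each opcode's implementation reviewable while the table itself stays a single flat array.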

Newer but validated

  • MLIR GPU path (verum.tensor → linalg → gpu → PTX / HSACO / SPIR-V / Metal) triggered by @device(gpu).
  • Proof-carrying VBC archives with Coq / Lean / Dedukti / Metamath export.
  • Autodiff (VJP) generation for @differentiable functions.
  • Coinductive types with productivity analysis.

Experimental

  • CPU path through MLIR (LLVM remains the default for CPU).
  • Advanced refinement reflection with quantifier instantiation hints.
  • Separation-logic extensions in verum_verification.

What's next

  • Parallel-compilation orchestrator end-to-end (per-phase work stealing).
  • Proof-carrying modules at the cog-distribution boundary.
  • WASM target for the browser playground.
  • Incremental proof replay (edit one function, revalidate only the affected obligations).

See roadmap for the full plan.

Invariants of the system

These invariants hold across every code path and every phase. If you find an exception, it is almost certainly a bug:

1. VBC is the single intermediate

Every source program compiles to VBC. Nothing — not the interpreter, not LLVM, not the verifier — looks at the AST to produce output. Bypassing VBC would fragment semantics and is a hard-banned design direction.

2. Verification is monotone up the ladder

If a function passes @verify(formal), it also passes every looser strategy (static, runtime). Tightening a function's strategy can only surface new failures, never fix existing ones. Callers can therefore safely rely on the tighter guarantees of their callees.
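
The monotonicity invariant amounts to a total order on strategies. A toy Rust illustration (the exact set of rungs here is abbreviated; the real ladder includes @verify(formal|thorough|certified) among others):

```rust
// Strategies form a total order from loosest to tightest; passing a
// strategy implies passing every looser one. Rung names are illustrative.

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Strategy {
    Runtime,   // loosest
    Static,
    Formal,
    Certified, // tightest
}

/// The tightest strategy a function is known to pass.
struct Verified(Strategy);

impl Verified {
    /// Monotonicity: passing the recorded strategy implies passing
    /// anything looser, so the check is a simple order comparison.
    fn passes(&self, required: Strategy) -> bool {
        required <= self.0
    }
}
```

For example, a function verified at Formal passes Static and Runtime requirements, but not Certified.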

3. CBGR demotions are explicit

The compiler may silently promote &T to &checked T when its analysis permits (for example, after escape analysis succeeds). It may never silently demote: a tier-2 &unsafe T always requires an unsafe block at the source level.

4. Contexts propagate; they are never ambient

A function's using [...] clause is authoritative. A callee cannot acquire a context the caller didn't provide. A spawned task inherits the parent's context stack by default, but an explicit forward (spawn using [...]) passes only the listed contexts and drops everything else.

5. No hidden allocation

Every allocation is explicit: Heap(x), Shared.new(x), collections with a with_capacity(n) form, or the arena pool API. The compiler does not insert allocations behind the scenes.

6. Exhaustiveness is checked

Every match is exhaustive. Non-exhaustive patterns are compile errors, not runtime panics. Active patterns are opaque — a catch-all _ => ... is required when they're the only alternatives.

7. Effects are visible in the type

async fn, throws(E), using [IO] — all effects appear in the function type. A call site can tell exactly what a function does without opening the body. The type system refuses to hide them.

Data flow across layers

Each arrow is a compiler phase implemented in the corresponding crate. The dashed arrows are parallel passes that feed into the main lowering.

Key design decisions (and why)

Why VBC as the stable IR?

A stable bytecode gives:

  • A single lowering from source to execution; no fork between interpreted and compiled paths.
  • A tooling surface for inspectors, disassemblers, fuzzers, and cross-crate caches.
  • Proof-carrying distribution — cogs ship as .cog archives containing VBC plus optional proof certificates; validators can recheck without re-parsing Verum source.

Why dual-backend SMT?

Z3 excels at linear arithmetic, arrays, and quantifier-free fragments. CVC5 excels at strings, bitvectors with interpretation, and theory combinations. The capability router classifies each obligation's theory and dispatches — better coverage than either solver alone.
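
The routing idea can be sketched in a few lines of Rust. The theory categories and the routing policy below are invented for illustration; the real capability router in verum_smt classifies actual SMT obligations:

```rust
// Hypothetical theory-based solver routing, mirroring the strengths
// described above: Z3 for linear arithmetic and arrays, CVC5 for
// strings and bitvectors, and a policy for mixed-theory obligations.

#[derive(Debug, PartialEq)]
enum Solver {
    Z3,
    Cvc5,
}

#[derive(Debug)]
enum Theory {
    LinearArith,
    Arrays,
    Strings,
    BitVectors,
    Mixed(Vec<Theory>),
}

fn route(theory: &Theory) -> Solver {
    match theory {
        // Z3's strong fragments.
        Theory::LinearArith | Theory::Arrays => Solver::Z3,
        // CVC5's strong fragments.
        Theory::Strings | Theory::BitVectors => Solver::Cvc5,
        // Mixed obligations: if any sub-theory prefers CVC5, send the
        // whole obligation there; otherwise default to Z3.
        Theory::Mixed(parts) => {
            if parts.iter().any(|t| route(t) == Solver::Cvc5) {
                Solver::Cvc5
            } else {
                Solver::Z3
            }
        }
    }
}
```

A portfolio executor (as the crate table mentions) can also race both solvers on hard obligations and take the first answer; the router just decides the default.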

Why three CBGR tiers?

A single tier forces a trade-off: either pay the 15 ns per-deref (Rust-style lifetimes + runtime checks) or lean on the programmer (raw pointers). Three tiers let the compiler promote automatically where safe, ask the programmer where it can't prove safety, and charge for safety only where it's actually needed.
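
The tier decision can be pictured as a small decision function. The mapping from analysis facts to tiers below is invented for illustration; the real verum_cbgr suite combines eleven analyses before deciding:

```rust
// Hypothetical per-reference tier decision: pick the cheapest tier the
// analysis can justify, fall back to runtime checks when it can't, and
// allow the unchecked tier only where the programmer opted in.

#[derive(Debug, PartialEq)]
enum Tier {
    Ref,        // statically proven safe: zero-cost deref
    RefChecked, // safety deferred to a runtime check on each deref
    RefUnsafe,  // programmer-asserted: requires an `unsafe` block
}

fn decide_tier(proved_safe: bool, in_unsafe_block: bool) -> Tier {
    if proved_safe {
        Tier::Ref // cheapest tier: analysis discharged the obligation
    } else if in_unsafe_block {
        Tier::RefUnsafe // explicit opt-out, never chosen silently
    } else {
        Tier::RefChecked // can't prove safety: charge for it at runtime
    }
}
```

The point of the three-way split is visible in the last branch: the per-deref cost is paid only for the references the analyses could not discharge.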

Why unified verum_compiler phase orchestrator?

Phases have non-trivial dependencies — CBGR needs types but also narrowed types from guards; macro expansion can produce new types that restart inference. A single orchestrator with a declarative phase DAG is easier to reason about than per-crate phase implementations.
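
A declarative phase DAG can be sketched as a map from phase to dependencies plus a topological traversal. The phase names below are illustrative, not the real verum_compiler phase list, and the sketch assumes the DAG is acyclic (the real orchestrator would detect cycles):

```rust
use std::collections::HashMap;

// Each phase declares what it depends on; the orchestrator derives the
// execution order instead of hard-coding it per crate.
fn schedule<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> Vec<&'a str> {
    fn visit<'a>(
        phase: &'a str,
        deps: &HashMap<&'a str, Vec<&'a str>>,
        order: &mut Vec<&'a str>,
    ) {
        if order.contains(&phase) {
            return; // already scheduled
        }
        // Schedule all dependencies first (assumes no cycles).
        for dep in deps.get(phase).cloned().unwrap_or_default() {
            visit(dep, deps, order);
        }
        order.push(phase);
    }

    let mut order = Vec::new();
    let mut phases: Vec<&str> = deps.keys().copied().collect();
    phases.sort(); // deterministic traversal order
    for phase in phases {
        visit(phase, deps, &mut order);
    }
    order
}
```

Restart edges (such as macro expansion producing new types) would be modelled by re-enqueueing the affected subgraph rather than by ad-hoc callbacks, which is what makes the declarative form easier to reason about.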
