Skip to main content

Architecture Overview

Verum is a VBC-first compiler: every program lowers to Verum Bytecode, and VBC is either interpreted (Tier 0) or compiled to native code via LLVM (Tier 1). A separate MLIR path emits GPU binaries for @device(gpu) code. The compiler is a Rust workspace organised into five layers, plus a thin Layer 1.5 of shared protocol-type definitions to break what would otherwise be a circular dependency between the type system and the SMT backend.

Reading paths

Depending on why you're here:

  • Just want to use Verum? Skip this section. Go to language/overview.
  • Curious about the internals? Read this page, then compilation pipeline, then VBC bytecode.
  • Contributing to the compiler? Read this page, then crate map, then the crate whose area you're touching.
  • Debugging a compiler issue? Find the likely crate in the crate map and follow its Key files column.
  • Writing a tool (fuzzer, linter, translator)? Read VBC bytecode — VBC is the stable intermediate.

The big picture

Key crates at a glance

CrateRole
verum_commonSemantic-type primitives (List, Text, Map, …) and shared layout constants.
verum_fast_parserMain recursive-descent parser — direct-to-AST.
verum_astAST node definitions.
verum_syntaxLossless red-green tree for the formatter and IDE.
verum_protocol_typesShared protocol / GAT / CBGR-predicate type definitions (no logic).
verum_typesInference, unification, refinement, cubical, dependent, exhaustiveness.
verum_cbgrReference-tier analysis suite (escape, NLL, Polonius, points-to, SMT-alias, …).
verum_smtCapability-routed SMT layer — portfolio executor with cross-validation between solver adapters.
verum_verificationHoare logic, VCGen, tactic evaluator, dependent verifier, certificate replay.
verum_kernelLCF-style trusted kernel — sole member of the TCB.
verum_coreTyped pipeline IR — the stable contract between AST and kernel.
verum_modulesModule resolution, coherence, parallel loader, cog resolver.
verum_vbcBytecode, interpreter (Tier 0), VBC codegen, monomorphisation, archive format.
verum_codegenLLVM (CPU) + MLIR (GPU) backends.
verum_compilerPhase orchestration, derives, hygiene, embedded stdlib, incremental compiler.
verum_lspLSP 3.17 server.
verum_dapDebug Adapter Protocol server.
verum_interactiveREPL and Playbook TUI.
verum_cliCommand-line frontend (binary verum).

See crate map for every crate with key files and entry points.

Pipeline summary

MIR is not in the main pipeline — it exists only to serve the SMT verifier and advanced optimisation passes. Full phase detail: compilation pipeline.

What's implemented today

Production-ready

  • Bidirectional type inference with dataflow-sensitive narrowing.
  • Refinement types with SMT discharge; @verify(formal|thorough|certified).
  • Dependent types — Π, Σ, path types, computational univalence.
  • Cubical normaliser with HoTT primitives and HITs.
  • Capability-routed SMT layer that classifies obligations by theory signature and picks the best solver adapter; obligations marked high-assurance can be cross-validated by running multiple adapters in parallel.
  • VBC bytecode with primary + extended opcode tables and a dispatch-table interpreter.
  • LLVM AOT codegen with tier-aware CBGR lowering (Ref / RefChecked / RefUnsafe).
  • CBGR memory safety — a multi-module analysis suite (escape, NLL, Polonius, points-to, SMT-alias, ownership, lifetime, concurrency, …) feeding per-reference tier decisions.
  • Module system: 5-level visibility, coherence (orphan + overlap + specialisation), cycle-break strategy ranking, parallel loading.
  • Structured concurrency: async, await, spawn, nursery, work-stealing executor.
  • LSP 3.17 server, DAP debug server, Playbook notebook TUI, REPL.
  • A CLI covering the full project lifecycle (build, run, test, check, lint, fmt, audit, bench, doc, doctor, publish, …).

Newer but validated

  • MLIR GPU path (verum.tensor → linalg → gpu → PTX / HSACO / SPIR-V / Metal) triggered by @device(gpu).
  • Proof-carrying VBC archives with Coq / Lean / Dedukti / Metamath export.
  • Autodiff (VJP) generation for @differentiable functions.
  • Coinductive types with productivity analysis.

Experimental

  • CPU path through MLIR (LLVM remains the default for CPU).
  • Advanced refinement reflection with quantifier instantiation hints.
  • Separation-logic extensions in verum_verification.

What's next

  • Parallel-compilation orchestrator end-to-end (per-phase work stealing).
  • Proof-carrying modules at the cog-distribution boundary.
  • WASM target for the browser playground.
  • Incremental proof replay (edit one function, revalidate only the affected obligations).

See roadmap for the full plan.

Invariants of the system

These invariants hold across every code path and every phase. If you find an exception, it is almost certainly a bug:

1. VBC is the single intermediate

Every source program compiles to VBC. Nothing — not the interpreter, not LLVM, not the verifier — looks at the AST to produce output. Bypassing VBC would fragment semantics and is a hard-banned design direction.

2. Verification is monotone up the ladder

If a function passes @verify(formal), it passes every looser strategy (static, runtime). Upgrading a function's strategy never makes it suddenly valid — only invalid. Callers can safely rely on the tighter guarantees of their callees.

3. CBGR demotions are explicit

The compiler may promote &T to &checked T silently (escape analysis succeeded). It may never demote silently — a tier-2 &unsafe T always requires an unsafe block at the source level.

4. Contexts propagate; they are never ambient

A function's using [...] clause is authoritative. A callee cannot acquire a context the caller didn't provide. A spawned task inherits the parent's context stack by default, but explicit forward (spawn using [...]) drops everything else.

5. No hidden allocation

Every allocation is explicit: Heap(x), Shared.new(x), collections with a with_capacity(n) form, or the arena pool API. The compiler does not insert allocations behind the scenes.

6. Exhaustiveness is checked

Every match is exhaustive. Non-exhaustive patterns are compile errors, not runtime panics. Active patterns are opaque — a catch-all _ => ... is required when they're the only alternatives.

7. Effects are visible in the type

async fn, throws(E), using [Logger, Database, ...] — all effects appear in the function type. A call site can tell exactly what a function does without opening the body. The type system refuses to hide them. (Built-in effects like print / assert / panic don't need a using clause; user-defined contexts from core/context/standard.vr — Logger, Database, Clock, Metrics, RateLimiter — do.)

Data flow across layers

Each arrow is a compiler phase implemented in the corresponding crate. The dashed arrows are parallel passes that feed into the main lowering.

Key design decisions (and why)

Why VBC as the stable IR?

A stable bytecode gives:

  • A single lowering from source to execution; no fork between interpreted and compiled paths.
  • A tooling surface for inspectors, disassemblers, fuzzers, and cross-crate caches.
  • Proof-carrying distribution — cogs ship as .cog archives containing VBC plus optional proof certificates; validators can recheck without re-parsing Verum source.

Why capability-routed SMT?

Different solver implementations have different strengths — some are stronger on linear arithmetic and quantifier-free fragments, others on strings, bitvectors with interpretation, finite-model finding, or specific theory combinations. The capability router classifies each obligation by its theory signature and dispatches to the adapter best suited to it. The portfolio executor can also cross-validate by running multiple adapters in parallel for high-assurance obligations. The interface is solver-agnostic so adapters can be swapped without touching the verification pipeline.

Why three CBGR tiers?

A single tier forces a trade-off: either pay the 15 ns per-deref (Rust-style lifetimes + runtime checks) or lean on the programmer (raw pointers). Three tiers let the compiler promote automatically where safe, ask the programmer where it can't prove safety, and charge for safety only where it's actually needed.

Why unified verum_compiler phase orchestrator?

Phases have non-trivial dependencies — CBGR needs types but also narrowed types from guards; macro expansion can produce new types that restart inference. A single orchestrator with a declarative phase DAG is easier to reason about than per-crate phase implementations.

Documents in this section