Differential Lean Checker
Status: load-bearing CI gate. The 24-cert canonical battery is run through the trusted-base kernel and the Lean ReferenceChecker on every release; cert-by-cert verdict disagreement fails the audit.
1. Why this gate exists
The kernel-soundness export pipeline (--kernel-soundness)
verifies that the theorem statements of the 38 kernel rules
typecheck. The tri-prover replay
(--external-prover-replay)
verifies that those theorem statements typecheck across four
independent foundations (Lean 4, Coq / Rocq, Isabelle / HOL,
Cubical Agda).
Neither of those checks ever asks the question:
Given the same certificate, do the trusted-base kernel and an independent re-implementation of the same rules produce the same verdict?
A bug in either kernel that doesn't surface as a meta-theory shape mismatch — a wrong de Bruijn shift, a missed eta case, an incorrect substitution, an off-by-one in universe arithmetic — is exactly the kind of bug the meta-theory checker can't catch. The load-bearing fix is to run the same certificate through both implementations and compare verdicts.
That's --differential-lean-checker. See
§5 for the
representative bug class this design pattern surfaces.
2. The two kernels
| Kernel | Implementation |
|---|---|
| Trusted-base | The kernel's bidirectional type checker over a de Bruijn Term calculus (crates/verum_kernel/src/proof_checker.rs). The verdict authority. |
| Lean reference | Independent re-implementation in Lean 4 (verification/external/lean/VerumExternalReplay/ReferenceChecker.lean) over a structurally-identical Term ADT. Mirrors all four DEFECT fixes (η-completeness, universe-overflow guard, whnf fuel ceiling, claimed-type-is-a-type validation). |
The two implementations were authored separately against the same
abstract specification (the typing rules in
docs/research/proof-tree-formalization.md) — they are intended
to disagree only when one of them has a bug.
3. The 24-cert canonical battery
The battery
(canonical_battery in the kernel crate)
is hand-crafted to exercise every load-bearing kernel pathway and
lives in the kernel crate as a single source of truth: the same
24 certs flow through this gate (Rust↔Lean cross-language) and
through the in-process N-kernel gate
(three-kernel-differential).
Editing the battery happens in one place; both gates pick up the
new coverage automatically.
It covers four orthogonal axes:
Structural fragment (the parts both kernels must accept)
| Cert ID | Term | Claimed type | Why |
|---|---|---|---|
univ-0-in-1 | Universe(0) | Universe(1) | T-Univ at lowest universe |
univ-3-in-4 | Universe(3) | Universe(4) | T-Univ mid-tower |
id-at-univ0 | λ.0 | Π Universe(0). Universe(0) | Identity at type 0 |
id-at-univ1 | λ.0 | Π Universe(1). Universe(1) | Identity at type 1 |
polymorphic-id-shape | λ. λ. 0 | Π Universe(0). Π 0. 0 | Polymorphic identity |
pi-form-univ0 | Π Universe(0). Universe(0) | Universe(1) | T-Pi-Form |
app-elim-correct | (λ.0) Universe(0) | Universe(0) | T-App-Elim with correct arg |
nested-binders-deep4 | 4-deep nested λ | matching Π chain | Deep de Bruijn shift |
const-function | λ. λ. 1 | const-function type | Variable shadowing |
eta-redex-via-app | (λ.0) (λ.0) at Π U(0).U(0) | λ.0 shape | η-equivalence |
Negative cases (both kernels must reject)
| Cert ID | Why rejected |
|---|---|
univ-mismatch | T-Univ violated: Universe(0) : Universe(2) (skips a level) |
app-on-non-function | T-App applied to a non-function |
app-elim-domain-mismatch | T-App-Elim with wrong domain type |
unbound-var-deep | de Bruijn index past Γ length |
claimed-not-a-type | claimed_type doesn't itself have a universe kind |
Boundary-pin certs (regression battery)
| Cert ID | What it pins |
|---|---|
defect-1-mirror | substitution capture in dependent codomain |
defect-2-univ-max-overflows | universe overflow at u32::MAX |
defect-2-univ-max-minus-one-ok | tower-top boundary: MAX-1 : MAX accepted |
defect-3-mirror | Pi-introduction without binder-type kind check |
defect-4-claimed-is-value | claimed_type validation: must itself be a type |
defect-4-claimed-is-pi-of-value | claimed_type validation, nested case |
Polymorphic / cross-instantiation shapes
| Cert ID | Why |
|---|---|
polymorphic-id-applied | identity instantiated at Universe(0) |
polymorphic-id-instantiated-at-univ1 | identity instantiated at Universe(1) |
nested-app-pipeline | App nested 3-deep through structural fragment |
Total: 24 certificates. The battery is closed under regression — any
new defect found against the kernel adds a defect-N-mirror cert.
4. The protocol
┌──────────────────────────────────────────────────────────┐
│ verum audit --differential-lean-checker │
└────────────────────┬─────────────────────────────────────┘
│
▼
┌──────────────────────────────────┐
│ load_differential_battery() │
│ → 24 BatteryCert structs │
└────────────────────┬─────────────┘
│
┌────────────────┴────────────────┐
│ │
▼ ▼
┌──────────────────┐ ┌────────────────────┐
│ trusted-base kernel │ │ Serialise battery │
│ │ │ → JSON file │
│ Certificate:: │ └─────────┬──────────┘
│ verify(cert) │ │
│ for each cert │ ▼
│ │ ┌────────────────────┐
│ → Vec<Verdict> │ │ lake build │
└────────┬─────────┘ │ verum_replay_ │
│ │ checker (Lean exe)│
│ └─────────┬──────────┘
│ │
│ ▼
│ ┌────────────────────┐
│ │ Run Lean exe │
│ │ binary path: │
│ │ .lake/build/ │
│ │ bin/verum_ │
│ │ replay_checker │
│ │ → JSON verdicts │
│ └─────────┬──────────┘
│ │
└─────────────────┬───────────────┘
▼
┌──────────────────────┐
│ Cross-check │
│ cert-by-cert ok │
│ field parity │
└──────────┬───────────┘
│
▼
┌──────────────────────────┐
│ Total: 24, Agreements: │
│ 24, Disagreements: 0 │
└──────────────────────────┘
The Lean executable (verum_replay_checker) is a Lake target
defined in verification/external/lean/lakefile.toml. The Rust
audit harness builds it on demand (cached after the first build),
then invokes it with the battery JSON path as argv[0]. Output is
read from stdout, parsed, and cross-compared against the Rust
verdicts.
Verdicts compare by the ok : Bool field only — error tags are
stored for diagnostic reading but not required to match
between implementations (different error categories are an
implementation choice, not a correctness claim).
5. Bug class this gate is designed to catch
The differential pattern catches algorithmic divergences visible
only when two structurally-different kernels disagree on the same
input. A representative case in the regression battery: the
universe-tower-top boundary. When the kernel's certificate
verifier infers the kind of claimed_type, a Universe(u32::MAX)
triggers UniverseOverflow on its successor — even though the
claimed_type is a valid type at the top of the universe tower.
Lean's Nat is unbounded, so a naive Lean implementation accepts
the cert while the trusted-base kernel correctly rejects.
Verum's resolution: certificate verification treats
UniverseOverflow from the kind-check step as "claimed_type
lives at the top of the universe tower — still a type." The
subsequent definitional-equality check on inferred-vs-claimed
catches any genuine type mismatch downstream. The boundary cert
(defect-2-univ-max-minus-one-ok) is permanently part of the
regression battery.
6. Running locally
# Default — runs trusted-base kernel + Lean exe, reports per-cert verdicts.
verum audit --differential-lean-checker
# JSON output for machine consumption. Battery + Rust verdicts
# saved to target/audit-reports/differential-lean/battery.json;
# Lean verdicts and final verdict to stdout.
verum audit --differential-lean-checker --format json
Prerequisites:
- Lean 4 via elan: https://leanprover.github.io/get_started/
- The Lake build step is incremental — first run is ~30s, repeat runs are ~1s.
7. Wiring in CI
The gate is part of verum audit --bundle (the umbrella audit).
Add to the GitHub Actions matrix alongside the tri-prover replay:
- name: Differential Lean checker
run: |
# elan + lake (required for both gates)
curl -sSf https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh \
| sh -s -- -y --default-toolchain leanprover/lean4:v4.29.1
source $HOME/.elan/env
verum audit --differential-lean-checker
CI failure mode: if any cert disagrees, the harness exits non-zero and prints the disagreeing cert IDs alongside both verdicts. The debug investigation pattern is:
- Read the cert's
termandclaimed_typefrom the battery JSON. - Step both implementations (the kernel via
cargo test -p verum_kernel proof_checker -- --nocapture; Lean vialake env leanREPL onReferenceChecker.lean). - The kernel that produces the wrong verdict per the typing
rules of
docs/research/proof-tree-formalization.mdis the one with the bug.
8. Maintenance
When adding a new cert to the battery:
- Append a
BatteryCerttoload_differential_battery()incrates/verum_cli/src/commands/audit.rs. Use a stable ID for future regression-bibliography reference. - Run
verum audit --differential-lean-checker. The new cert must reach 24+1 unanimous agreement before merge.
When finding a kernel disagreement (i.e., the gate failed):
- Add a
defect-N-mirrorcert to the battery that captures the minimal failing shape. - Fix the kernel(s). Document the defect in the audit ledger
(
docs/architecture/verum-kernel-audit.md). - Confirm the cert flips to unanimous agreement.
9. Cross-references
- External-Prover Verification — the complementary gate that checks theorem statements across four foundations (vs this page checks runtime verdicts in two implementations).
- Trusted Kernel — the TCB this gate is one layer of differential second-opinion on.
verum auditCLI surface — full audit-flag table.