Skip to main content

Differential Lean Checker

Status: load-bearing CI gate. The 24-cert canonical battery is run through the trusted-base kernel and the Lean ReferenceChecker on every release; cert-by-cert verdict disagreement fails the audit.

1. Why this gate exists

The kernel-soundness export pipeline (--kernel-soundness) verifies that the theorem statements of the 38 kernel rules typecheck. The tri-prover replay (--external-prover-replay) verifies that those theorem statements typecheck across four independent foundations (Lean 4, Coq / Rocq, Isabelle / HOL, Cubical Agda).

Neither of those checks ever asks the question:

Given the same certificate, do the trusted-base kernel and an independent re-implementation of the same rules produce the same verdict?

A bug in either kernel that doesn't surface as a meta-theory shape mismatch — a wrong de Bruijn shift, a missed eta case, an incorrect substitution, an off-by-one in universe arithmetic — is exactly the kind of bug the meta-theory checker can't catch. The load-bearing fix is to run the same certificate through both implementations and compare verdicts.

That's --differential-lean-checker. See §5 for the representative bug class this design pattern surfaces.

2. The two kernels

KernelImplementation
Trusted-baseThe kernel's bidirectional type checker over a de Bruijn Term calculus (crates/verum_kernel/src/proof_checker.rs). The verdict authority.
Lean referenceIndependent re-implementation in Lean 4 (verification/external/lean/VerumExternalReplay/ReferenceChecker.lean) over a structurally-identical Term ADT. Mirrors all four DEFECT fixes (η-completeness, universe-overflow guard, whnf fuel ceiling, claimed-type-is-a-type validation).

The two implementations were authored separately against the same abstract specification (the typing rules in docs/research/proof-tree-formalization.md) — they are intended to disagree only when one of them has a bug.

3. The 24-cert canonical battery

The battery (canonical_battery in the kernel crate) is hand-crafted to exercise every load-bearing kernel pathway and lives in the kernel crate as a single source of truth: the same 24 certs flow through this gate (Rust↔Lean cross-language) and through the in-process N-kernel gate (three-kernel-differential). Editing the battery happens in one place; both gates pick up the new coverage automatically.

It covers four orthogonal axes:

Structural fragment (the parts both kernels must accept)

Cert IDTermClaimed typeWhy
univ-0-in-1Universe(0)Universe(1)T-Univ at lowest universe
univ-3-in-4Universe(3)Universe(4)T-Univ mid-tower
id-at-univ0λ.0Π Universe(0). Universe(0)Identity at type 0
id-at-univ1λ.0Π Universe(1). Universe(1)Identity at type 1
polymorphic-id-shapeλ. λ. 0Π Universe(0). Π 0. 0Polymorphic identity
pi-form-univ0Π Universe(0). Universe(0)Universe(1)T-Pi-Form
app-elim-correct(λ.0) Universe(0)Universe(0)T-App-Elim with correct arg
nested-binders-deep44-deep nested λmatching Π chainDeep de Bruijn shift
const-functionλ. λ. 1const-function typeVariable shadowing
eta-redex-via-app(λ.0) (λ.0) at Π U(0).U(0)λ.0 shapeη-equivalence

Negative cases (both kernels must reject)

Cert IDWhy rejected
univ-mismatchT-Univ violated: Universe(0) : Universe(2) (skips a level)
app-on-non-functionT-App applied to a non-function
app-elim-domain-mismatchT-App-Elim with wrong domain type
unbound-var-deepde Bruijn index past Γ length
claimed-not-a-typeclaimed_type doesn't itself have a universe kind

Boundary-pin certs (regression battery)

Cert IDWhat it pins
defect-1-mirrorsubstitution capture in dependent codomain
defect-2-univ-max-overflowsuniverse overflow at u32::MAX
defect-2-univ-max-minus-one-oktower-top boundary: MAX-1 : MAX accepted
defect-3-mirrorPi-introduction without binder-type kind check
defect-4-claimed-is-valueclaimed_type validation: must itself be a type
defect-4-claimed-is-pi-of-valueclaimed_type validation, nested case

Polymorphic / cross-instantiation shapes

Cert IDWhy
polymorphic-id-appliedidentity instantiated at Universe(0)
polymorphic-id-instantiated-at-univ1identity instantiated at Universe(1)
nested-app-pipelineApp nested 3-deep through structural fragment

Total: 24 certificates. The battery is closed under regression — any new defect found against the kernel adds a defect-N-mirror cert.

4. The protocol

┌──────────────────────────────────────────────────────────┐
│ verum audit --differential-lean-checker │
└────────────────────┬─────────────────────────────────────┘


┌──────────────────────────────────┐
│ load_differential_battery() │
│ → 24 BatteryCert structs │
└────────────────────┬─────────────┘

┌────────────────┴────────────────┐
│ │
▼ ▼
┌──────────────────┐ ┌────────────────────┐
│ trusted-base kernel │ │ Serialise battery │
│ │ │ → JSON file │
│ Certificate:: │ └─────────┬──────────┘
│ verify(cert) │ │
│ for each cert │ ▼
│ │ ┌────────────────────┐
│ → Vec<Verdict> │ │ lake build │
└────────┬─────────┘ │ verum_replay_ │
│ │ checker (Lean exe)│
│ └─────────┬──────────┘
│ │
│ ▼
│ ┌────────────────────┐
│ │ Run Lean exe │
│ │ binary path: │
│ │ .lake/build/ │
│ │ bin/verum_ │
│ │ replay_checker │
│ │ → JSON verdicts │
│ └─────────┬──────────┘
│ │
└─────────────────┬───────────────┘

┌──────────────────────┐
│ Cross-check │
│ cert-by-cert ok │
│ field parity │
└──────────┬───────────┘


┌──────────────────────────┐
│ Total: 24, Agreements: │
│ 24, Disagreements: 0 │
└──────────────────────────┘

The Lean executable (verum_replay_checker) is a Lake target defined in verification/external/lean/lakefile.toml. The Rust audit harness builds it on demand (cached after the first build), then invokes it with the battery JSON path as argv[0]. Output is read from stdout, parsed, and cross-compared against the Rust verdicts.

Verdicts compare by the ok : Bool field only — error tags are stored for diagnostic reading but not required to match between implementations (different error categories are an implementation choice, not a correctness claim).

5. Bug class this gate is designed to catch

The differential pattern catches algorithmic divergences visible only when two structurally-different kernels disagree on the same input. A representative case in the regression battery: the universe-tower-top boundary. When the kernel's certificate verifier infers the kind of claimed_type, a Universe(u32::MAX) triggers UniverseOverflow on its successor — even though the claimed_type is a valid type at the top of the universe tower. Lean's Nat is unbounded, so a naive Lean implementation accepts the cert while the trusted-base kernel correctly rejects.

Verum's resolution: certificate verification treats UniverseOverflow from the kind-check step as "claimed_type lives at the top of the universe tower — still a type." The subsequent definitional-equality check on inferred-vs-claimed catches any genuine type mismatch downstream. The boundary cert (defect-2-univ-max-minus-one-ok) is permanently part of the regression battery.

6. Running locally

# Default — runs trusted-base kernel + Lean exe, reports per-cert verdicts.
verum audit --differential-lean-checker

# JSON output for machine consumption. Battery + Rust verdicts
# saved to target/audit-reports/differential-lean/battery.json;
# Lean verdicts and final verdict to stdout.
verum audit --differential-lean-checker --format json

Prerequisites:

7. Wiring in CI

The gate is part of verum audit --bundle (the umbrella audit). Add to the GitHub Actions matrix alongside the tri-prover replay:

- name: Differential Lean checker
run: |
# elan + lake (required for both gates)
curl -sSf https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh \
| sh -s -- -y --default-toolchain leanprover/lean4:v4.29.1
source $HOME/.elan/env
verum audit --differential-lean-checker

CI failure mode: if any cert disagrees, the harness exits non-zero and prints the disagreeing cert IDs alongside both verdicts. The debug investigation pattern is:

  1. Read the cert's term and claimed_type from the battery JSON.
  2. Step both implementations (the kernel via cargo test -p verum_kernel proof_checker -- --nocapture; Lean via lake env lean REPL on ReferenceChecker.lean).
  3. The kernel that produces the wrong verdict per the typing rules of docs/research/proof-tree-formalization.md is the one with the bug.

8. Maintenance

When adding a new cert to the battery:

  1. Append a BatteryCert to load_differential_battery() in crates/verum_cli/src/commands/audit.rs. Use a stable ID for future regression-bibliography reference.
  2. Run verum audit --differential-lean-checker. The new cert must reach 24+1 unanimous agreement before merge.

When finding a kernel disagreement (i.e., the gate failed):

  1. Add a defect-N-mirror cert to the battery that captures the minimal failing shape.
  2. Fix the kernel(s). Document the defect in the audit ledger (docs/architecture/verum-kernel-audit.md).
  3. Confirm the cert flips to unanimous agreement.

9. Cross-references