The eval that passed in CI shares its schema with the run that breaks.
No translation layer. CI cases use the same entities, edges, and recipes as a live discharge. A regression is a new red edge — not a percentage point.
Before
70% pass · no idea which discharges failed
→
With Invariance
Graph diff · 2 new red edges, both attending-skipped
Walkthrough
How a run becomes a guardrail.
01
Freeze a run as a case
Any production run — discharge hold, med-recon, OR scheduling — promotes to a frozen dataset entry, with its inputs, expected entities, and expected edges.
02
Run from CLI, SDK, or UI
`inv eval run compliance-v2 --on run_hold_001`. Tagged runs sit in the same list as production.
03
Diff graphs across versions
Cell-by-cell: new findings, lost findings, new VIOLATED edges. Regression is visible, not statistical.
Built on these surfaces