From Prompts to Proof Obligations: Formal Sidecars as an Epistemic Interface for Trustworthy ML
Keywords: trustworthy machine learning, autoformalization, formal sidecars, proof assistants, Lean 4, LLM verification, evidence grounding, empirical reasoning, source grounding, abstention, epistemology of AI, AI governance
TL;DR: LLM outputs should carry formal sidecars: typed, machine-checkable commitments that expose target claims, evidence boundaries, proof obligations, and abstention rules. EG-VAR shows this for empirical table claims in Lean 4.
Abstract: Large language models make it increasingly cheap to attach formal,
machine-checkable refinements to ordinary natural-language claims. This paper
argues that such refinements should be treated not merely as technical aids,
but as a new epistemic interface for trustworthy ML. The relevant artifact is
no longer prose alone, but prose plus a formal sidecar: an explicit object
specifying what proposition the answer commits to, what source or discourse
scope it assumes, what evidence boundary it respects, what proof or checking
obligation remains, and when the system must abstain. Using EG-VAR
(Evidence-Grounded Verified Agentic Reasoning) as a case study, we show how
empirical claims over tables acquire exactly these five elements: a target
proposition, a source/discourse scope, an evidence boundary, a checked proof
obligation, and an explicit abstention condition. The same pattern extends to
typed APIs and structured sources. The philosophical point is not that natural language has a
unique hidden logical form, or that proof assistants solve meaning. It is that
cheap autoformalization turns interpretation into an auditable design object.
Autoformalization therefore shifts part of the trustworthy-ML problem from
model outputs to governed communicative artifacts.
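
To make the five elements concrete, the following Lean 4 fragment is a minimal sketch of what a sidecar for a table claim might look like. It is not EG-VAR's actual encoding: `Row`, `lookup`, `Sidecar`, `demoTable`, and the numeric values are all hypothetical names and data introduced here for illustration, and the abstention rule is reduced to a single Boolean flag.

```lean
-- Illustrative sketch only; every name and number below is an assumption,
-- not part of EG-VAR.

/-- A toy table row: a numeric key and a reported value. -/
structure Row where
  key : Nat
  value : Nat

/-- Look up the value stored under a key, if the row exists. -/
def lookup : List Row → Nat → Option Nat
  | [], _ => none
  | r :: rs, k => if r.key == k then some r.value else lookup rs k

/-- A sidecar bundling the five elements named in the abstract. -/
structure Sidecar where
  /-- Source/discourse scope: the table the claim is about. -/
  source : List Row
  /-- Evidence boundary: keys of the rows the claim may rely on. -/
  evidence : List Nat
  /-- Target proposition the answer commits to. -/
  target : Prop
  /-- Abstention condition, simplified here to a flag. -/
  mustAbstain : Bool
  /-- Proof obligation: owed only when the system does not abstain. -/
  proof : mustAbstain = false → target

def demoTable : List Row := [⟨0, 71⟩, ⟨1, 78⟩]

/-- A supported claim: row 1 reports a higher value than row 0. -/
def supported : Sidecar where
  source := demoTable
  evidence := [0, 1]
  target := lookup demoTable 0 = some 71 ∧ lookup demoTable 1 = some 78 ∧ 71 < 78
  mustAbstain := false
  proof := fun _ => by decide

/-- A claim about a row that is not in the table: the sidecar abstains,
    so no proof of the target is ever demanded. -/
def abstaining : Sidecar where
  source := demoTable
  evidence := [2]
  target := ∃ v, lookup demoTable 2 = some v ∧ 71 < v
  mustAbstain := true
  proof := fun h => absurd h (by decide)
```

In this sketch the type of `proof` is what ties the answer to its evidence: a non-abstaining sidecar cannot be constructed unless the checker accepts a proof of `target`, while an abstaining one commits to nothing beyond its stated scope.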
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 74