Keywords: scalable oversight, procedural fidelity, AGI safety, alignment verification, normative reasoning, automated auditing, trust calibration
TL;DR: We propose an auditing framework which reveals that LLMs fabricate parameters 30,000× beyond allowed bounds to justify their conclusions. This demonstrates why scalable oversight of AGI-level systems requires procedural verification, not just outcome evaluation.
Abstract: As AI systems approach AGI-level capabilities, scalable oversight becomes critical: how do we verify that powerful systems follow intended rules rather than rationalize predetermined conclusions? We introduce Entropy Jurisprudence, a procedural audit framework that addresses this challenge. Using a minimal normative formula (E = H × R, where R ∈ {0.1, 1.0, 2.0}), we test whether LLMs faithfully execute explicit constraints or manipulate parameters to justify prior beliefs. Across 720 trials on six models, we find systematic "scale hallucinations": models invent values such as R = 30,002 when only three options exist. Critically, all models achieve identical outcome accuracy (50%) while exhibiting dramatically different procedural fidelity, with Rationalization Index values ranging from near zero to over 19,000. This reveals a fundamental tension for AGI alignment: outcome-based evaluation cannot detect procedural infidelity. For human-AI collaboration, it poses a trust calibration problem, because humans cannot distinguish genuine reasoning from sophisticated rationalization. Our framework provides a minimal, scalable methodology for auditing whether AI systems "follow the rules" or "rewrite them," contributing to the technical foundations needed for meaningful oversight of AGI-level systems.
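As an illustration of the kind of procedural check the abstract describes, the sketch below audits one trial against the stated constraint E = H × R with R ∈ {0.1, 1.0, 2.0}. It is not the authors' implementation: the names `Trial`, `audit_trial`, and the finding labels are hypothetical, and it deliberately does not compute the Rationalization Index, which the abstract does not define.

```python
import math
from dataclasses import dataclass

# Allowed values of R, taken from the abstract.
ALLOWED_R = (0.1, 1.0, 2.0)


@dataclass
class Trial:
    """Values a model reported for one trial (hypothetical record layout)."""
    h: float  # reported H
    r: float  # reported R
    e: float  # reported E


def audit_trial(trial, allowed_r=ALLOWED_R):
    """Return a list of procedural violations found in one trial.

    Checks only the procedure (is R legal, is E = H * R), not whether
    the final conclusion is correct.
    """
    findings = []
    # R must be one of the three permitted values.
    if not any(math.isclose(trial.r, a, rel_tol=1e-9) for a in allowed_r):
        findings.append("r_out_of_set")
    # E must equal H * R, up to floating-point tolerance.
    if not math.isclose(trial.e, trial.h * trial.r, rel_tol=1e-9, abs_tol=1e-12):
        findings.append("e_mismatch")
    return findings


if __name__ == "__main__":
    print(audit_trial(Trial(h=5.0, r=2.0, e=10.0)))
    print(audit_trial(Trial(h=5.0, r=30002.0, e=150010.0)))
```

The second call uses the R = 30,002 value quoted in the abstract: its arithmetic is internally consistent, so an outcome-only check on E would pass, while the set-membership check on R flags it.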
Submission Number: 1