Entropy Jurisprudence: Auditing Procedural Fidelity in LLM Normative Reasoning

Published: 02 Mar 2026 · Last Modified: 21 Mar 2026 · Agentic AI in the Wild: From Hallucinations to Reliable Autonomy (Poster) · CC BY 4.0
Keywords: procedural fidelity, normative reasoning, LLM reliability, rule execution, scale hallucination, agent safety
TL;DR: LLMs that reach similar outcomes can differ substantially in procedural fidelity, making rule-execution failures visible that outcome-only evaluation can miss.
Abstract: Outcome-level evaluation does not test whether models execute an explicit rule they have accepted. We introduce Entropy Jurisprudence, a procedural audit for rule-constrained normative judgments built on the auditable commitment $E = H \times R$. In a main study of 720 trials across six models and four boundary-stress cases, we identify three recurring regimes: instruction-faithful, prior-dominant, and context-sensitive behavior. We then add 2,400 supplemental trials to address the main objections to the original setup: explicit midpoint coverage for $R = 1.0$, prompt-template sensitivity, and reasoning-format ablations. The extended study shows that midpoint cases are the hardest region of the scale (mid-band expected-$R$ match ranges from 20.3\% for Qwen3 to 71.2\% for Gemma3), that prompt wording can shift execution rates by up to 72.5 percentage points, and that numbered-step formats substantially improve stability for some models (e.g., Llama3: 82.5\% execution vs. 22.5\% under explicit chain-of-thought). These results support a narrower but stronger claim: procedural fidelity is measurable, model-dependent, and materially affected by problem framing. Entropy Jurisprudence is therefore best understood as a lightweight audit for rule-execution stability, not as a standalone moral benchmark.
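The core audit described in the abstract can be sketched in a few lines. This is a minimal illustration only, assuming the commitment $E = H \times R$ is checked per trial against the model's reported value; the function names (`execution_match`, `execution_rate`) and the tolerance parameter are hypothetical and are not taken from the paper.

```python
def execution_match(h: float, r: float, reported_e: float,
                    tol: float = 1e-6) -> bool:
    """Return True if the model's reported E honors the accepted
    rule E = H * R within a small numerical tolerance.
    (h, r, tol are illustrative; the paper's exact scoring may differ.)"""
    return abs(reported_e - h * r) <= tol


def execution_rate(trials: list[tuple[float, float, float]]) -> float:
    """Fraction of (h, r, reported_e) trials executed faithfully --
    the kind of per-model execution rate the abstract reports."""
    hits = sum(execution_match(h, r, e) for h, r, e in trials)
    return hits / len(trials)
```

Under this framing, a "prior-dominant" model would show a low `execution_rate` even when it explicitly accepted the rule, while an "instruction-faithful" model would score near 1.0.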
Submission Number: 1