Let's Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes

TMLR Paper 5771 Authors

29 Aug 2025 (modified: 13 Sept 2025). Under review for TMLR. License: CC BY 4.0
Abstract: We evaluate AI systems without ground truth by exploiting a link between strategic gaming and information loss. Building on established information theory, we analyze which mechanisms resist adversarial manipulation. We extend finite-sample bounds to show that certain f-divergences (e.g., total variation distance) maintain polynomial guarantees under attacks, while other measures (e.g., KL divergence) degrade exponentially. We implement these mechanisms by modeling the overseer as an agent and characterize incentive-compatible scoring rules as f-information objectives. Under adversarial attacks, TVD-MI maintains effectiveness (area under the curve 0.70–0.77) while other approaches can decay towards chance, demonstrating that querying the same LLM for information relationships rather than quality judgments achieves both theoretical and practical robustness. The mechanisms decompose pairwise evaluations into reliable item-level quality scores without ground truth, addressing a key limitation of standard peer prediction. Note: Supplementary material, including pre-registration details and experimental code, is provided in the submission package.
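As a rough illustration of the TVD-MI idea the abstract describes, the sketch below estimates a lower bound on the total-variation mutual information between two agents' responses using the variational form TVD(P_XY, P_X × P_Y) = sup over f in [0,1] of E_{P_XY}[f] − E_{P_X P_Y}[f], with a binary critic playing the role of f. All names, the pairing scheme, and the critic interface here are illustrative assumptions, not the paper's implementation; in the paper's setting the critic would be the same LLM queried about information relationships between two responses rather than their quality.

```python
import random
from typing import Callable, Sequence

def tvd_mi_estimate(
    responses_a: Sequence[str],
    responses_b: Sequence[str],
    critic: Callable[[str, str], float],
    num_null_pairs: int = 200,
    seed: int = 0,
) -> float:
    """Estimate a lower bound on TVD-MI between two agents' responses.

    Uses the variational form
        TVD(P_XY, P_X x P_Y) = sup_{f in [0,1]} E_{P_XY}[f] - E_{P_X P_Y}[f],
    plugging in `critic` (any [0,1]-valued test function, e.g. an LLM asked
    whether two responses convey overlapping information) as f.
    Hypothetical sketch; the paper's actual estimator and prompts may differ.
    """
    assert len(responses_a) == len(responses_b), "responses must be aligned by item"
    rng = random.Random(seed)
    n = len(responses_a)

    # Joint term: critic on responses to the *same* item (samples from P_XY).
    paired = sum(critic(a, b) for a, b in zip(responses_a, responses_b)) / n

    # Null term: critic on responses to *different* items (samples from P_X x P_Y).
    null = 0.0
    for _ in range(num_null_pairs):
        i, j = rng.randrange(n), rng.randrange(n)
        while j == i:
            j = rng.randrange(n)
        null += critic(responses_a[i], responses_b[j])
    null /= num_null_pairs

    # Nonnegative in expectation; higher means the responses carry more
    # item-specific shared information.
    return paired - null
```

One design note on why this form is attack-resistant in spirit: because any [0,1]-valued critic yields a valid lower bound on the true total variation distance, a manipulated or noisy critic can only deflate the score rather than inflate it without bound, which matches the bounded-degradation behavior the abstract attributes to TVD-based measures.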
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Nishant_A_Mehta1
Submission Number: 5771