Who's Manipulating Whom? Epistemic Grounding to Break Recursive Validation Loops in Large Language Models
Keywords: Epistemic Grounding, Recursive Validation Loops, Amplification Cascading, Architectural Reconstruction, Systematic Evidence Evaluation
TL;DR: LLMs risk epistemic hollowing and over-conformity, producing ungrounded outputs prone to manipulation. We propose Epistemic Grounding to break recursive feedback loops and anchor AI systems in truth.
Abstract: Large Language Models optimized for helpfulness through Reinforcement Learning from Human Feedback (RLHF) can exhibit systematic vulnerabilities to epistemic manipulation. We investigate this through controlled machine-to-machine negotiations (n=49) in which AI agents assume buyer/seller roles with asymmetric information. Our analysis reveals three interaction patterns: fair competition (99.1% efficiency relative to the Nash equilibrium), systematic manipulation (71% profit advantages), and cooperative truth-seeking (100% success rates). We also observe systematic failures in which models violate their optimization directives in 16% of cases, indicating that alignment training can override rational behavior under strategic pressure. Model selection emerges as more impactful than strategy optimization, with reliability differences accounting for 60% of outcome variance. We propose Epistemic Grounding as a framework for improving AI system reliability through model tiering, verification protocols, and training-objective modifications. Our findings suggest that careful model selection and epistemic safeguards are essential for deploying AI in high-stakes strategic interactions.
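To make the efficiency figure concrete, here is a minimal Python sketch of one way efficiency relative to the Nash equilibrium could be scored over a batch of bilateral negotiations. This is an illustration under stated assumptions, not the paper's code: the abstract does not specify the metric, and the names (Negotiation, nash_efficiency) and toy numbers are hypothetical. The assumption here is that a deal realizes the full joint surplus if it closes at any mutually beneficial price, a breakdown realizes none, and batch efficiency is realized surplus divided by the maximum (Nash-optimal) surplus.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Negotiation:
    buyer_value: float             # buyer's private valuation
    seller_cost: float             # seller's private cost
    agreed_price: Optional[float]  # None if the negotiation broke down

def surplus(n: Negotiation) -> float:
    """Joint surplus realized: the full pie if a feasible deal closed, else 0."""
    if n.agreed_price is not None and n.seller_cost <= n.agreed_price <= n.buyer_value:
        return n.buyer_value - n.seller_cost
    return 0.0

def nash_efficiency(batch: list[Negotiation]) -> float:
    """Realized surplus over the batch divided by the Nash (maximum) surplus."""
    achievable = sum(max(n.buyer_value - n.seller_cost, 0.0) for n in batch)
    realized = sum(surplus(n) for n in batch)
    return realized / achievable if achievable else 0.0

# Hypothetical batch: nine deals close at a mid-range price; one breaks down.
batch = [Negotiation(100.0, 40.0, 70.0)] * 9 + [Negotiation(100.0, 40.0, None)]
print(f"{nash_efficiency(batch):.1%}")  # -> 90.0% for this toy batch

Under this scoring, an aggregate figure such as 99.1% would indicate that nearly all mutually beneficial deals closed; other plausible metrics (e.g., distance of the agreed split from the Nash bargaining solution) would be computed differently.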
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 24296