Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training

ACL ARR 2025 February Submission 5125 Authors

16 Feb 2025 (modified: 09 May 2025) | ACL ARR 2025 February Submission | CC BY 4.0
Abstract: As large language models (LLMs) become increasingly prevalent, concerns about their reliability have grown, particularly due to hallucinations: factually inaccurate or irrelevant outputs. Our research investigates the relationship between uncertainty in the training process and the emergence of hallucinations. Using models from the Pythia suite and several hallucination detection metrics, we analyze hallucination trends and identify significant variance during training. To address this, we propose Sensitivity Dropout (SenD), a novel training protocol designed to reduce hallucination variance during training by deterministically dropping embedding indices with significant variability. In addition, we develop an unsupervised hallucination detection metric, Efficient EigenScore (EES), which approximates the traditional EigenScore at twice the speed. Integrating this efficient metric into our training protocol makes SenD both computationally scalable and effective at reducing hallucination variance throughout training. SenD improves test-time reliability by up to 17% and enhances factual accuracy in domains such as Wikipedia, Medical, LegalBench, and CodeSearchNet without affecting downstream task performance.
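To make the two mechanisms in the abstract concrete, here is a minimal sketch of what SenD-style sensitivity dropout could look like, assuming "sensitive" indices are the hidden dimensions whose activations vary most across recent training steps. All names (SensitivityDropout, drop_fraction, update, drop_mask) and the running-variance bookkeeping are illustrative assumptions, not the authors' implementation.

```python
import torch

class SensitivityDropout:
    """Track per-index variance of embeddings across training steps and
    deterministically zero out the most variable indices (a sketch)."""

    def __init__(self, hidden_dim: int, drop_fraction: float = 0.01):
        self.drop_fraction = drop_fraction
        self.sum_ = torch.zeros(hidden_dim)     # running sum of means
        self.sum_sq = torch.zeros(hidden_dim)   # running sum of squares
        self.count = 0

    def update(self, hidden: torch.Tensor) -> None:
        # hidden: (batch, hidden_dim) embeddings from the current step
        self.sum_ += hidden.mean(dim=0)
        self.sum_sq += (hidden ** 2).mean(dim=0)
        self.count += 1

    def drop_mask(self) -> torch.Tensor:
        # Per-index variance accumulated over observed steps
        mean = self.sum_ / self.count
        var = self.sum_sq / self.count - mean ** 2
        k = max(1, int(self.drop_fraction * var.numel()))
        # Deterministically drop the k highest-variance indices
        drop_idx = torch.topk(var, k).indices
        mask = torch.ones_like(var)
        mask[drop_idx] = 0.0
        return mask  # multiply hidden states elementwise by this mask
```

Similarly, EigenScore (Chen et al., 2024) scores hallucination risk via the log-determinant of the covariance of K sampled-generation embeddings. One standard way to speed this up, which may or may not be what EES actually does, is to work with the small K x K Gram matrix instead of the d x d covariance, since their nonzero eigenvalues coincide; the sketch below assumes that shortcut.

```python
def efficient_eigenscore(embeddings: torch.Tensor, alpha: float = 1e-3) -> float:
    """embeddings: (K, d) sentence embeddings of K sampled generations.
    Returns a regularized mean-log-eigenvalue score (illustrative only)."""
    K = embeddings.shape[0]
    centered = embeddings - embeddings.mean(dim=0, keepdim=True)
    gram = centered @ centered.T / K           # (K, K); cheap when K << d
    eigvals = torch.linalg.eigvalsh(gram)      # symmetric eigensolve
    return torch.log(eigvals.clamp_min(0.0) + alpha).mean().item()
```

Higher scores indicate more spread among the sampled generations' embeddings, i.e., greater semantic inconsistency; how the paper thresholds or integrates this signal into SenD is not specified in the abstract.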
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: LLMs, Hallucinations, Dropout, Reliability, Efficiency
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches for low compute settings-efficiency, Theory
Languages Studied: English
Submission Number: 5125