Improving Hallucination Detection in Dialogue via Social Framing Analysis
Keywords: hallucination detection, dialogue systems, social framing, calibration, LLM evaluation
TL;DR: Replacing human speaker labels with AI identifiers in dialogue hallucination detection reduces expected calibration error by 53% and helps in knowledge-grounded domains, but hurts coherence tracking in chit-chat.
Abstract: Hallucination detection in dialogue is harder than in single-turn settings due to speaker identity, multi-turn context, and conversational framing. We hypothesize that social framing drives much of this difficulty, building on prior work showing that human-vs-AI speaker attribution shifts LLM factual judgments by 17.7pp. We evaluate a dehumanization intervention (replacing human speaker labels with AI identifiers) on the DiaHalu benchmark (N=1,099) using GPT-5 Nano. While the overall effect is not statistically significant (McNemar's test, p=.149), domain-level analysis reveals that dehumanization improves every metric in knowledge-grounded dialogue (+2.4 F1, +2.5 accuracy) while introducing tradeoffs in chit-chat, where speaker identity is needed for coherence tracking. The clearest gain is in calibration: expected calibration error halves from .027 to .013, with the largest improvement at low confidence (+29pp). Baseline confidence also predicts which samples are vulnerable to framing effects: flipped samples show 3-8pp lower confidence than stable ones across all domains.
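The abstract points at three concrete computations: the speaker-label rewrite, expected calibration error (ECE), and an exact McNemar test over paired predictions. Below is a minimal Python sketch of how each might look, assuming transcripts with turn-prefixed speaker tags of the form `Speaker: utterance`; the label map, bin count, and helper names (`dehumanize`, `expected_calibration_error`, `mcnemar_exact`) are illustrative assumptions, not taken from the paper.

```python
import re
import numpy as np
from scipy.stats import binomtest

# Hypothetical mapping from human speaker labels to AI identifiers;
# the actual labels in DiaHalu transcripts may differ.
HUMAN_TO_AI = {"Human": "Agent A", "User": "Agent A", "Assistant": "Agent B"}

def dehumanize(transcript: str) -> str:
    """Replace human speaker tags at the start of each turn with AI identifiers."""
    pattern = re.compile(r"^(%s):" % "|".join(map(re.escape, HUMAN_TO_AI)), re.M)
    return pattern.sub(lambda m: HUMAN_TO_AI[m.group(1)] + ":", transcript)

def expected_calibration_error(conf, correct, n_bins: int = 10) -> float:
    """Fixed-width-bin ECE: bin-weighted mean |accuracy - confidence| gap."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)  # conf == 0 falls outside; fine for a sketch
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)

def mcnemar_exact(baseline_correct, treated_correct) -> float:
    """Exact McNemar test: two-sided binomial test on the discordant pairs
    (samples that exactly one of the two conditions gets right)."""
    base = np.asarray(baseline_correct, dtype=bool)
    treat = np.asarray(treated_correct, dtype=bool)
    b = int(np.sum(base & ~treat))  # baseline right, intervention wrong
    c = int(np.sum(~base & treat))  # intervention right, baseline wrong
    return binomtest(b, b + c, 0.5).pvalue
```

Note that the exact McNemar test reduces to a binomial test on the discordant counts, which is why only the b and c cells of the paired contingency table enter the p-value.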
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 186