Keywords: Financial Question Answering, Hallucination Detection, Mechanistic Interpretability
TL;DR: We develop a hallucination detection method based on interpretable mechanistic signals for financial question answering
Abstract: Retrieval-Augmented Generation (RAG) mitigates hallucinations by using external knowledge, yet models can still produce outputs inconsistent with retrieved evidence, a critical issue in financial QA. We find that hallucinations often arise when later-layer feedforward networks (FFNs) over-inject parametric knowledge into the residual stream. To address this, we introduce external context scores and parametric knowledge scores, mechanistic features extracted from Qwen3-0.6B across layers and attention heads. Using these signals, lightweight classifiers achieve strong detection performance and generalize to GPT-4.1-mini responses, demonstrating the promise of proxy-model evaluation for financial tasks. Mechanistic signals thus offer efficient, generalizable predictors for hallucination detection in RAG. Code and data: https://github.com/pegasi-ai/InterpDetect
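To illustrate the pipeline shape the abstract describes, below is a minimal, hypothetical sketch: forward hooks capture per-layer FFN (MLP) contributions to the residual stream of Qwen3-0.6B, those contributions are pooled into per-example features, and a lightweight classifier is fit on labeled responses. The paper's actual external context and parametric knowledge scores are not specified here; the norm-based features, the prompt template, and the `features` helper are stand-in assumptions, not the authors' method (see the repository for the real implementation).

```python
# Hypothetical sketch of a hook-based feature extractor + lightweight classifier.
# Assumed: HF model id "Qwen/Qwen3-0.6B"; norm-based features as a proxy for the
# paper's external context / parametric knowledge scores.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_id = "Qwen/Qwen3-0.6B"  # proxy model named in the abstract
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
model.eval()

ffn_outputs = {}

def make_hook(layer_idx):
    def hook(module, inputs, output):
        # Store the MLP block's additive contribution to the residual stream.
        ffn_outputs[layer_idx] = output.detach()
    return hook

handles = [
    layer.mlp.register_forward_hook(make_hook(i))
    for i, layer in enumerate(model.model.layers)
]

def features(context: str, question: str, answer: str):
    """One feature per layer: mean L2 norm of the FFN contribution over all tokens.
    (Stand-in for the paper's mechanistic scores.)"""
    text = f"Context: {context}\nQuestion: {question}\nAnswer: {answer}"
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        model(**enc)
    return torch.stack(
        [ffn_outputs[i].norm(dim=-1).mean() for i in range(len(handles))]
    ).numpy()

# X: stacked feature vectors for labeled (context, question, answer) triples;
# y: 1 if the answer is hallucinated w.r.t. the retrieved context, else 0.
# clf = LogisticRegression(max_iter=1000).fit(X, y)
```

Because the features come from a small proxy model rather than the generator, the same extractor could in principle score responses produced by a larger model (as the abstract reports for GPT-4.1-mini), with only the classifier needing labeled data.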
Submission Number: 17