Bridging the Knowledge-Prediction Gap in LLMs on Multiple-Choice Questions

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Inference-time Intervention, Representation Analysis, Linear Probe
Abstract: Large Language Models (LLMs) often fail on multiple-choice questions (MCQs) despite demonstrating correct knowledge in other contexts, such as free-form generation. To investigate the mechanism underlying this knowledge-prediction gap and to alleviate it, we conduct a probing analysis on binary-choice questions and find that residual streams in certain layers contain a subspace spanned by two important bases: a knowledge basis that encodes the probability of the ground-truth answer and a prediction basis that encodes the probability of the answer choice predicted by the model. We observe that incorrect predictions arise from a misalignment of the model's hidden states along these two bases. Hence, we introduce KAPPA (Knowledge-Aligned Prediction through Projection-based Adjustment), an inference-time intervention that transforms hidden states to align the prediction coordinate with the knowledge coordinate. Experiments on binary-choice reformulations of Big-Bench-Hard show that KAPPA substantially improves accuracy and consistently outperforms baselines. KAPPA's benefit further extends to general MCQs, precisely mitigating the knowledge-prediction gap. Our work provides a new geometric understanding of the knowledge-prediction gap and offers a practical method for better aligning model behavior with its latent knowledge.
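The core intervention described in the abstract can be illustrated with a minimal sketch. Assuming (hypothetically) that the knowledge and prediction bases are given as unit vectors `k` and `p` obtained from linear probes, and that they are approximately orthogonal, the adjustment shifts a hidden state along the prediction basis until its prediction coordinate matches its knowledge coordinate. The function name and signature below are illustrative, not the paper's actual implementation:

```python
import numpy as np

def kappa_adjust(h: np.ndarray, k_basis: np.ndarray, p_basis: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of a projection-based adjustment:
    move the hidden state h along the prediction basis so that its
    coordinate on the prediction basis equals its coordinate on the
    knowledge basis."""
    # Normalize both probe directions to unit length.
    k = k_basis / np.linalg.norm(k_basis)
    p = p_basis / np.linalg.norm(p_basis)
    c_k = h @ k  # knowledge coordinate of the hidden state
    c_p = h @ p  # prediction coordinate of the hidden state
    # Shift along p so the new prediction coordinate equals c_k.
    return h + (c_k - c_p) * p

# Toy example with orthonormal basis directions.
h = np.array([1.0, 2.0, 3.0])
k = np.array([1.0, 0.0, 0.0])
p = np.array([0.0, 1.0, 0.0])
h_adj = kappa_adjust(h, k, p)
```

If `k` and `p` are not orthogonal, shifting along `p` also perturbs the knowledge coordinate, so a practical version would likely orthogonalize the bases first; this sketch only conveys the geometric idea of aligning the two coordinates.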
Primary Area: interpretability and explainable AI
Submission Number: 6660