Grounding Large Language Model with Causal Knowledge Retrieval

ACL ARR 2025 February Submission 874 — Authors

11 Feb 2025 (modified: 09 May 2025) · CC BY 4.0
Abstract: Large Language Models (LLMs) often require grounding in external knowledge to generate accurate and faithful outputs. However, this process can easily fail with inaccurate semantic similarity search: it tends to retrieve information that only appears similar to the query without actually aiding the response, thus acting as noise or even misguiding the generation. To address this issue, we propose the Causal Inference Score (CIS), which measures how likely a knowledge candidate is to help answer the user's question by computing the debiased textual entailment confidence between the question and the candidate using an LLM. For cost-efficient inference, we further propose a knowledge distillation method to transfer CIS estimation to a lightweight BERT model. Extensive experiments show that simply replacing the similarity measure with CIS leads to significant improvements, increasing answer accuracy by up to 20.5% and F1 by 23.3%, outperforming recent works that involve complex multistage pipelines.
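The abstract describes scoring each retrieved candidate by debiased entailment confidence rather than raw semantic similarity. The sketch below is a hypothetical illustration (not the authors' code): it assumes an entailment-probability function and debiases it by subtracting the score of a content-free placeholder, a common calibration trick; a toy lexical scorer stands in for the LLM or distilled BERT model.

```python
# Hypothetical sketch of CIS-style reranking; the function names and the
# content-free debiasing baseline are illustrative assumptions, not the
# paper's actual formulation.

def cis_score(question, candidate, entail_prob):
    """Debiased entailment confidence:
    P(entail | question, candidate) - P(entail | question, placeholder)."""
    raw = entail_prob(question, candidate)
    bias = entail_prob(question, "N/A")  # content-free baseline
    return raw - bias

def rerank(question, candidates, entail_prob):
    """Order knowledge candidates by CIS instead of raw similarity."""
    return sorted(
        candidates,
        key=lambda c: cis_score(question, c, entail_prob),
        reverse=True,
    )

# Toy lexical-overlap scorer standing in for an LLM / distilled BERT model.
def toy_entail_prob(question, candidate):
    q_words = set(question.lower().split())
    c_words = set(candidate.lower().split())
    return len(q_words & c_words) / (len(q_words) + 1)

ranked = rerank(
    "Who wrote Hamlet?",
    ["Shakespeare wrote Hamlet in 1601.",
     "Hamlet is a city in North Carolina."],
    toy_entail_prob,
)
```

In practice the `entail_prob` slot would be filled by an LLM prompted for entailment (or the distilled BERT scorer), with the same reranking logic unchanged.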
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: knowledge base QA, multilingual QA, reasoning
Contribution Types: Surveys, Theory
Languages Studied: English
Submission Number: 874