Abstract: Language models often generate non-factual statements, especially when handling complex queries that require synthesizing information from multiple sub-queries. Verifying factuality in such cases poses significant challenges and often demands large language models, which are computationally expensive. In this work, we focus on one such scenario: addressing non-factual statements in a multi-hop question-answering setup using a smaller model. We propose a novel approach (Self-Resolve), inspired by the self-discovery and self-check prompting techniques, that enables language models to construct their own reasoning structures for fact verification and then resolve the final answer via a majority-voting mechanism. This integrated framework outperforms closed-source models such as GPT-4 by 9\% in F1 score on 2-hop query-answer verification using Llama3-8B, while achieving competitive results in the 3-hop and 4-hop settings. These results underscore the effectiveness of our approach and provide valuable insights into the challenges and potential of fact-checking with language models.
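To make the two-stage idea in the abstract concrete, the sketch below illustrates one plausible reading of Self-Resolve: the model first drafts its own verification plan (self-discovery style), then repeatedly self-checks the claim against that plan, and a majority vote resolves the final verdict. The function and prompt wording (`self_resolve_verdict`, `generate`, the SUPPORTED/REFUTED labels) are hypothetical placeholders, not the authors' released implementation.

```python
# Illustrative sketch only: prompts, labels, and the `generate` wrapper
# (e.g. a call to Llama3-8B) are assumptions, not the paper's actual code.
from collections import Counter
from typing import Callable, List


def self_resolve_verdict(
    claim: str,
    generate: Callable[[str], str],  # hypothetical LLM call returning text
    num_samples: int = 5,
) -> str:
    """Resolve a fact-check verdict for a multi-hop claim by majority vote."""
    # Step 1 (assumed): let the model build its own reasoning structure
    # for verifying the claim, in the spirit of self-discovery prompting.
    plan_prompt = (
        "Devise a step-by-step reasoning structure to verify the factuality "
        f"of the following multi-hop claim:\n{claim}"
    )
    reasoning_plan = generate(plan_prompt)

    # Step 2 (assumed): run the plan several times (self-check style)
    # and collect independent SUPPORTED / REFUTED verdicts.
    verdicts: List[str] = []
    for _ in range(num_samples):
        check_prompt = (
            f"Claim: {claim}\nReasoning plan:\n{reasoning_plan}\n"
            "Follow the plan and answer with exactly one word: SUPPORTED or REFUTED."
        )
        answer = generate(check_prompt).strip().upper()
        verdicts.append("SUPPORTED" if "SUPPORT" in answer else "REFUTED")

    # Step 3: resolve the final answer by majority voting over the verdicts.
    return Counter(verdicts).most_common(1)[0][0]
```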
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Fact Verification, Hallucination, Multi-hop Reasoning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches low compute settings-efficiency, Data resources
Languages Studied: English
Submission Number: 2194