Abstract: Detectives frequently engage in information detection and reasoning simultaneously when making decisions across various cases, especially when confronted with a vast amount of information. With the rapid development of large language models~(LLMs), evaluating how these models identify key information and reason to solve questions becomes increasingly relevant. We introduces the DetectBench, a reading comprehension dataset designed to assess a model's ability to jointly ability in key information detection and multi-hop reasoning when facing complex and implicit information. The DetectBench comprises 3,928 questions, each paired with a paragraph averaging 190 tokens in length. To enhance model's detective skills, we propose the Self-Question Framework. These methods encourage models to identify all possible clues within the context before reasoning. Our experiments reveal that existing models perform poorly in both information detection and multi-hop reasoning. However, the Self-Question Framework approach alleviates this issue.
Paper Type: long
Research Area: Question Answering
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: english, chinese
0 Replies
Loading