How Well Do Multi-hop Reading Comprehension Models Understand Date Information?

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission
Abstract: Many previous studies have demonstrated that existing multi-hop reading comprehension datasets (e.g., HotpotQA) contain reasoning shortcuts, where questions can be answered without performing multi-hop reasoning. Recently, several multi-hop datasets have been proposed to address the reasoning shortcut problem or to evaluate the internal reasoning process. However, the design of the reasoning chains for comparison questions in R4C and 2WikiMultiHopQA does not fully explain the answer, while MuSiQue focuses only on bridge questions. Therefore, it remains unclear whether a model can perform step-by-step reasoning when answering a comparison question that requires both comparison and numerical reasoning skills. To evaluate models comprehensively in a hierarchical manner, we first propose a dataset, HieraDate, created by reusing and enhancing two previous multi-hop datasets, HotpotQA and 2WikiMultiHopQA. Our dataset focuses on comparison questions about date information that require multi-hop reasoning to solve. We then evaluate the ability of existing models to understand date information at three levels: extraction, reasoning, and robustness. Our experimental results reveal that multi-hop models fail at the reasoning level. Comparison reasoning and numerical reasoning (e.g., subtraction) are key challenges that need to be addressed in future work.
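To make the two skills concrete, the sketch below is our own minimal illustration in Python, not code or data from the paper; the function name and the example dates are hypothetical. It shows the kind of date comparison and date subtraction that such comparison questions require a model to perform implicitly.

```python
from datetime import date

def compare_and_subtract(date_a: date, date_b: date) -> dict:
    """Return which date is earlier and the absolute gap in days:
    the comparison and subtraction skills that date-based
    comparison questions are meant to test."""
    earlier = "first" if date_a < date_b else "second"
    gap_days = abs((date_b - date_a).days)
    return {"earlier": earlier, "gap_days": gap_days}

# Hypothetical example resembling a question that compares two birth dates.
print(compare_and_subtract(date(1809, 2, 12), date(1812, 2, 7)))
# -> {'earlier': 'first', 'gap_days': 1090}
```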