Abstract: Many recent QA models retrieve answers from passages rather than whole documents, owing to the limited context size of deep learning models. However, this approach ignores document-level cues that can be crucial for answering questions. This paper reviews three open-domain QA benchmarks from a document-level perspective and finds that they are biased towards passage-level information. Out of 17,000 assessed questions, 82 were identified as requiring document-level reasoning and could not be answered by passage-based models. Document-level retrieval (BM25) outperformed both dense and sparse passage-level retrieval on these questions, highlighting the need for more thorough evaluation of models' ability to understand documents, an often-overlooked challenge in open-domain QA.
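To illustrate the contrast the abstract draws, here is a minimal sketch of document-level versus passage-level BM25 retrieval. It assumes the `rank_bm25` package and uses a hypothetical toy corpus; it is not the paper's setup, only an illustration of why scoring whole documents can combine cues that are split across passages.

```python
# Minimal sketch: document-level vs. passage-level BM25 retrieval.
# Assumptions: the rank_bm25 package; a toy corpus of documents already
# split into passages (hypothetical example data, not the paper's benchmarks).
from rank_bm25 import BM25Okapi

documents = {
    "doc_a": ["The 2004 report covers coastal erosion.",
              "Chapter 3 discusses mitigation costs in detail."],
    "doc_b": ["A survey of desert climates.",
              "Rainfall statistics for 2004 are tabulated here."],
}

def tokenize(text):
    return text.lower().split()

# Document-level index: all passages of a document are scored together.
doc_ids = list(documents)
doc_index = BM25Okapi([tokenize(" ".join(ps)) for ps in documents.values()])

# Passage-level index: each passage is scored independently.
passages = [(doc_id, p) for doc_id, ps in documents.items() for p in ps]
psg_index = BM25Okapi([tokenize(p) for _, p in passages])

query = tokenize("2004 report mitigation costs")

# Document-level retrieval can aggregate evidence ("2004 report" and
# "mitigation costs") that is spread across different passages of doc_a.
doc_scores = doc_index.get_scores(query)
best_doc = doc_ids[max(range(len(doc_ids)), key=lambda i: doc_scores[i])]

psg_scores = psg_index.get_scores(query)
best_psg = passages[max(range(len(passages)), key=lambda i: psg_scores[i])]

print("document-level pick:", best_doc)
print("passage-level pick:", best_psg)
```

In this toy case the document-level index favors doc_a because its passages jointly match the query, whereas each individual passage matches only part of it, which mirrors the kind of document-level reasoning the abstract argues passage-based models miss.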