More Bang for your Context: Virtual Documents for Question Answering over Long Documents

ACL ARR 2024 June Submission 1252 Authors

14 Jun 2024 (modified: 04 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: We deal with the problem of Question Answering (QA) over long documents, which poses a challenge for modern Large Language Models (LLMs). Although LLMs can handle increasingly longer context windows, they struggle to effectively utilize the long content. To address this issue, we introduce the concept of a virtual document (VDoc). A VDoc is created by selecting chunks from the original document that are most likely to contain the information needed to answer the user’s question, while ensuring they fit within the LLM’s context window. We hypothesize that providing a short and focused VDoc to the LLM is more effective than filling the entire context window with less relevant information. Our experiments confirm this hypothesis and demonstrate that using VDocs improves results on the QA task.
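The abstract describes the VDoc construction only at a high level. The following is a minimal sketch of the general idea, assuming a TF-IDF relevance scorer and a character-based context budget; the function name `build_vdoc`, the scoring method, and the budget parameter are illustrative assumptions, not the authors' actual retriever or chunk-selection procedure.

```python
# Sketch: build a "virtual document" by scoring chunks against the question
# and greedily packing the most relevant ones into a fixed context budget.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_vdoc(question: str, chunks: list[str], budget_chars: int) -> str:
    """Select the chunks most likely to answer the question, within the budget."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform([question] + chunks)
    # Similarity of each chunk to the question (row 0 is the question itself).
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()

    selected, used = [], 0
    for idx in scores.argsort()[::-1]:  # highest-scoring chunks first
        if used + len(chunks[idx]) > budget_chars:
            continue
        selected.append(idx)
        used += len(chunks[idx])

    # Keep the original document order so the VDoc reads coherently.
    return "\n\n".join(chunks[i] for i in sorted(selected))
```

In practice a dense retriever or re-ranker (as suggested by the paper's keywords) and a token-based budget tied to the LLM's context window would replace the TF-IDF scorer and character count used here.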
Paper Type: Short
Research Area: Information Retrieval and Text Mining
Research Area Keywords: passage retrieval; dense retrieval; document representation; re-ranking; grounded dialog
Contribution Types: Approaches to low-resource settings, Data analysis
Languages Studied: English
Submission Number: 1252