Abstract: In Question Answering (QA), Retrieval-Augmented Generation (RAG) has substantially improved performance across domains. However, how to effectively capture multi-document relationships remains an open question. This is particularly critical for biomedical tasks, which rely on information spread across multiple documents. In this work, we propose CLAIMS, a novel method that uses propositional claims to construct a local knowledge graph from retrieved documents. Summaries are then derived from the knowledge graph via layerwise summarization and used to contextualize a small language model for QA. The structured summaries capture both explicit and implicit relationships between entities in the documents, providing LLMs with more comprehensive context. CLAIMS achieved comparable or superior performance to RAG baselines on several biomedical QA benchmarks. We also evaluated its generalizability and each individual step of our approach with targeted metrics, demonstrating its effectiveness.
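The sketch below illustrates the pipeline stages named in the abstract (claim extraction, local knowledge-graph construction, layerwise summarization, QA contextualization). It is a minimal, hypothetical rendering: the function names and the simple heuristics (sentence splitting as claim extraction, word overlap as entity linking, concatenation as summarization) are stand-ins for the LLM-driven steps in the paper, not the authors' implementation.

```python
# Hypothetical sketch of a CLAIMS-style pipeline. All names and heuristics
# here are illustrative assumptions; the paper's steps use LLMs instead.
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "of", "in", "is", "are", "and", "to", "for", "it"}

def extract_claims(documents):
    # Stand-in for propositional claim extraction: one claim per sentence.
    claims = []
    for doc_id, text in enumerate(documents):
        for sent in text.split(". "):
            if sent.strip():
                claims.append({"doc": doc_id, "text": sent.strip().rstrip(".")})
    return claims

def build_claim_graph(claims):
    # Connect claims that share a non-stopword term, a naive proxy for the
    # entity linking that ties claims into a local knowledge graph.
    def terms(claim):
        return {w for w in claim["text"].lower().split() if w not in STOPWORDS}
    graph = defaultdict(set)
    for i in range(len(claims)):
        for j in range(i + 1, len(claims)):
            if terms(claims[i]) & terms(claims[j]):
                graph[i].add(j)
                graph[j].add(i)
    return graph

def connected_components(n, graph):
    # Group claims into clusters of (transitively) related claims.
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.append(node)
            stack.extend(graph[node])
        comps.append(sorted(comp))
    return comps

def layerwise_summarize(claims, graph):
    # Layer 1: summarize each cluster of related claims; Layer 2: merge the
    # cluster summaries into one context. A real system would call an LLM
    # at each layer; here we simply join the claim texts.
    clusters = connected_components(len(claims), graph)
    layer1 = ["; ".join(claims[i]["text"] for i in comp) for comp in clusters]
    return "\n".join(f"- {summary}" for summary in layer1)

def build_qa_prompt(question, context):
    # The structured summary contextualizes a small language model for QA.
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

docs = [
    "Metformin lowers blood glucose. It is a first-line therapy for type 2 diabetes.",
    "Type 2 diabetes raises cardiovascular risk.",
]
claims = extract_claims(docs)
graph = build_claim_graph(claims)
context = layerwise_summarize(claims, graph)
print(build_qa_prompt("How does metformin relate to cardiovascular risk?", context))
```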
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: Information Extraction, Information Retrieval and Text Mining, Question Answering, Summarization
Contribution Types: NLP engineering experiment
Languages Studied: English
Previous URL: https://openreview.net/forum?id=PMFaQU3zOx
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).
Reassignment Request Reviewers: No, I want the same set of reviewers from our previous submission (subject to their availability).
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Section 7
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 4.1, 4.2, 4.3, 4.4, Appendix A.1, Appendix C, Appendix D, Appendix E
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: Appendix A.1, Appendix C, Appendix E
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Appendix A.1, Appendix C, Appendix E
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: We only use data from publicly available datasets covering LLM benchmarking, general medical literature, and Wikipedia content.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Appendix A.1
B6 Statistics For Data: Yes
B6 Elaboration: Appendix E
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Appendix C
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 4.2, Appendix A.1, Appendix C
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 4.2, Section 4.6
C4 Parameters For Packages: Yes
C4 Elaboration: Appendix C
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Appendix D
Author Submission Checklist: yes
Submission Number: 892