Investigating LLM Capabilities on Long Context Comprehension for Medical Question Answering

ACL ARR 2026 January Submission10728 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: Question Answering, Language Modeling, NLP Applications

Abstract: This study is the first to investigate LLM comprehension capabilities on long-context (LC), clinically relevant medical Question Answering (QA) beyond multiple-choice QA (MCQA). Our comprehensive approach considers a range of settings based on the inclusion of content of varying size and relevance, LLMs of different capabilities, and a variety of datasets across task formulations. We reveal insights into the effects and limitations of model size, underlying memorization issues, and the benefits of reasoning models, while demonstrating the value and challenges of leveraging a patient's full long context. Importantly, we examine the effect of Retrieval-Augmented Generation (RAG) on medical LC comprehension, uncovering the best settings for single- versus multi-document QA datasets. We shed light on evaluation aspects using a multi-faceted approach that uncovers common metric challenges. Our quantitative analysis reveals challenging cases where RAG excels while still showing limitations in cases requiring temporal reasoning.

Paper Type: Long

Research Area: Question Answering

Research Area Keywords: Question Answering, Language Modeling, NLP Applications

Contribution Types: Model analysis & interpretability, Data analysis

Languages Studied: English

Submission Number: 10728
