Keywords: question generation, electronic health records, natural language processing
TL;DR: We introduce a scalable framework that uses LLMs and clinician verification to generate clinically useful questions from patient records, enabling large-scale evaluation of clinical question-answer systems.
Track: Findings
Abstract: Evaluating question-answer systems for electronic health records is challenging due to the high cost of annotation, limiting the realism and scale of existing benchmarks. In this work, we introduce a scalable framework that pairs large language model generation with clinician verification to automatically produce questions that evaluate information retrieval over longitudinal records. The framework leverages patient timelines to generate questions that emulate those asked during chart review. We compare a generation approach that uses a single History & Physical (H&P) note against one that supplements the H&P with patient facts. Physicians approved 93% of questions generated from the H&P with patient facts, a 7% increase over using the H&P alone. Incorporating facts into the generation process yielded a 4% increase in verifiable questions and a 30% increase in multi-hop questions, the most clinically useful type, which synthesize information across multiple encounters. Our findings demonstrate the utility of our framework for supporting meaningful, large-scale evaluation of clinical question-answer systems.
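As a rough illustration of the generation setup the abstract describes (not the authors' released code), the sketch below assembles a question-generation prompt from an H&P note plus structured patient facts. The `call_llm` stub, the prompt wording, and the `PatientFact` format are all hypothetical placeholders, not details from the paper.

```python
from dataclasses import dataclass

@dataclass
class PatientFact:
    date: str       # encounter date, e.g. "2021-03-04"
    statement: str  # atomic fact extracted from the longitudinal record

# Hypothetical prompt; the paper's actual instructions are not shown here.
PROMPT_TEMPLATE = """You are reviewing a patient's chart.
H&P note:
{note}

Additional patient facts from the longitudinal record:
{facts}

Write questions a clinician might ask during chart review, including
multi-hop questions that combine facts from multiple encounters.
Each question must be verifiable from the record above."""

def build_prompt(hp_note: str, facts: list[PatientFact]) -> str:
    """Combine the H&P note with timeline facts into one prompt.

    Passing an empty fact list corresponds to the H&P-only condition
    compared in the abstract.
    """
    fact_lines = "\n".join(f"- [{f.date}] {f.statement}" for f in facts)
    return PROMPT_TEMPLATE.format(note=hp_note, facts=fact_lines or "(none)")

def call_llm(prompt: str) -> list[str]:
    """Placeholder for the model call; returns a canned question here.

    In a real pipeline this would query an LLM, and the returned
    questions would then go to physicians for verification.
    """
    return ["What medication change followed the 2021-03-04 admission?"]

if __name__ == "__main__":
    facts = [PatientFact("2021-03-04", "Admitted for acute heart failure.")]
    questions = call_llm(build_prompt("65M with dyspnea on exertion...", facts))
    print(questions)
```

The two generation conditions compared in the abstract would map onto calling `build_prompt` with and without the fact list; clinician verification sits downstream of `call_llm` and is not modeled here.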
General Area: Applications and Practice
Specific Subject Areas: Dataset Release & Characterization, Natural Language Processing
Data And Code Availability: Yes
Ethics Board Approval: Yes
Entered Conflicts: I confirm the above
Anonymity: I confirm the above
Submission Number: 238