Abstract: Early disease diagnosis can dramatically improve patient outcomes by enabling timely interventions, yet traditional approaches rely on laboratory and imaging data that require clinical visits and incur significant costs and delays. In this study, we introduce MIMIC-SR-ICD11 (MIMIC Self-Report with ICD-11), a dataset that transforms EHR discharge notes from the MIMIC database into first-person patient narratives and standardizes every diagnosis using WHO ICD-11 codes. We benchmark three leading large language models on overall accuracy (Hit@1 and F1 variants), sensitivity to candidate-list length and ordering, and robustness across diseases of varying prevalence. Our experiments show that simply shortening the candidate list does not yield proportional gains in accuracy, and F1 scores can even fall below a random-guess baseline. By splitting diseases into ten frequency-based groups, we uncover an unexpected accuracy dip for the most common conditions. To explain this phenomenon, we introduce two lexical specificity metrics: disease frequency–medical vocabulary size (DF-MVS) and medical term exclusivity score (MTES). These metrics demonstrate that generic, non-distinctive terminology drives prediction bias. To support future advances, we release our dataset as a standardized benchmark for the development of specialized medical diagnostic models.
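To make the evaluation protocol concrete, here is a minimal Python sketch of the Hit@1 metric and the ten-way frequency grouping mentioned in the abstract, plus one plausible reading of MTES. The abstract does not define the two specificity metrics, so the `mtes` function is purely illustrative; all function names and the data layout are assumptions, not the authors' released benchmark code.

```python
from collections import Counter

def hit_at_1(predictions, gold):
    """Fraction of cases where the top-ranked ICD-11 code matches the gold code."""
    assert len(predictions) == len(gold)
    hits = sum(1 for ranked, g in zip(predictions, gold) if ranked and ranked[0] == g)
    return hits / len(gold)

def frequency_groups(gold, n_groups=10):
    """Split disease codes into n_groups ordered by descending corpus frequency."""
    ordered = [code for code, _ in Counter(gold).most_common()]
    size = -(-len(ordered) // n_groups)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def mtes(disease_terms, disease):
    """One plausible reading of the medical term exclusivity score (undefined in
    the abstract): the fraction of a disease's medical terms that occur for no
    other disease in the corpus."""
    own = disease_terms[disease]
    others = set().union(*(t for d, t in disease_terms.items() if d != disease))
    return len(own - others) / len(own) if own else 0.0

# Toy usage: ranked candidate lists vs. gold ICD-11 codes (codes are illustrative).
preds = [["CA40.0", "BA00"], ["BA00", "CA40.0"]]
gold = ["CA40.0", "CA40.0"]
print(hit_at_1(preds, gold))                      # 0.5
print(frequency_groups(gold + ["BA00"]))          # [['CA40.0'], ['BA00']]
print(mtes({"CA40.0": {"cough", "fever"}, "BA00": {"fever"}}, "CA40.0"))  # 0.5
```

Under this reading, a low MTES for a high-frequency disease would indicate that its vocabulary is shared with many other conditions, consistent with the accuracy dip the abstract reports for common diseases.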
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Human-Centered NLP; Model Bias and Fairness; Data-Efficient Training; Interpretability and Analysis
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 4759