Count-Based Approaches Remain Strong: A Benchmark Against Transformer and LLM Pipelines on Structured EHR
Keywords: Structured Electronic Health Records, Clinical Risk Prediction, LLMs, mixture-of-agents
TL;DR: Despite the LLM era, simple count-based models remain strong for structured EHR prediction, with a mixture-of-agents pipeline achieving superior performance only in specific clinical scenarios.
Abstract: Structured electronic health records (EHR) are essential for clinical prediction. While count-based learners continue to perform strongly on such data, no benchmark has directly compared them against more recent mixture-of-agents LLM pipelines, which have been reported to outperform single LLMs in various NLP tasks. In this study, we evaluated three method categories for EHR prediction using the EHRSHOT dataset: count-based models built from ontology roll-ups with two time bins, based on LightGBM and the tabular foundation model TabPFN; a pretrained sequential transformer (CLMBR); and a mixture-of-agents pipeline that converts tabular histories to natural-language summaries followed by a text classifier. We assessed four outcomes: long length-of-stay, readmission, pancreatic cancer diagnosis, and acute myocardial infarction diagnosis. Count-based approaches led most comparisons in AUROC and AUPR, with the exception that the mixture-of-agents pipeline achieved the best AUPR for acute myocardial infarction prediction. Overall, our benchmarking study reaffirms the strength of count-based modeling for structured EHR, while also highlighting how mixture-of-agents pipelines can outperform in specific clinical scenarios.
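To illustrate the count-based featurization the abstract describes, the following is a minimal sketch of building per-code count features over two time bins relative to a prediction date. All names (`count_features`, the 30-day bin boundary, the toy codes) are hypothetical assumptions for illustration; the actual pipeline additionally rolls codes up through an ontology before counting, which is not shown here.

```python
from collections import Counter
from datetime import date

def count_features(events, index_date, vocab, split_days=30):
    """Turn a patient's (code, date) event list into count features
    over two time bins: 'recent' (within split_days of the index
    date) and 'older' (everything before that). The feature vector
    is the recent counts followed by the older counts, both ordered
    by the fixed code vocabulary."""
    recent, older = Counter(), Counter()
    for code, event_date in events:
        if (index_date - event_date).days <= split_days:
            recent[code] += 1
        else:
            older[code] += 1
    return [recent[c] for c in vocab] + [older[c] for c in vocab]

# Toy example: two codes, one patient, prediction date 2023-02-01.
vocab = ["ICD:I21", "ICD:E11"]
events = [
    ("ICD:I21", date(2023, 1, 25)),  # recent (7 days before index)
    ("ICD:I21", date(2022, 6, 1)),   # older
    ("ICD:E11", date(2023, 1, 30)),  # recent
]
features = count_features(events, date(2023, 2, 1), vocab)
# features -> [1, 1, 1, 0]  (recent I21, recent E11, older I21, older E11)
```

Vectors like these would then be fed to a tabular learner such as LightGBM or TabPFN; the two-bin split lets the model weight recent clinical activity differently from distant history.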
Submission Number: 131