Tabular Population Priors as Structured Retrieval for EHR Medication Set Prediction

Animesh Agarwal; Meysam Ghaffari; Nina Fatehi; Carlos Morato

Published: 23 May 2026, Last Modified: 23 May 2026. Venue: SD4H ICML 2026. License: CC BY 4.0

Keywords: structured data, electronic health records, medication recommendation, clinical retrieval augmented generation, population priors

Abstract: Medication prediction from electronic health records is a fixed-vocabulary structured prediction problem, yet recent clinical RAG systems often treat it like open-ended case-based reasoning: retrieve similar patient admissions and ask an LLM to infer a medication set. This retrieval object is indirect: the model receives retrieved patient records while the target is a discrete set of medication-class labels. We propose a more task-aligned retrieval object, a tabular population prior: ranked medication-class candidates from a sparse condition-medication co-occurrence table estimated from training admissions. Using PACE-RAG as a fixed downstream pipeline and replacing only its patient retriever, this drop-in swap improves F1 from 0.303 to 0.338 (+11.5% relative) and Jaccard from 0.218 to 0.237 on 1,004 held-out MIMIC-IV admissions, with no additional LLM calls and no inference-time patient index. A non-LLM top-k baseline reaches F1 = 0.248, confirming the prior is predictive before LLM integration. Aggregate population statistics can provide a more task-aligned, efficient, and auditable retrieval source than patient-level nearest neighbors for fixed-vocabulary EHR set prediction.
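
To make the idea of a tabular population prior concrete, the following is a minimal sketch of how a condition-medication co-occurrence table could be estimated from training admissions and used as a non-LLM top-k predictor, together with set-level F1 and Jaccard. The abstract does not specify the scoring rule, so the sum of per-condition conditional frequencies used here is an assumption, and the function names, condition labels, and medication-class labels are all hypothetical toy values rather than anything from the paper or from PACE-RAG.

```python
from collections import Counter, defaultdict


def build_prior(admissions):
    """Count condition/medication-class co-occurrences over training admissions.

    `admissions` is an iterable of (conditions, medication_classes) set pairs.
    Only observed pairs are stored, so the table stays sparse.
    """
    cooc = defaultdict(Counter)  # cooc[condition][med_class] -> count
    cond_count = Counter()       # admissions in which each condition appears
    for conditions, meds in admissions:
        for c in conditions:
            cond_count[c] += 1
            cooc[c].update(meds)
    return cooc, cond_count


def rank_candidates(conditions, cooc, cond_count, k=3):
    """Return the top-k medication classes for a patient's condition set.

    ASSUMPTION: score = sum over conditions of count(c, med) / count(c).
    The paper's actual scoring may differ.
    """
    scores = Counter()
    for c in conditions:
        n = cond_count.get(c, 0)
        if n == 0:
            continue  # condition never seen in training
        for med, cnt in cooc[c].items():
            scores[med] += cnt / n
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [med for med, _ in ranked[:k]]


def set_f1_jaccard(pred, gold):
    """Set-level F1 and Jaccard between predicted and true medication classes."""
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)
    union = len(pred | gold)
    denom = len(pred) + len(gold)
    f1 = 2 * tp / denom if denom else 1.0
    jaccard = tp / union if union else 1.0
    return f1, jaccard


# Hypothetical toy training admissions (not MIMIC-IV data).
train = [
    ({"hypertension", "diabetes"}, {"ACE_inhibitor", "biguanide"}),
    ({"hypertension"}, {"ACE_inhibitor", "diuretic"}),
    ({"diabetes"}, {"biguanide", "insulin"}),
]

cooc, cond_count = build_prior(train)
top3 = rank_candidates({"hypertension", "diabetes"}, cooc, cond_count, k=3)
f1, jaccard = set_f1_jaccard(top3, {"ACE_inhibitor", "biguanide"})
print(top3, round(f1, 3), round(jaccard, 3))
```

In a retrieval-augmented setup of the kind the abstract describes, a ranked list like `top3` would be passed to the downstream LLM in place of retrieved patient records; used on its own, it corresponds to a top-k baseline.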

Submission Number: 149
