MedMod: Multimodal Benchmark for Medical Prediction Tasks with Electronic Health Records and Chest X-Ray Scans

Published: 04 May 2025 · Last Modified: 05 Sept 2025 · Conference on Health, Inference, and Learning (CHIL) 2025 · CC BY 4.0
Abstract: Multimodal machine learning provides a myriad of opportunities for developing models that integrate multiple modalities and mimic decision-making in the real world, such as in medical settings. However, benchmarks involving multimodal medical data are scarce, especially for routinely collected modalities such as Electronic Health Records (EHR) and Chest X-ray images (CXR). To contribute towards advancing multimodal learning for real-world prediction tasks, we present MedMod, a multimodal medical benchmark with EHR and CXR built on the publicly available MIMIC-IV and MIMIC-CXR datasets, respectively. MedMod comprises five clinical prediction tasks: clinical conditions, in-hospital mortality, decompensation, length of stay, and radiological findings. We extensively evaluate multimodal supervised learning models and self-supervised learning frameworks, and we make our code and models open-source.
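To make the multimodal setup concrete, below is a minimal sketch of the kind of late-fusion baseline such a benchmark might evaluate: an LSTM encodes the EHR time series, a small CNN encodes the CXR image, and the concatenated representations feed a task head. This is a hypothetical illustration in PyTorch; all module choices, feature dimensions, and names are assumptions, not the architecture or code from the paper.

```python
import torch
import torch.nn as nn

class LateFusionModel(nn.Module):
    """Hypothetical late-fusion baseline combining EHR and CXR.

    Dimensions and architecture are illustrative only and are not
    taken from MedMod's released code.
    """

    def __init__(self, ehr_features=76, hidden=128, num_classes=2):
        super().__init__()
        # Encode the EHR time series; the final hidden state summarizes it.
        self.ehr_encoder = nn.LSTM(ehr_features, hidden, batch_first=True)
        # Encode a single-channel CXR image into the same hidden size.
        self.cxr_encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, hidden),
        )
        # Concatenate both representations and classify (late fusion).
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, ehr_seq, cxr_img):
        # ehr_seq: (batch, time, features); cxr_img: (batch, 1, H, W)
        _, (h_n, _) = self.ehr_encoder(ehr_seq)
        ehr_repr = h_n[-1]                    # (batch, hidden)
        cxr_repr = self.cxr_encoder(cxr_img)  # (batch, hidden)
        fused = torch.cat([ehr_repr, cxr_repr], dim=1)
        return self.classifier(fused)

model = LateFusionModel()
# e.g. 48 hourly EHR steps with 76 features, and a 224x224 CXR
logits = model(torch.randn(4, 48, 76), torch.randn(4, 1, 224, 224))
print(logits.shape)  # torch.Size([4, 2])
```

A similar two-encoder structure also underlies common self-supervised frameworks, where the per-modality representations are aligned with a contrastive objective instead of a supervised head.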