MEDS-torch: An ML Pipeline for Inductive Experiments for EHR Medical Foundation Models

Published: 10 Oct 2024, Last Modified: 26 Nov 2024
Venue: NeurIPS 2024 TSALM Workshop
License: CC BY 4.0
Keywords: Medical, Machine-Learning, AI, EHR, Time-Series
TL;DR: A scalable pipeline for MEDS-formatted datasets, enabling the comparison of tokenization and transfer learning methods across EHR tasks and datasets to help practitioners identify optimal ML strategies for medical data.
Abstract: We introduce MEDS-Torch, a scalable and extensible pipeline for inductive experiments with sequence models on medical datasets adhering to the MEDS format, a universal schema for medical time-series data. Using this pipeline, we systematically compare three tokenization methods (Everything In Code, Triplet, and Text Code) and evaluate five transfer learning techniques, including autoregressive generative modeling and contrastive learning variations, across multiple predictive tasks on the MIMIC-IV EHR dataset. Our empirical analysis provides actionable insights into the effectiveness of each method, demonstrating significant performance differences among tokenization and pretraining combinations. By benchmarking these approaches against fully supervised learning models, we offer practical recommendations for selecting appropriate modeling strategies in diverse healthcare settings. MEDS-Torch streamlines controlled experiments on medical datasets and, by depending exclusively on the MEDS schema rather than dataset-specific nuances, promotes reproducibility and standardization in machine learning research on EHR data.
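To make the MEDS schema and the Triplet tokenization idea concrete, the following is a minimal, hypothetical Python sketch. It is not the MEDS-Torch API: the field names (subject_id, time, code, numeric_value) follow the public MEDS schema, while the grouping and time-delta logic here is an illustrative assumption of how events might be turned into (time-delta, code, value) triplets for a sequence model.

```python
# Conceptual sketch only (not the MEDS-Torch implementation): convert
# MEDS-style event records into per-subject "Triplet" sequences of
# (hours-since-previous-event, code, numeric value).
from collections import defaultdict
from datetime import datetime

# Toy MEDS-style events; field names follow the MEDS schema.
events = [
    {"subject_id": 1, "time": datetime(2024, 1, 1, 8, 0), "code": "LAB//HR", "numeric_value": 88.0},
    {"subject_id": 1, "time": datetime(2024, 1, 1, 9, 30), "code": "LAB//SBP", "numeric_value": 121.0},
    {"subject_id": 1, "time": datetime(2024, 1, 1, 9, 30), "code": "RX//ASPIRIN", "numeric_value": None},
]

def to_triplets(events):
    """Group events by subject and emit (time-delta in hours, code, value) triplets."""
    by_subject = defaultdict(list)
    for ev in sorted(events, key=lambda e: (e["subject_id"], e["time"])):
        by_subject[ev["subject_id"]].append(ev)

    sequences = {}
    for subject_id, evs in by_subject.items():
        triplets, prev_time = [], None
        for ev in evs:
            delta_h = 0.0 if prev_time is None else (ev["time"] - prev_time).total_seconds() / 3600
            triplets.append((delta_h, ev["code"], ev["numeric_value"]))
            prev_time = ev["time"]
        sequences[subject_id] = triplets
    return sequences

print(to_triplets(events))
# {1: [(0.0, 'LAB//HR', 88.0), (1.5, 'LAB//SBP', 121.0), (0.0, 'RX//ASPIRIN', None)]}
```

In a real pipeline, each component of a triplet would typically be embedded separately (time deltas and numeric values as continuous features, codes via a learned vocabulary) before being fed to the sequence model; the other tokenization schemes compared in the paper differ in how they fold these components into tokens.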
Submission Number: 102