World Model Augmentation for Imbalanced Multi-Label ECG Classification

Published: 10 Jun 2026, Last Modified: 10 Jun 2026
Venue: LXAI @ ICML 2026
License: CC BY 4.0
Keywords: World model, SSL, JEPA, ECGs
TL;DR: World model pretrained on 700K ECG pairs synthesises 2M rare-condition embeddings in representation space. MLP probe reaches AUROC 0.743 on 76 ICD conditions, recovering 55% of the gap to full fine-tuning.
Abstract: Automated multi-label ECG classification struggles with severe class imbalance: rare co-occurring cardiac conditions are systematically underrepresented in clinical datasets. We address this with a world-model approach: we pretrain a Joint-Embedding Predictive Architecture (LeJEPA) on over 700K longitudinal ECG pairs from MIMIC-IV-ECG, training it to predict how a patient's latent ECG representation changes between visits given the shift in their ICD label set. After pretraining, we repurpose the frozen dynamics model as a data augmentor: given a normal embedding and a target condition combination, it synthesises the corresponding abnormal embedding entirely in representation space, without generating a single additional waveform. Training a lightweight MLP probe on the resulting 2.7M-embedding dataset (721K real + 2M synthetic) achieves a macro-averaged AUROC of 0.743 across 76 ICD-coded conditions. With no encoder updates, this recovers 55% of the gap between a real-data-only linear probe (0.687) and a fully fine-tuned encoder (0.789).
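The augmentation step and the reported gap figure can be illustrated with a minimal Python sketch. It is not the authors' implementation: `synthesise_embeddings`, `gap_recovered`, and `toy_dynamics` are hypothetical names, the embedding width of 16 is arbitrary, and the pretrained predictor is replaced by a fixed random linear map in residual form, which is purely an assumption made so that the example runs. Only the 76-condition label width and the three AUROC values come from the abstract.

```python
import numpy as np

def gap_recovered(baseline: float, method: float, ceiling: float) -> float:
    """Fraction of the baseline-to-ceiling gap closed by the method."""
    return (method - baseline) / (ceiling - baseline)

def synthesise_embeddings(normal_emb, target_labels, dynamics):
    """Apply a frozen dynamics model to normal embeddings in latent space.

    normal_emb:    (n, d) embeddings of ECGs with no coded condition
    target_labels: (n, k) multi-hot target condition combinations
    dynamics:      callable (embedding, label_shift) -> predicted embedding
    """
    # The source ECG is assumed normal (all-zero labels), so the label
    # shift fed to the dynamics model equals the target label set.
    label_shift = target_labels.astype(np.float32)
    return dynamics(normal_emb, label_shift)

# Hypothetical stand-in for the pretrained predictor: a fixed random
# linear map added to the input embedding (assumption, not the paper's model).
rng = np.random.default_rng(0)
d, k = 16, 76  # d is arbitrary; k matches the 76 ICD-coded conditions
W = rng.normal(scale=0.1, size=(k, d)).astype(np.float32)

def toy_dynamics(z, delta):
    return z + delta @ W

normal = rng.normal(size=(4, d)).astype(np.float32)
targets = np.zeros((4, k), dtype=np.float32)
targets[:, [3, 10]] = 1.0  # a co-occurring pair; the indices are arbitrary

# Synthetic abnormal embeddings, produced without generating any waveform.
synthetic = synthesise_embeddings(normal, targets, toy_dynamics)

# Real and synthetic embeddings are pooled to train the probe.
train_x = np.concatenate([normal, synthetic])
train_y = np.concatenate([np.zeros((4, k), dtype=np.float32), targets])

# Gap recovered, using the AUROC values quoted in the abstract.
recovered = gap_recovered(baseline=0.687, method=0.743, ceiling=0.789)
print(f"gap recovered: {recovered:.1%}")
```

The last lines reproduce the arithmetic behind the headline claim: (0.743 - 0.687) / (0.789 - 0.687) = 0.056 / 0.102, which is roughly 55%.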
Submission Category: Extended Abstract
Overaged Verification: Yes
Latin American Hispanic Heritage: Yes
ICML Proceedings Status: No
Submission Number: 22