Keywords: World model, SSL, JEPA, ECGs
TL;DR: World model pretrained on 700K ECG pairs synthesises 2M rare-condition embeddings in representation space. MLP probe reaches AUROC 0.743 on 76 ICD conditions, recovering 55% of the gap to full fine-tuning.
Abstract: Automated multi-label ECG classification struggles with severe class imbalance:
rare co-occurring cardiac conditions are systematically underrepresented in
clinical datasets. We address this with a world-model approach, pretraining a
Joint-Embedding Predictive Architecture (LeJEPA) on over 700K longitudinal ECG
pairs from MIMIC-IV-ECG, training it to predict how a patient's latent ECG
representation changes between visits given the shift in their ICD label set.
After pretraining, we repurpose the frozen dynamics model as a data augmentor:
given a normal embedding and a target condition combination, it synthesises
the corresponding abnormal embedding entirely in representation space, without
generating a single additional waveform. Training a lightweight MLP probe on
the resulting 2.7M-embedding dataset (721K real + 2M synthetic) achieves a
macro-averaged AUROC of 0.743 across 76 ICD-coded conditions, recovering 55%
of the gap between a real-data-only linear probe (0.687) and a fully fine-tuned
encoder (0.789), with no encoder updates.
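
The 55% figure follows from the three AUROC values quoted above. As a minimal sketch of that arithmetic (the function name `gap_recovered` and the variable names are illustrative, not taken from the submission), the recovered fraction is the improvement over the real-data-only linear probe divided by the total gap to the fully fine-tuned encoder:

```python
def gap_recovered(baseline: float, method: float, ceiling: float) -> float:
    """Fraction of the baseline-to-ceiling gap closed by `method`."""
    if ceiling == baseline:
        raise ValueError("baseline and ceiling must differ")
    return (method - baseline) / (ceiling - baseline)


# Macro-averaged AUROC values as reported in the abstract.
linear_probe_real_only = 0.687  # linear probe, real data only
mlp_probe_augmented = 0.743     # MLP probe, real + synthetic embeddings
full_fine_tuning = 0.789        # fully fine-tuned encoder

fraction = gap_recovered(
    linear_probe_real_only, mlp_probe_augmented, full_fine_tuning
)
print(f"gap recovered: {fraction:.1%}")
```

Note that the comparison mixes probe types: the baseline is a linear probe while the augmented result uses an MLP probe, so the recovered fraction reflects the combined effect of the synthetic embeddings and the change of probe.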
Submission Category: Extended Abstract
Overaged Verification: Yes
Latin American Hispanic Heritage: Yes
ICML Proceedings Status: No
Submission Number: 22