Keywords: ECG, SigLIP, Multi-class
Abstract: Recent advances in large language models (LLMs) have enabled the development
of multimodal medical AI. While models such as MedGemini achieve high accuracy
on VQA tasks like USMLE-MM, their performance on ECG-based tasks
remains limited, and some models, such as MedGemma, do not support ECG data
at all. Interpreting ECGs is inherently challenging, and diagnostic accuracy can
vary depending on the interpreter’s experience. Although echocardiography provides
rich diagnostic information, it requires specialized equipment and personnel,
limiting its availability.
In this study, we focus on constructing a robust ECG encoder for multimodal
pretraining using real-world hospital data. We employ SigLIP, a CLIP-based model
with a sigmoid-based loss function enabling multi-class prediction, and introduce a
modified loss function tailored to the multi-class nature of ECG data. Experiments
demonstrate that incorporating medical knowledge in the language model and
applying the modified loss significantly improve multi-class ECG classification.
To further enhance performance, we increase the embedding dimensionality and
apply random cropping to mitigate data drift.
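For orientation, the standard pairwise sigmoid loss that SigLIP is built on can be sketched as follows. This is a minimal NumPy illustration, not the modified loss introduced in this study, whose details the abstract does not give. The function name, the argument names, and the default temperature and bias values are assumptions chosen for the example.

```python
import numpy as np

def sigmoid_pairwise_loss(ecg_emb, text_emb, temperature=10.0, bias=-10.0):
    """Standard SigLIP-style loss over a batch of paired embeddings.

    ecg_emb, text_emb: arrays of shape (n, d); row i of each forms a matched pair.
    temperature, bias: scalars (learnable in practice; fixed here for illustration).
    """
    # L2-normalize so the dot product is a cosine similarity.
    ecg = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Pairwise logits between every ECG and every text in the batch.
    logits = temperature * (ecg @ txt.T) + bias

    # +1 on the diagonal (matched pairs), -1 elsewhere (non-matched pairs).
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0

    # -log(sigmoid(x)) == log(1 + exp(-x)), computed stably with logaddexp.
    return np.logaddexp(0.0, -labels * logits).sum() / n
```

Because every ECG-text pair is scored by an independent sigmoid rather than a batch-wide softmax, one ECG is not forced to pick a single text, which is the property the abstract relies on for ECGs carrying several findings.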
Finally, per-label analysis reveals which ECG findings are easier or harder to
predict. Our study provides a foundational framework for developing medical
models that utilize ECG data.
Submission Number: 36