Teach Multimodal LLMs to Comprehend Electrocardiographic Images

ACL ARR 2025 February Submission 3947 Authors

15 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Electrocardiograms (ECGs) are essential, non-invasive diagnostic tools for assessing cardiac conditions. Existing automated interpretation methods suffer from limited generalizability, focusing on a narrow range of conditions, and typically depend on raw physiological signals, which may not be available in resource-limited settings where only printed or digital ECG images are accessible. Recent advances in multimodal large language models (MLLMs) present promising opportunities for addressing these challenges. However, applying MLLMs to ECG image interpretation remains difficult due to the lack of instruction-tuning data and of well-established ECG image benchmarks for quantitative evaluation. To address these gaps, we introduce ECGInstruct, the first ECG image instruction-tuning dataset, comprising over one million samples that cover a wide range of ECG-related tasks drawn from diverse data sources. Building on ECGInstruct, we develop PULSE, a fully open-source MLLM for ECG image interpretation. We further curate ECGBench, a human expert-curated benchmark covering four key ECG image interpretation tasks across nine different datasets. Our experiments show that PULSE sets a new state of the art, outperforming general-purpose MLLMs with an average accuracy improvement of 21% to 33%. This work highlights the potential of PULSE to enhance ECG interpretation in clinical practice.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Electrocardiogram, ECG, LLMs, Multimodal LLMs, Instruction Tuning, Benchmark and Evaluation
Contribution Types: NLP engineering experiment, Reproduction study, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 3947