Knowledge-enhanced Multimodal ECG Representation Learning with Arbitrary-Lead Inputs

ICLR 2025 Conference Submission 481 Authors

13 Sept 2024 (modified: 13 Oct 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Electrocardiogram, healthcare, physiological signals
TL;DR: We propose K-MERL, the first framework to integrate structured knowledge from free-text reports for ECG multimodal learning. It achieves superior performance and supports arbitrary lead inputs, surpassing the limitations of fixed 12-lead setups.
Abstract: Recent advances in multimodal representation learning for electrocardiograms (ECG) align ECG signals with their paired free-text reports. However, because medical language is complex and unstructured, current methods often align ECG signals with their corresponding reports suboptimally, limiting diagnostic accuracy. Moreover, these methods cannot handle arbitrary combinations of ECG leads as inputs, which is problematic because full 12-lead ECGs may not always be available in under-resourced clinical environments. In this work, we propose the **K**nowledge-enhanced **M**ultimodal **E**CG **R**epresentation **L**earning (**K-MERL**) framework to address these challenges. K-MERL leverages large language models (LLMs) to extract structured knowledge from free-text reports, improving the effectiveness of ECG multimodal learning. Furthermore, we design a lead-aware ECG encoder with dynamic lead masking to capture the lead-specific spatial-temporal characteristics of 12-lead ECGs. This encoder allows our framework to handle arbitrary lead inputs, rather than being limited to the fixed set of all 12 leads that existing methods require. We evaluate K-MERL on six external ECG datasets. K-MERL not only outperforms all existing methods in zero-shot classification and linear probing with 12 leads, but also achieves state-of-the-art (SOTA) results in partial-lead settings, with an average improvement of **16%** in AUC on zero-shot classification over the previous SOTA multimodal method[^1]. [^1]: All data and code will be released upon acceptance.
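
To make the lead-aware encoder and dynamic lead masking idea concrete, below is a minimal PyTorch sketch, not the authors' implementation (which is not released here). The class name `LeadAwareEncoder`, the patch size, embedding dimension, and pooling choice are illustrative assumptions; the point is only that per-lead tokens carry learnable lead embeddings, and randomly dropped leads at training time let the model accept arbitrary lead subsets at inference.

```python
# Minimal sketch (assumed design, not the paper's code) of a lead-aware ECG
# encoder with dynamic lead masking over arbitrary subsets of the 12 leads.
import torch
import torch.nn as nn


class LeadAwareEncoder(nn.Module):
    def __init__(self, num_leads=12, sig_len=5000, patch=250, dim=256, depth=4, heads=8):
        super().__init__()
        self.num_leads = num_leads
        self.patches_per_lead = sig_len // patch
        # Shared 1D patch tokenizer, applied to each lead independently.
        self.tokenize = nn.Conv1d(1, dim, kernel_size=patch, stride=patch)
        # Learnable lead-specific embeddings inject "which lead" information.
        self.lead_emb = nn.Parameter(torch.zeros(num_leads, 1, dim))
        # Temporal position embeddings shared across leads.
        self.pos_emb = nn.Parameter(torch.zeros(1, self.patches_per_lead, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, ecg, lead_mask):
        # ecg: (B, 12, T); lead_mask: (B, 12) bool, True = lead is present.
        B, L, T = ecg.shape
        tokens = self.tokenize(ecg.reshape(B * L, 1, T))            # (B*L, dim, P)
        tokens = tokens.transpose(1, 2).reshape(B, L, -1, tokens.size(1))
        tokens = tokens + self.lead_emb + self.pos_emb              # add lead + time info
        tokens = tokens.reshape(B, L * self.patches_per_lead, -1)
        # Expand the per-lead mask to a per-token padding mask (True = ignore).
        pad = ~lead_mask.repeat_interleave(self.patches_per_lead, dim=1)
        out = self.encoder(tokens, src_key_padding_mask=pad)
        # Mean-pool only over tokens from the available leads.
        keep = (~pad).unsqueeze(-1).float()
        return (out * keep).sum(1) / keep.sum(1).clamp(min=1)


# Dynamic lead masking during training: randomly drop leads so the encoder
# learns representations that remain useful for partial-lead inputs.
if __name__ == "__main__":
    model = LeadAwareEncoder()
    ecg = torch.randn(2, 12, 5000)
    mask = torch.rand(2, 12) > 0.5   # keep a random subset of leads
    mask[:, 0] = True                # ensure at least one lead remains
    print(model(ecg, mask).shape)    # torch.Size([2, 256])
```

A design of this shape lets zero-shot or linear-probe evaluation reuse the same encoder whether 1, 6, or all 12 leads are supplied, since missing leads are simply masked out of attention and pooling.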
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 481