CELL: Supplementing Context for Multi-modal Few-shot Cardiologist

Published: 01 Jan 2024, Last Modified: 26 May 2025 · BIBM 2024 · CC BY-SA 4.0
Abstract: Language-based multi-modal learning has showcased remarkable efficacy across various domains. However, certain areas, such as electrocardiogram (ECG) analysis, face challenges due to incomplete and scarce textual data. Analyzing incomplete corpora poses difficulties for pre-trained language models, while data scarcity makes it difficult to harmonize multi-modal pre-training and downstream goals. To address these problems, this study introduces a novel ECG-language multi-modal learning method named Context ECG-Language Learning (CELL). To supplement context and complete corpora, dynamic contexts composed of learnable vectors are incorporated into the language model's embeddings, with all other parameters fixed. Since the pre-training and downstream targets are feature alignment and classification, respectively, a classification loss combined with consistent contexts is added during pre-training to narrow the gap between these targets. Meanwhile, consistent contexts reduce the gaps between prompts, allowing the model to focus on genuinely informative features. Consequently, the ECG feature space aligns more closely with the semantic space, ensuring robust classification performance and enhancing the quality of the multi-modal representations. Through extensive experiments on zero-shot and few-shot learning, CELL demonstrates superior performance in near-distribution, out-of-distribution, and clinical domains. Its robust and generalized performance positions CELL as a promising approach for multi-modal learning in fields with scarce and incomplete corpora.
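The "dynamic contexts composed of learnable vectors" described in the abstract resemble prompt tuning: trainable context vectors are prepended to the frozen language model's token embeddings, and only those vectors are optimized. The sketch below is an illustrative assumption, not the authors' implementation; the array names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, embed_dim, n_ctx = 100, 16, 4

# Frozen pre-trained embedding table (stands in for the language model's
# input embeddings; not updated during training).
token_embedding = rng.normal(size=(vocab_size, embed_dim))

# Learnable context vectors: in the paper's setup these would be the only
# trainable parameters on the language side.
context = rng.normal(size=(n_ctx, embed_dim)) * 0.02

def build_prompt_embeddings(token_ids, context):
    """Prepend the learnable context vectors to the embedded prompt tokens."""
    tokens = token_embedding[token_ids]               # (seq_len, embed_dim), frozen
    return np.concatenate([context, tokens], axis=0)  # (n_ctx + seq_len, embed_dim)

prompt = build_prompt_embeddings(np.array([5, 17, 42]), context)
print(prompt.shape)  # (7, 16)
```

In an actual training loop, gradients would flow only into `context`, which supplies the missing textual context that incomplete ECG corpora lack.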