Frequency-Enhanced Hybrid Multimodal CNN-Transformer Network for Electrocardiogram Classification

29 Jun 2024 (modified: 21 Aug 2024) | IEEE ICIST 2024 Conference Submission | CC BY 4.0
Abstract: Deep learning models have recently been widely adopted for electrocardiogram (ECG) classification, demonstrating greater accuracy and efficiency than manual diagnosis. Most existing methods take either the raw ECG or its time-frequency representation as input. Relying on a single input modality limits the network's ability to capture discriminative information. In this study, we treat the frequency spectrum of the ECG as an independent modality and feed it, together with the raw ECG, into a multimodal classification model. Our method combines depthwise separable convolution and Transformer architectures for unimodal feature extraction. A linear layer aligns the features from the two modalities, and a Transformer layer performs multimodal feature fusion. We evaluate our model on a multi-label classification task using the Ningbo dataset and on a multi-class classification task using the Real World dataset. Our model achieves better classification performance than competitive baseline models.
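The idea of treating the frequency spectrum as a second input modality can be illustrated with a minimal sketch. The abstract does not state how the spectrum is computed, so the one-sided DFT magnitude used here is an assumption, and the names `magnitude_spectrum` and `make_modalities` are hypothetical helpers introduced only for illustration. The input is a synthetic sinusoid standing in for a single ECG lead, not data from the Ningbo or Real World datasets.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """One-sided magnitude spectrum of a real signal via a naive O(n^2) DFT.

    Assumption: the paper's frequency modality may be computed differently.
    """
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(signal):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc) / n)
    return spectrum


def make_modalities(signal):
    """Pair the time-domain signal with its spectrum as two separate inputs."""
    return {"time": list(signal), "freq": magnitude_spectrum(signal)}


# Synthetic stand-in for one ECG lead: a 5 Hz sinusoid sampled at 64 Hz for 1 s.
fs = 64
toy = [math.sin(2 * math.pi * 5 * t / fs) for t in range(fs)]

inputs = make_modalities(toy)

# Index of the strongest frequency bin (bin spacing is fs / len(toy) = 1 Hz).
peak_bin = max(range(len(inputs["freq"])), key=inputs["freq"].__getitem__)
```

In a model of the kind the abstract describes, each entry of `inputs` would go to its own unimodal feature extractor before alignment and fusion; those stages are not sketched here because the abstract does not specify them in enough detail.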
Submission Number: 4