Benchmarking Deep Learning Architectures for ECG-Based Multi-label Heart Disease Prediction using MIMIC-IV Database

Eyiara Oladipo, Sarwar Nazrul, Mohamed S Nafea

Published: 04 Jul 2025, Last Modified: 11 Sept 2025, 2025 IEEE 38th International Symposium on Computer-Based Medical Systems (CBMS), Everyone, CC BY-NC-ND 4.0

Abstract: Cardiovascular disease (CVD) is a leading cause of global mortality, accounting for an estimated 17.9 million deaths annually. CVD is broadly defined as a group of medical conditions, influenced by modifiable and non-modifiable risk factors, that affect the heart’s ability to function properly. Machine learning (ML) has emerged as a powerful tool for analyzing complex medical data, aiding early detection and accurate diagnosis of CVD and improving patient outcomes. Recent studies have proposed various deep learning (DL) architectures for detecting CVD, yet robust benchmarks for comparing their performance on large-scale databases are lacking. In this work, we benchmark six state-of-the-art DL architectures for multi-label heart disease classification using 12-lead electrocardiogram (ECG) data from the large-scale, publicly available Medical Information Mart for Intensive Care (MIMIC) database. Specifically, we evaluate a 1-dimensional convolutional neural network (CNN) with residual blocks (1D-CNN-ResNet), a bidirectional long short-term memory network with convolutional layers (CNN-Bi-LSTM), a spectrogram-based CNN (SpG-CNN), a convolution-attention-transformer network (CAT-Net), a hierarchical attention network (HAN), and a structured state space sequence (S4) model on a multi-label heart disease classification task with seven diagnostic targets. Predictive accuracy is assessed using the Hamming distance, and model complexity is measured by the number of model parameters. By contrasting each model’s accuracy with its complexity, we establish a reliable benchmark that provides constructive insights for advancing automated cardiovascular diagnostics.
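As an illustration of the Hamming-distance metric mentioned in the abstract, the following minimal sketch computes a normalized Hamming distance (often called Hamming loss) for a multi-label task with seven binary targets. The abstract does not state the exact normalization or any thresholding used by the authors, so the averaging over all sample-label pairs, the function name `hamming_loss`, and the toy label arrays are assumptions made for illustration only.

```python
import numpy as np


def hamming_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of (sample, label) pairs where prediction and truth disagree.

    Assumption: both arrays are binary with shape (n_samples, n_labels).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true != y_pred))


# Hypothetical toy data: 3 ECG records, 7 diagnostic targets each.
y_true = np.array([
    [1, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0],
])
y_pred = np.array([
    [1, 0, 0, 0, 0, 0, 0],  # one missed label
    [0, 1, 0, 0, 0, 0, 1],  # all labels correct
    [0, 0, 1, 0, 1, 0, 0],  # one spurious label
])

loss = hamming_loss(y_true, y_pred)  # 2 mismatches out of 21 pairs
print(f"Hamming loss: {loss:.4f}")
```

Lower values indicate better agreement; a value of 0 means every one of the seven labels is predicted correctly for every record.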