Abstract: Auditory attention decoding (AAD) is gaining traction in brain-computer interface (BCI) research, with a focus on using EEG signals to identify the attended speech. The First Chinese Auditory Attention Decoding Challenge presents two distinct scenarios: audio-only and audio-video conditions. Participating in the challenge, we introduce a novel multi-model ensemble approach designed to enhance the robustness and accuracy of AAD. Specifically, this approach integrates several advanced deep learning architectures, including CA-CNN, ResNet18, PyramidNet, and T-CNN, to effectively capture both temporal and spatial features from EEG data. We evaluate the performance of our ensemble model on two AAD tasks, within-subject and cross-subject, under both audio-only and audio-video conditions. Results demonstrate that the ensemble approach achieves relatively high AAD accuracy, with average accuracies of 97.99% and 97.81% for within-subject tasks under the respective conditions. We ranked third in the challenge.
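The abstract does not specify how the four base models are fused, so the following is only a minimal sketch of one plausible ensemble scheme: soft-vote averaging of class probabilities over placeholder PyTorch classifiers, assuming a binary attended-speech label. The `EnsembleAAD` class, the input shapes, and the base models are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class EnsembleAAD(nn.Module):
    """Hypothetical multi-model ensemble for AAD (soft-vote averaging).

    Averages the softmax probabilities of several base EEG classifiers
    (stand-ins for, e.g., CA-CNN, ResNet18, PyramidNet, T-CNN). The
    averaging strategy is an assumption; the paper's fusion rule is
    not stated in the abstract.
    """

    def __init__(self, base_models):
        super().__init__()
        self.base_models = nn.ModuleList(base_models)

    def forward(self, eeg):
        # eeg: (batch, channels, time); each base model returns logits.
        probs = torch.stack(
            [torch.softmax(m(eeg), dim=-1) for m in self.base_models]
        )
        # Average class probabilities across the base models.
        return probs.mean(dim=0)


if __name__ == "__main__":
    # Placeholder base models; 64 channels x 128 samples is illustrative.
    base = [nn.Sequential(nn.Flatten(), nn.Linear(64 * 128, 2))
            for _ in range(4)]
    ensemble = EnsembleAAD(base)
    x = torch.randn(8, 64, 128)   # 8 trials of EEG
    print(ensemble(x).shape)      # torch.Size([8, 2]): attended-speech probs
```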