Independent Feature Enhanced Crossmodal Fusion for Match-Mismatch Classification of Speech Stimulus and EEG Response

Shitong Fan, Wenbo Wang, Feiyang Xiao, Shiheng Zhang, Qiaoxi Zhu, Jian Guan

Published: 01 Jan 2024, Last Modified: 15 May 2025ISCSLP 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: It is crucial for auditory attention decoding to classify matched and mismatched speech stimuli with corresponding EEG re-sponses by exploring their relationship. However, existing methods often adopt two independent networks to encode speech stimulus and EEG response, which neglect the relationship between these signals from the two modalities. In this paper, we propose an independent feature enhanced cross-modal fusion model (IFE-CF) for match-mismatch classification, which leverages the fusion feature of the speech stimu-lus and the EEG response to achieve auditory EEG decoding. Specifically, our IFE-CF contains a crossmodal encoder to encode the speech stimulus and the EEG response with a two-branch structure connected via crossmodal attention mechanism in the encoding process, a multi-channel fusion module to fuse features of two modalities by aggregating the interaction feature obtained from the crossmodal encoder and the independent feature obtained from the speech stimulus and EEG response, and a predictor to give the matching result. In addition, the causal mask is introduced to consider the time delay of the speech-EEG pair in the crossmodal encoder, which further enhances the feature representation for match-mismatch classification. Ex-periments demonstrate our method's effectiveness with better classification accuracy, as compared with the baseline of the Auditory EEG Decoding Challenge 2023.