Binary-Temporal Convolutional Neural Network for Multi-Class Auditory Spatial Attention Detection

Published: 01 Jan 2024, Last Modified: 07 Apr 2025 · ISCSLP 2024 · CC BY-SA 4.0
Abstract: Humans have a remarkable ability to focus on one of the sound sources in a multi-speaker environment. Auditory spatial attention detection (ASAD) aims to identify the direction of the speech source a person is attending to based on their brain signals, with potential applications in enhancing hearing aids, improving communication systems, and advancing brain-computer interface (BCI) technologies. Most prior studies formulated the problem as binary classification; however, real-world scenarios are much more complex. Our study explores the feasibility of detecting auditory attention among 10 competing speakers. To address the needs of low-resource computing equipment, we further propose a novel approach using a binary temporal convolutional neural network (B-TCNN) for multi-class ASAD tasks. This approach effectively reduces memory consumption and accelerates inference. Experimental results show that the B-TCNN achieves an average accuracy of 93.8% with only 33K parameters in 1-second decision windows on a 10-class ASAD dataset. The proposed network significantly outperforms other competitive models, offering a lightweight and efficient solution for multi-class ASAD tasks.
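The abstract does not describe the B-TCNN architecture in detail. For illustration only, the sketch below shows one common way to build a binarized temporal convolutional classifier in PyTorch: sign-binarized convolution weights trained with a straight-through estimator, dilated residual temporal blocks, and a multi-class head. All hyperparameters (64 EEG channels, 128 Hz sampling, 32 hidden channels, 4 blocks) are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn


class BinarizeSTE(torch.autograd.Function):
    """Binarize weights to {-1, +1}; pass gradients with a hard-tanh straight-through estimator."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Gradients flow only where |w| <= 1.
        return grad_out * (w.abs() <= 1).float()


class BinaryConv1d(nn.Conv1d):
    """1-D convolution whose weights are binarized in the forward pass."""

    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight)
        return nn.functional.conv1d(
            x, w_bin, self.bias, self.stride,
            self.padding, self.dilation, self.groups,
        )


class BinaryTCNBlock(nn.Module):
    """Dilated temporal convolution block with binarized weights and a residual connection."""

    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2
        self.conv = BinaryConv1d(channels, channels, kernel_size,
                                 padding=pad, dilation=dilation)
        self.bn = nn.BatchNorm1d(channels)
        self.act = nn.PReLU()

    def forward(self, x):
        return x + self.act(self.bn(self.conv(x)))


class BinaryTCN(nn.Module):
    """Lightweight binary TCN classifier for multi-class ASAD (illustrative sketch)."""

    def __init__(self, in_channels=64, hidden=32, n_classes=10, n_blocks=4):
        super().__init__()
        self.input_proj = nn.Conv1d(in_channels, hidden, kernel_size=1)
        self.blocks = nn.Sequential(*[
            BinaryTCNBlock(hidden, kernel_size=3, dilation=2 ** i)
            for i in range(n_blocks)
        ])
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, channels, time)
        h = self.blocks(self.input_proj(x))
        return self.head(h.mean(dim=-1))   # global average pooling over time


if __name__ == "__main__":
    # Hypothetical input: 64-channel EEG, 1-second window at 128 Hz -> 128 samples.
    model = BinaryTCN(in_channels=64, n_classes=10)
    logits = model(torch.randn(8, 64, 128))
    print(logits.shape)  # torch.Size([8, 10])
```

Binarizing the convolution weights lets each weight be stored in a single bit and replaces most multiplications with sign flips, which is the usual motivation for binary networks on low-resource hardware; the exact binarization scheme, block structure, and parameter count used in the paper may differ from this sketch.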