Dynamic TF-TDNN: Dynamic Time Delay Neural Network Based on Temporal-Frequency Attention for Dialect Recognition

Published: 01 Jan 2023, Last Modified: 13 Nov 2023, ICASSP 2023
Abstract: Dialect recognition aims to identify the dialect category of an utterance and has been applied in many audio applications. Recently, various Time Delay Neural Network (TDNN) based models, such as D-TDNN, DMC-TDNN, and ECAPA-TDNN, have been proposed for dialect recognition. However, most of them apply temporal attention only in the final statistics pooling layer, which ignores the importance of simultaneously capturing key frequency and temporal information in utterances under different receptive fields. In contrast, we introduce a hybrid attention mechanism over both the temporal and frequency domains, called the TF-attention module, which adaptively focuses on the truly important frames and the important frame-level information under different receptive fields. Moreover, we are the first to introduce a dynamic architecture mechanism to dialect recognition, dynamically reducing the computational cost and the number of model parameters. We evaluate the proposed dynamic TF-TDNN on the AP20-OLR-dialect task of the OLR challenge and achieve state-of-the-art (SOTA) performance with fewer model parameters.
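The abstract does not specify how the TF-attention module is built, so the following is only a minimal NumPy sketch under our own assumptions, not the authors' implementation. It shows the general idea of reweighting frame-level features along both axes after TDNN layers with different receptive fields. The names `tdnn_layer`, `tf_attention`, `w_freq`, and `w_time` are hypothetical. The frequency branch is assumed to be a squeeze-and-excitation style sigmoid gate, and the temporal branch is assumed to be a softmax over frames. The F axis stands for frequency/feature bins. The dynamic architecture mechanism is not modeled here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def tdnn_layer(x, weights, dilation):
    """Dilated 1-D convolution over time followed by ReLU (a generic TDNN layer).

    x:       (F_in, T) frame-level features
    weights: (K, F_out, F_in), one matrix per context offset
    Returns  (F_out, T - (K - 1) * dilation); larger dilation => wider receptive field.
    """
    K, f_out, _ = weights.shape
    t_out = x.shape[1] - (K - 1) * dilation
    out = np.zeros((f_out, t_out))
    for k in range(K):
        start = k * dilation
        out += weights[k] @ x[:, start:start + t_out]
    return np.maximum(out, 0.0)


def tf_attention(x, w_freq, w_time):
    """Hypothetical temporal-frequency attention on an (F, T) feature map.

    w_freq: (F, F) frequency-branch weights (assumed squeeze-and-excitation style gate)
    w_time: (F,)   temporal-branch weights that score each frame
    """
    _, t = x.shape
    # Frequency branch: average each bin over time, then gate each bin into (0, 1).
    freq_gate = sigmoid(w_freq @ x.mean(axis=1))        # (F,)
    # Temporal branch: score each frame and normalise the scores over time.
    time_weights = softmax(w_time @ x, axis=0)          # (T,), sums to 1
    # Reweight bins and frames; scaling by T makes uniform time weights neutral.
    out = x * freq_gate[:, None] * (t * time_weights)[None, :]
    return out, freq_gate, time_weights


# Usage: apply the attention after each TDNN layer as the receptive field grows.
rng = np.random.default_rng(0)
x = rng.standard_normal((20, 100))                      # 20 feature bins, 100 frames
h = x
for dilation in (1, 2, 3):
    w = 0.1 * rng.standard_normal((3, 20, h.shape[0]))
    h = tdnn_layer(h, w, dilation)
    h, gate, tw = tf_attention(h, 0.1 * rng.standard_normal((20, 20)),
                               0.1 * rng.standard_normal(20))
print(h.shape)  # (20, 88)
```

Applying attention after every layer, rather than only in a final pooling layer, lets each reweighting act on features computed under a different receptive field. This is the distinction the abstract draws with earlier TDNN models.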