Abstract: Anomalous Audio Sequence Detection (AASD) is a technique for identifying atypical sound sequences. It is especially valuable in industrial monitoring applications, such as monitoring water pipes and various types of running machinery. Current deep models use a combination of deep features to detect anomalies. However, the scarcity of anomalous samples and the abundance of redundant features lead to unsatisfactory identification results. In industrial monitoring scenarios, anomalies in audio sequences are typically identified by human experts who take various factors into consideration, and these factors often play different roles in different scenarios. Therefore, we focus on two key challenges: 1) What are the essential features for anomalous audio sequence detection? 2) How can those critical features be adaptively fused to detect anomalous audio sequences for different tasks? In response to these two questions, we design an end-to-end adaptive feature selection mechanism to identify important and non-redundant components for detecting anomalous audio sequences. Furthermore, based on the selected critical features, we devise a dynamic aggregation model that adaptively extracts distinguishable features for detecting anomalous audio sequences in multiple scenarios. The proposed dynamic aggregation model employs a simple network architecture with fewer parameters to extract features from the pre-processed critical features under label supervision. This effectively compensates for the limitations of both traditional feature-based methods and deep learning-based methods. Experimental evaluations on three classic industrial monitoring datasets demonstrate that the proposed method achieves state-of-the-art performance and exhibits superior recall compared with existing methods.
External IDs: dblp:conf/iconip/LiuGZLCBF24