Anomalous Sound Detection Framework Based on Masking Strategy

Xiang Li, Caidan Zhao, Chenxing Gao, Wenxin Hu

Published: 01 Jan 2024, Last Modified: 13 May 2025. Venue: ICIC (LNAI 6) 2024. License: CC BY-SA 4.0

Abstract: Unsupervised Anomalous Sound Detection (ASD) aims to identify abnormal sounds by learning the features of normal operating sounds and detecting deviations from them. Existing deep learning-based methods do not adequately model the temporal dimension of the data, particularly subtle changes that occur over short periods. Converting one-dimensional audio signals into Mel spectrograms allows their temporal and frequency-domain relationships to be analyzed. Based on these time-frequency characteristics, we propose MS-GAN, a Generative Adversarial Network for anomalous sound detection built on a masking strategy. We redesign the generator to incorporate the masking strategy: masking operations are applied along the temporal dimension of the data to improve the generator’s understanding of the data’s intrinsic structure. To capture normal feature patterns in time-series data at multiple granularities, we use a dual-discriminator architecture comprising a global and a local discriminator. These discriminators provide feature-level guidance to the generator, pushing it to learn the underlying representation rather than fit surface-level noise. On the DCASE 2022 Challenge Task 2 dataset, MS-GAN achieves state-of-the-art results on three machine types.
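To make masking along the temporal dimension concrete, the sketch below hides one contiguous block of time frames in a spectrogram-like array. It is an illustration only: the abstract does not specify the masking scheme, so the single contiguous block, the fill value, and the names `mask_time_frames`, `mask_width`, and `mask_value` are all assumptions rather than the authors' method. Plain Python lists are used to keep the example dependency-free.

```python
import random


def mask_time_frames(spec, mask_width, mask_value=0.0, rng=None):
    """Return a masked copy of `spec` and the masked frame span.

    `spec` is a list of rows shaped [n_mels][n_frames], i.e. each row is one
    Mel band and each column is one time frame. One contiguous block of
    `mask_width` frames is overwritten with `mask_value` in every band.
    (Hypothetical scheme: the paper's actual masking rule is not given
    in the abstract.)
    """
    rng = rng or random.Random()
    n_frames = len(spec[0])
    if not 0 < mask_width <= n_frames:
        raise ValueError("mask_width must be between 1 and n_frames")

    # Pick the first masked frame so the block fits inside the spectrogram.
    start = rng.randrange(n_frames - mask_width + 1)
    end = start + mask_width

    # Copy each row so the input spectrogram is left untouched.
    masked = [row[:] for row in spec]
    for row in masked:
        for t in range(start, end):
            row[t] = mask_value
    return masked, (start, end)


if __name__ == "__main__":
    # Toy "spectrogram": 4 Mel bands, 10 time frames, all ones.
    toy = [[1.0] * 10 for _ in range(4)]
    masked, span = mask_time_frames(toy, mask_width=3, rng=random.Random(0))
    print("masked frames:", span)
    for row in masked:
        print(row)
```

In a setup of this kind, a generator would receive the masked array and be trained to restore the hidden frames, which is one way to encourage a model to learn temporal structure rather than copy its input.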