Exploring Large Scale Pre-Trained Models for Robust Machine Anomalous Sound Detection

Published: 01 Jan 2024 · Last Modified: 24 Sept 2024 · ICASSP 2024 · CC BY-SA 4.0
Abstract: Machine anomalous sound detection is a useful technique for various applications, but it often generalizes poorly because of the difficulty of data collection and the complexity of acoustic environments. To address this issue, we propose a robust machine anomalous sound detection model that leverages self-supervised models pre-trained on large-scale speech data. Specifically, we assign learnable weights to the features from different layers of the pre-trained model and then fine-tune it with a self-supervised classification objective, using the machine's working condition as the label. Moreover, we introduce a data augmentation method that simulates different operating states of the machine to enrich the dataset. Furthermore, we devise a transformer pooling method that fuses the features of different segments. Experiments on the DCASE2023 dataset show that our proposed method outperforms the commonly used reconstruction-based autoencoder and classification-based convolutional network by a large margin, demonstrating the effectiveness of large-scale pre-training for enhancing the generalization and robustness of machine anomalous sound detection. With these methods, we achieved 2nd place in Task 2 of DCASE2023.
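The layer-weighted feature extraction described in the abstract can be sketched as a learnable softmax-weighted sum over a pre-trained encoder's hidden layers. The module below is a minimal illustration of that idea, assuming a hypothetical 12-layer encoder with 768-dimensional features; the paper's exact weighting scheme and backbone may differ.

```python
import torch
import torch.nn as nn

class WeightedLayerSum(nn.Module):
    """Learnable weighted sum over the hidden layers of a pre-trained model.

    Illustrative sketch only; the layer count and feature dimension are
    assumptions, not details from the paper.
    """
    def __init__(self, num_layers: int):
        super().__init__()
        # One scalar weight per layer; initialized uniform via zeros + softmax.
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):
        # hidden_states: list of (batch, time, dim) tensors, one per layer.
        stacked = torch.stack(hidden_states, dim=0)       # (L, B, T, D)
        norm = torch.softmax(self.weights, dim=0)         # (L,) sums to 1
        return (norm.view(-1, 1, 1, 1) * stacked).sum(0)  # (B, T, D)

# Dummy hidden states standing in for a pre-trained encoder's outputs.
layers = [torch.randn(2, 50, 768) for _ in range(12)]
fused = WeightedLayerSum(12)(layers)
print(tuple(fused.shape))  # (2, 50, 768)
```

During fine-tuning, the fused features would feed a classification head predicting the working-condition label, so the layer weights are learned jointly with the rest of the model.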