Automated Data Augmentation for Audio Classification

Yanjie Sun, Kele Xu, Chaorun Liu, Yong Dou, Huaimin Wang, Bo Ding, Qinghua Pan

Published: 2024, Last Modified: 14 May 2025. IEEE ACM Trans. Audio Speech Lang. Process. 2024. License: CC BY-SA 4.0

Abstract: Audio classification is a challenging task that requires categorizing audio data based on its content or characteristics. Existing approaches to audio classification rely either on supervised learning or on fine-tuning after self-supervised learning, both of which require manually labeled data. However, manually labeling audio datasets is a time-consuming and expensive process that limits dataset size. Moreover, the diversity of sound categories and class imbalance can further impede classification performance. To overcome these challenges, researchers have proposed various audio data augmentation methods. However, most of these methods pay little attention to how augmentations are designed and combined, and rely solely on either waveform-based or spectrogram-based approaches. This paper presents an Automated Audio Augmentation (AAA) method for audio classification, which generates learnable and composable augmentation policies suited to the audio classification task and can be employed in a plug-and-play manner. The method leverages both waveform-level and spectrogram-level augmentation, and we propose a Bayesian optimization algorithm to search for composed augmentation policies. To the best of our knowledge, this is the first attempt to propose an automatic data augmentation method for audio classification tasks. Through large-scale empirical studies, we demonstrate that the proposed method outperforms previous competitive methods by a significant margin. It improves average performance across multiple datasets by 6.421%, and by 7.330% in few-shot scenarios.
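
To make the idea of a composed policy concrete, the following is a minimal sketch of how waveform-level and spectrogram-level augmentations could be chained in one policy. It is not the authors' implementation: the operation names (add_noise, gain, freq_mask, time_mask), the (name, probability, magnitude) policy format, and the STFT settings are all assumptions chosen for illustration. The policy search itself is not shown; the probabilities and magnitudes below are the kind of parameters that a search procedure such as Bayesian optimization would tune.

```python
import numpy as np

# Hypothetical waveform-level operations (illustrative, not from the paper).
def add_noise(wave, magnitude, rng):
    """Add Gaussian noise with standard deviation `magnitude`."""
    return wave + magnitude * rng.standard_normal(wave.shape)

def gain(wave, magnitude, rng):
    """Scale amplitude by a random factor in [1 - magnitude, 1 + magnitude]."""
    return wave * (1.0 + rng.uniform(-magnitude, magnitude))

def to_spectrogram(wave, n_fft=256, hop=128):
    """Magnitude spectrogram from Hann-windowed frames, shape (freq, time)."""
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop : i * hop + n_fft] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)).T

# Hypothetical spectrogram-level operations (illustrative, not from the paper).
def freq_mask(spec, magnitude, rng):
    """Zero a random band covering a `magnitude` fraction of frequency bins."""
    spec = spec.copy()
    width = int(magnitude * spec.shape[0])
    start = rng.integers(0, spec.shape[0] - width + 1)
    spec[start : start + width, :] = 0.0
    return spec

def time_mask(spec, magnitude, rng):
    """Zero a random span covering a `magnitude` fraction of time frames."""
    spec = spec.copy()
    width = int(magnitude * spec.shape[1])
    start = rng.integers(0, spec.shape[1] - width + 1)
    spec[:, start : start + width] = 0.0
    return spec

WAVE_OPS = {"add_noise": add_noise, "gain": gain}
SPEC_OPS = {"freq_mask": freq_mask, "time_mask": time_mask}

def apply_policy(wave, policy, rng):
    """Apply waveform ops, convert to a spectrogram, then apply spectrogram ops.

    `policy` is an assumed format: {"wave": [(name, prob, magnitude), ...],
    "spec": [(name, prob, magnitude), ...]}. Each op fires with probability prob.
    """
    for name, prob, magnitude in policy["wave"]:
        if rng.random() < prob:
            wave = WAVE_OPS[name](wave, magnitude, rng)
    spec = to_spectrogram(wave)
    for name, prob, magnitude in policy["spec"]:
        if rng.random() < prob:
            spec = SPEC_OPS[name](spec, magnitude, rng)
    return spec

# Example: one second of a 440 Hz tone at an assumed 16 kHz sample rate.
rng = np.random.default_rng(0)
wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
policy = {
    "wave": [("add_noise", 0.5, 0.01), ("gain", 0.5, 0.2)],
    "spec": [("freq_mask", 1.0, 0.1), ("time_mask", 1.0, 0.1)],
}
augmented = apply_policy(wave, policy, rng)
```

Because the policy is plain data, an outer search loop can propose candidate policies, train or evaluate a classifier with each, and keep the best one, without changing the augmentation code.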