Joint time–frequency scattering-enhanced representation for bird vocalization classification

Yimeng Min; Eliot T Miller; Daniel Fink; Carla P Gomes

Published: 21 Oct 2023, Last Modified: 15 Dec 2023

Venue: NeurIPS CompSust 2023 Poster

Keywords: passive acoustic monitoring, Mel Spectrogram, joint time-frequency scattering, bird vocalization

TL;DR: Joint time-frequency scattering representation outperforms Mel Spectrogram in bird vocalization classification tasks

Abstract: Neural networks (NNs) are widely used in passive acoustic monitoring. Typically, audio is converted into a Mel Spectrogram as a preprocessing step before being fed into an NN. In this study, we investigate the Joint Time-Frequency Scattering transform as an alternative preprocessing technique for analyzing bird vocalizations. We highlight its advantages over the Mel Spectrogram: it captures intricate time-frequency patterns and emphasizes rapid signal transitions. While the Mel Spectrogram often gives similar importance to all sounds, the scattering transform better differentiates between rapid and slow variations. We evaluate two architectures: a Convolutional Neural Network and an attention-based transformer. Our results demonstrate that both architectures benefit from this preprocessing, with the scattering transform providing a more discriminative representation of bird vocalizations than the traditional Mel Spectrogram.
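
For readers unfamiliar with the baseline preprocessing the abstract refers to, the following is a minimal sketch of a log-Mel Spectrogram computed with the Python standard library only. It is not the authors' pipeline: the HTK-style mel formula, the Hann window, and the parameter values (`n_fft`, `hop`, `n_mels`, the 8 kHz sample rate, the 2 kHz test tone) are all assumptions chosen for illustration, and the function names are hypothetical. It does not implement the Joint Time-Frequency Scattering transform.

```python
import cmath
import math


def hz_to_mel(f_hz):
    # HTK-style mel scale (an assumption; other mel variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters with centers equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge frequencies: each filter spans three consecutive edges.
    edges_hz = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Hann-windowed power spectrum via a direct DFT (slow, illustrative only)."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=16):
    """Return a list of frames, each a list of n_mels log-energies."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spec = power_spectrum(signal[start:start + n_fft])
        mel_energy = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        frames.append([math.log(e + 1e-10) for e in mel_energy])
    return frames


# Synthetic stand-in for a recording: a pure 2 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 2000.0 * t / sample_rate) for t in range(1024)]
spec = mel_spectrogram(tone, sample_rate)
peak_band = max(range(len(spec[0])), key=lambda b: spec[0][b])
```

Each frame here is a fixed set of band energies over one short window, so the representation itself does not separate fast from slow modulations across frames; that is the kind of structure the abstract says the scattering transform is meant to emphasize.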

Submission Number: 22
