Fast Chirplet Transform Injects Priors in Deep Learning of Animal Calls and Speech

Hervé Glotin, Julien Ricard, Randall Balestriero

Feb 17, 2017, ICLR 2017 workshop submission
  • Abstract: Bioacoustic dataset analyses require substantial baseline training data in order to accurately recognize and characterize specific kernels. Current approaches using the scattering framework and/or Convolutional Neural Nets (CNN) often require substantial dedicated computer time to achieve the desired results. We propose a trade-off between these two approaches, using a Chirplet kernel as an efficient constant-Q bioacoustic representation to pretrain the CNN. First, we implement a bio-inspired Chirplet auditory representation. Second, we implement the first algorithm (and code) for a Fast Chirplet Transform (FCT). Third, we demonstrate the computational efficiency of the FCT on selected large environmental databases, including months of orca recordings and 1000 bird species from the LifeCLEF challenge. Fourth, we validate the FCT on the vowel subset of the TIMIT speech dataset. The results show that FCT accelerates the CNN by 28% for bird classification and by 26% for vowel classification. Scores are also improved by FCT pretraining, with relative gains of 7.8% in Mean Average Precision on birds and 2.3% in vowel accuracy over a CNN trained on raw audio. We conclude with perspectives on tonotopic FCT deep machine listening and on inter-species bioacoustic transfer learning to generalize the representation of animal communication systems.
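The abstract does not define the Chirplet kernel or the FCT algorithm, so the following is only a minimal sketch of a constant-Q chirplet representation, not the authors' FCT. It assumes a unit-norm Gaussian-windowed linear chirp as the kernel, geometrically spaced center frequencies whose window length is inversely proportional to the center frequency (which keeps bandwidth / frequency constant, i.e. constant Q), and zero-padded FFT convolution as the fast filtering step. The function names and the parameters `q`, `sweep`, and `bins_per_octave` are hypothetical.

```python
import numpy as np


def gaussian_chirplet(fs, f0, f1, duration):
    """Hypothetical unit-norm Gaussian-windowed linear chirp sweeping f0 -> f1 Hz."""
    n = max(int(round(duration * fs)), 1)
    t = np.arange(n) / fs - duration / 2.0        # time axis centered on the window
    rate = (f1 - f0) / duration                   # linear chirp rate (Hz/s)
    # Instantaneous frequency is (f0 + f1) / 2 + rate * t: about f0 at the start, f1 at the end.
    phase = 2.0 * np.pi * (0.5 * (f0 + f1) * t + 0.5 * rate * t ** 2)
    window = np.exp(-0.5 * (t / (duration / 6.0)) ** 2)  # Gaussian, ~3 std on each side
    kernel = window * np.exp(1j * phase)
    return kernel / np.linalg.norm(kernel)


def constant_q_chirplet_bank(fs, fmin, n_bins, bins_per_octave=12, q=8.0, sweep=0.1):
    """Hypothetical constant-Q bank: kernel length is q / fc, so bandwidth / fc is constant."""
    kernels, centers = [], []
    for b in range(n_bins):
        fc = fmin * 2.0 ** (b / bins_per_octave)  # geometric (log-frequency) spacing
        f0, f1 = fc * (1 - sweep / 2), fc * (1 + sweep / 2)
        if f1 >= fs / 2:
            raise ValueError(f"bin {b} ({fc:.1f} Hz) exceeds the Nyquist frequency")
        kernels.append(gaussian_chirplet(fs, f0, f1, q / fc))
        centers.append(fc)
    return kernels, np.array(centers)


def chirplet_transform(x, kernels):
    """Magnitude of x convolved with each kernel, computed with zero-padded FFTs."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = np.empty((len(kernels), n))
    for i, k in enumerate(kernels):
        m = n + len(k) - 1                        # length of the full linear convolution
        nfft = 1 << (m - 1).bit_length()          # power of two >= m avoids circular wrap-around
        full = np.fft.ifft(np.fft.fft(x, nfft) * np.fft.fft(k, nfft))[:m]
        start = (len(k) - 1) // 2                 # crop back to len(x), centered on the kernel
        out[i] = np.abs(full[start:start + n])
    return out


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs // 2) / fs
    x = np.sin(2 * np.pi * 1000.0 * t)            # 0.5 s, 1 kHz test tone
    kernels, centers = constant_q_chirplet_bank(fs, fmin=250.0, n_bins=36)
    S = chirplet_transform(x, kernels)
    print(S.shape)                                # (36, 8000)
    print(centers[S.mean(axis=1).argmax()])       # center frequency of the strongest channel
```

Each row of the output is one chirplet channel; stacked, the rows form a time-frequency image that could be fed to a CNN in place of raw audio. The FFT route costs O(N log N) per channel instead of the O(N·L) of direct convolution with an L-sample kernel, which matters most for the long low-frequency kernels.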
  • TL;DR: Invited workshop paper: we propose the Chirplet kernel as an efficient constant-Q bioacoustic representation to pretrain the CNN.
  • Keywords: Deep learning, Supervised Learning, Applications
  • Conflicts: Rice University, University of Toulon.