Semi-supervised audio tagging with deep co-training and augmentations

Leo Cances; Thomas Pellegrini

09 Jun 2020 (modified: 05 May 2023). Submitted to SAS 2020. Readers: Everyone

Keywords: Deep Co-Training, semi-supervised learning, audio tagging

TL;DR: Adaptation of Deep Co-Training for sound event classification, combined with data augmentations

Abstract: In this work, we explore the task of audio tagging in a semi-supervised context. The recently proposed Deep Co-Training (DCT) algorithm has shown impressive results in visual object recognition and outperformed other semi-supervised state-of-the-art methods such as Mean Teacher and GANs. DCT uses two or more deep neural networks and adversarial examples to enforce complementarity between the models trained on the same data. We adapted DCT to audio tagging, and we report experiments on the publicly available UrbanSound8K dataset. We compare models trained with 10% of labeled data using supervised training and using DCT, which may benefit from the remaining 90% of unlabeled data. To go beyond the original DCT proposal, we propose to artificially increase the number of labeled examples by simply duplicating the 10% of labeled files in the mini-batches during learning and transforming them with audio data augmentations. While standard DCT already showed performance gains over supervised learning (17% relative gain), the use of duplication combined with data augmentations on the labeled examples leads to additional significant performance improvements (26% gain).
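
The duplication-plus-augmentation idea in the abstract could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the three augmentations (additive noise, random gain, circular time shift), and their parameters are all assumptions, since the abstract does not say which audio augmentations were used. Waveforms are plain Python lists to keep the example self-contained.

```python
import random

# Hypothetical augmentations; the abstract does not name the ones actually used.
def add_noise(wave, rng, scale=0.01):
    """Add small Gaussian noise to every sample."""
    return [x + rng.gauss(0.0, scale) for x in wave]

def random_gain(wave, rng, low=0.8, high=1.2):
    """Scale the whole waveform by one random gain factor."""
    gain = rng.uniform(low, high)
    return [gain * x for x in wave]

def time_shift(wave, rng):
    """Circularly shift the waveform by a random number of samples."""
    if not wave:
        return list(wave)
    k = rng.randrange(len(wave))
    return wave[k:] + wave[:k]

AUGMENTATIONS = (add_noise, random_gain, time_shift)

def build_labeled_batch(labeled, batch_size, rng):
    """Fill a mini-batch from a small labeled pool.

    Examples are drawn with replacement, so the same file can appear
    several times (duplication); each copy gets its own randomly chosen
    augmentation, and the label is left unchanged.
    """
    batch = []
    for _ in range(batch_size):
        wave, label = rng.choice(labeled)
        augment = rng.choice(AUGMENTATIONS)
        batch.append((augment(wave, rng), label))
    return batch

# Usage: two labeled clips expanded into a mini-batch of eight examples.
rng = random.Random(0)
labeled = [([0.0, 0.5, -0.5, 0.25], 3), ([0.1, -0.1, 0.2, -0.2], 7)]
batch = build_labeled_batch(labeled, batch_size=8, rng=rng)
```

In a DCT setup, a batch built this way would supply the supervised part of the loss, while the unlabeled files would feed the co-training terms; that wiring is omitted here.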

Double Submission: No
