- Keywords: Deep Co-Training, semi-supervised learning, audio tagging
- TL;DR: Adaptation of Deep Co-Training for sound event classification, combined with data augmentations
- Abstract: In this work, we explore the task of audio tagging in a semi-supervised context. The recently proposed Deep Co-Training (DCT) algorithm has shown impressive results in visual object recognition, outperforming other semi-supervised state-of-the-art methods such as Mean Teacher and GANs. DCT uses two or more deep neural networks and adversarial examples to enforce complementarity between the models trained on the same data. We adapted DCT to audio tagging, and we report experiments on the publicly available UrbanSound8K dataset. We compare models trained with 10% of the labeled data using supervised training and using DCT, which can benefit from the remaining 90% of unlabeled data. To go beyond the original DCT proposal, we artificially increase the 10% of labeled files by simply duplicating them in the mini-batches during learning and transforming them with audio data augmentations. While standard DCT already showed performance gains over supervised learning (a 17% relative gain), duplication combined with data augmentations on the labeled examples led to further significant improvements (a 26% relative gain).
- Double Submission: No
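The labeled-duplication idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`augment`, `build_mini_batch`), the noise-based augmentation, and the batch sizes are all assumptions made for the example.

```python
import random

def augment(x, noise_scale=0.05):
    # Hypothetical stand-in for an audio augmentation (e.g., noise,
    # pitch shift): here we simply perturb each feature value.
    return [v + random.uniform(-noise_scale, noise_scale) for v in x]

def build_mini_batch(labeled, unlabeled, n_labeled=4, n_unlabeled=4):
    # Duplicate the scarce labeled examples (sampling with replacement)
    # so each mini-batch still contains n_labeled of them, and apply a
    # fresh augmentation to every copy so duplicates are not identical.
    labeled_part = [(augment(x), y)
                    for x, y in (random.choice(labeled) for _ in range(n_labeled))]
    unlabeled_part = [augment(random.choice(unlabeled))
                      for _ in range(n_unlabeled)]
    return labeled_part, unlabeled_part

# Toy data: only 2 labeled clips (features, label) but many unlabeled clips.
labeled = [([0.1, 0.2], 0), ([0.3, 0.4], 1)]
unlabeled = [[float(i), float(i)] for i in range(20)]
lab, unlab = build_mini_batch(labeled, unlabeled)
```

Because each duplicated copy is independently augmented, the network sees a different variant of the same labeled clip in every batch, which is what lets the small labeled set fill its share of the mini-batch without trivially repeating inputs.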