Abstract: Deep learning models have shown remarkable success across various domains, but their effectiveness often depends on access to extensive labeled datasets, which can be costly to acquire. Contrastive learning has emerged as a promising approach for learning useful representations. In this work, we address a scenario with access to both labeled and unlabeled data from the same domain: labeled data may be limited, while unlabeled data is typically far more abundant. To tackle this challenge, we propose an end-to-end audio classification model in a semi-supervised setting that integrates contrastive loss, supervised contrastive loss, and consistency regularization. Our approach effectively combines information from labeled and unlabeled data, improving classification performance even when labeled data is scarce. Experimental results demonstrate the efficacy of our model, which outperforms both baseline models and the FixMatch method, highlighting the potential of leveraging labeled and unlabeled data jointly for audio classification.
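As a rough illustration of the loss combination named in the abstract, the sketch below shows how an unsupervised contrastive term (NT-Xent over two augmented views of unlabeled audio), a supervised contrastive term on labeled samples, and a consistency-regularization term might be summed with a supervised classification loss. The function names, temperature, and weighting coefficients `lam_*` are illustrative assumptions for exposition, not the paper's implementation or tuned values.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """Unsupervised contrastive (NT-Xent) loss over two augmented views."""
    z = np.concatenate([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # matching view
    return float(np.mean(np.log(np.exp(sim).sum(1))
                         - sim[np.arange(2 * n), pos]))

def sup_con(z, labels, tau=0.5):
    """Supervised contrastive loss: same-class samples act as positives."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)
    losses = []
    for i, yi in enumerate(labels):
        pos = np.flatnonzero((labels == yi) & (np.arange(len(labels)) != i))
        if pos.size:
            losses.append(-np.mean(sim[i, pos] - np.log(np.exp(sim[i]).sum())))
    return float(np.mean(losses))

def consistency(p1, p2):
    """Consistency regularization: predictions for two views should agree."""
    return float(np.mean((p1 - p2) ** 2))

def total_loss(ce, z1, z2, z_lab, y_lab, p1, p2,
               lam_con=1.0, lam_sup=1.0, lam_cons=1.0):
    """Illustrative combined objective; lam_* weights are placeholders."""
    return (ce
            + lam_con * nt_xent(z1, z2)
            + lam_sup * sup_con(z_lab, y_lab)
            + lam_cons * consistency(p1, p2))
```

In practice each term would operate on embeddings produced by the shared audio encoder, with the contrastive and consistency terms computed on unlabeled batches and the supervised terms on the labeled batch.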