Balanced and Accurate Pseudo-Labels for Semi-Supervised Image Classification

Jian Zhao

Published: 01 Oct 2022, Last Modified: 14 Apr 2024, OpenReview Archive Direct Upload, Everyone, CC BY 4.0

Abstract: Semi-supervised image classification has recently become an active research topic, and the Co-Training framework is one of its important methods. In the traditional Co-Training structure, the sub-networks generate pseudo-labels for each other, and these pseudo-labels are then used as a supervisory signal for model training. However, the pseudo-labels can hurt classification performance because of their low accuracy and unbalanced class distribution. In this article, we address these two problems by designing the Balanced Module (BM) and the Gaussian Mixture Module (GMM), and propose BAPS (Balanced and Accurate Pseudo-labels for Semi-supervised image classification). In BM, the two sub-networks jointly predict labels for the unlabeled images, pseudo-labels are selected with a high confidence threshold, and a balancing operation is applied to obtain initial samples with a balanced distribution across categories. In GMM, following common practice in learning with noisy labels, we fit a Gaussian mixture model to the loss distribution of the pseudo-labeled images output by BM, and then divide them into clean and noisy samples based on the observation that the loss of correctly labeled images is generally smaller than that of wrongly labeled ones. Through BM and GMM, pseudo-labels with a balanced distribution and high accuracy are obtained for the subsequent model training process. Our model achieves better classification accuracy than most state-of-the-art semi-supervised image classification algorithms on the CIFAR-10/100 and SVHN datasets, and further ablation experiments demonstrate the effectiveness of BAPS. The source code of BAPS will be available at https://github.com/zhaojianaaa.
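To make the two steps described in the abstract concrete, the following is a minimal, standard-library-only sketch and not the authors' implementation. Several details are assumptions that the abstract does not specify: the joint prediction is taken as the average of the two sub-networks' class probabilities, the balancing operation truncates every class to the size of the smallest confident class, the threshold is 0.95, and a sample is treated as clean when its posterior under the low-loss Gaussian component exceeds 0.5. The function names `balanced_select` and `clean_probability` are hypothetical.

```python
import math
from collections import defaultdict


def balanced_select(probs_a, probs_b, threshold=0.95):
    """BM-style selection (assumed form): average the two networks' class
    probabilities, keep predictions at or above `threshold`, then keep the same
    number of samples per class (the size of the smallest confident class)."""
    per_class = defaultdict(list)
    for idx, (pa, pb) in enumerate(zip(probs_a, probs_b)):
        avg = [(x + y) / 2 for x, y in zip(pa, pb)]
        conf = max(avg)
        if conf >= threshold:
            per_class[avg.index(conf)].append((conf, idx))
    if not per_class:
        return {}
    quota = min(len(items) for items in per_class.values())
    selected = {}
    for label, items in per_class.items():
        items.sort(reverse=True)  # most confident samples first
        for _, idx in items[:quota]:
            selected[idx] = label
    return selected  # maps sample index -> pseudo-label


def clean_probability(losses, iters=50):
    """GMM-style split (assumed form): fit a two-component 1-D Gaussian mixture
    to per-sample losses with EM and return, for each sample, the posterior
    probability of the component with the smaller mean (the 'clean' one)."""
    lo, hi = min(losses), max(losses)
    mu = [lo, hi]
    var = [(hi - lo) ** 2 / 4 + 1e-6] * 2
    pi = [0.5, 0.5]

    def posteriors():
        out = []
        for x in losses:
            p = [
                pi[k] * math.exp(-((x - mu[k]) ** 2) / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in range(2)
            ]
            s = sum(p) or 1e-300  # guard against underflow to zero
            out.append([pk / s for pk in p])
        return out

    for _ in range(iters):
        resp = posteriors()  # E-step
        for k in range(2):   # M-step
            nk = sum(r[k] for r in resp) + 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, losses)) / nk + 1e-6
            pi[k] = nk / len(losses)

    clean = 0 if mu[0] <= mu[1] else 1
    return [r[clean] for r in posteriors()]
```

In a pipeline of this shape, `balanced_select` would be applied to the softmax outputs of the two sub-networks on unlabeled images, and `clean_probability` to the per-sample losses of the selected pseudo-labeled images; samples with a clean probability above 0.5 (an assumed cutoff) would be kept as supervised targets for the subsequent training step.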