Abstract: Semi-supervised image classification has recently become an active research topic, and the Co-Training framework is an important approach to it. In the traditional Co-Training structure, the sub-networks generate pseudo-labels for each other, and these pseudo-labels are then
used as a supervisory signal for model training. However, the pseudo-labels can hurt classification performance because of their low accuracy and unbalanced class distribution. In this article, we address
these two problems by designing the Balanced Module (BM) and the Gaussian Mixture Module (GMM), and
propose BAPS (Balanced and Accurate Pseudo-labels for Semi-supervised image classification). In BM, the
two sub-networks jointly predict labels for the unlabeled images; pseudo-labels whose confidence exceeds a high
threshold are then selected and balanced to obtain an initial sample set with a balanced class
distribution. In GMM, following common practice in the learning from noisy labels task, we fit a Gaussian mixture model
to the loss distribution of the pseudo-labeled images output by BM, and then separate clean samples from noisy samples
based on the observation that the loss of correctly labeled images is generally smaller than that of
wrongly labeled ones. Through BM and GMM, pseudo-labels with balanced distribution and high accuracy
are obtained for subsequent model training. Our model achieves better classification accuracy
than most state-of-the-art semi-supervised image classification algorithms on the CIFAR-10/100 and SVHN
datasets, and ablation experiments further demonstrate the effectiveness of BAPS. The source code of
BAPS will be available at https://github.com/zhaojianaaa.
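
For illustration only, the two steps described above (confidence-based selection with class balancing, then a clean/noisy split from a two-component Gaussian mixture fitted to per-sample losses) can be sketched in plain Python. This is not the released BAPS code. The function names, the 0.95 threshold, the balancing rule (truncate every class to the size of the smallest confident class), and the "lower-mean component is clean" posterior are all assumptions made for this sketch; the actual choices in BAPS may differ.

```python
import math
from collections import defaultdict


def balance_pseudo_labels(preds, threshold=0.95):
    """preds: list of (sample_id, label, confidence) tuples.

    Keep predictions with confidence >= threshold, then truncate every class
    to the size of the smallest class (a hypothetical balancing rule).
    Returns {label: [sample_id, ...]}, most confident samples first.
    """
    by_class = defaultdict(list)
    for sample_id, label, conf in preds:
        if conf >= threshold:
            by_class[label].append((conf, sample_id))
    if not by_class:
        return {}
    k = min(len(items) for items in by_class.values())
    return {
        label: [sid for _, sid in sorted(items, reverse=True)[:k]]
        for label, items in by_class.items()
    }


def _gauss_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def _posteriors(x, means, variances, weights):
    # Posterior probability of each of the two components for one loss value.
    p = [weights[j] * _gauss_pdf(x, means[j], variances[j]) for j in range(2)]
    total = sum(p) or 1e-300  # guard against numerical underflow
    return [pj / total for pj in p]


def fit_two_gaussians(losses, iters=50, min_var=1e-6):
    """Fit a 1-D, two-component Gaussian mixture to the losses with EM."""
    lo, hi = min(losses), max(losses)
    means = [lo, hi]  # initialise one component at each extreme
    var0 = max(((hi - lo) / 2) ** 2, min_var)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibilities of each component for each loss.
        resp = [_posteriors(x, means, variances, weights) for x in losses]
        # M-step: re-estimate mean, variance, and weight of each component.
        for j in range(2):
            nj = sum(r[j] for r in resp) or 1e-12
            means[j] = sum(r[j] * x for r, x in zip(resp, losses)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, losses)) / nj
            variances[j] = max(var, min_var)
            weights[j] = nj / len(losses)
    return means, variances, weights


def clean_probabilities(losses):
    """Posterior probability that each sample belongs to the low-loss component."""
    means, variances, weights = fit_two_gaussians(losses)
    clean = 0 if means[0] <= means[1] else 1
    return [_posteriors(x, means, variances, weights)[clean] for x in losses]
```

In such a sketch, a pseudo-labeled sample would be treated as clean when its value from `clean_probabilities` exceeds some cut-off (0.5 is a natural default, again an assumption), and only the balanced, clean samples would be passed on to training.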