Reimplementation of FixMatch and Investigation on Noisy (Pseudo) Labels and Confirmation Errors of FixMatch
Abstract:
Scope of Reproducibility
The main objective of this work is to confirm the effectiveness of FixMatch (Sohn et al. [2020]), which combines pseudo labeling and consistency regularization in semi-supervised learning (SSL) tasks, by achieving similar results on CIFAR-10 and by demonstrating the key factors behind FixMatch's success via ablation studies. Furthermore, we investigated the existence of confirmation errors in FixMatch by reconstructing the batch structure during the training process.
Methodology
All experiments in this work were conducted on CIFAR-10 using the same network architecture, Wide ResNet-28-2. Each experiment used a single V100 GPU, with an average training time of 70 hours. We re-implemented FixMatch in PyTorch, based mainly on the paper and referring to the official TensorFlow implementation for details, and replicated results similar to those in the second-to-last row of the CIFAR-10 column of Table 2 in Sohn et al. [2020]. The ablation studies focused on two key factors of FixMatch, the ratio of unlabeled data and the confidence threshold, as shown in Figure 3 (a) & (b) in Sohn et al. [2020].
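The role of the confidence threshold can be illustrated with a minimal sketch of the unlabeled-data loss, written in plain Python rather than PyTorch so that it stays self-contained. The function name `fixmatch_unlabeled_loss`, the list-of-lists input format, and the default threshold of 0.95 are assumptions made for illustration; this is not the code used in our experiments.

```python
import math


def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
    """Masked cross-entropy between weak-view pseudo labels and strong-view predictions.

    weak_probs / strong_probs: one list of class probabilities per unlabeled
    example, for the weakly and strongly augmented view respectively.
    Returns (loss averaged over the whole unlabeled batch, fraction of examples kept).
    """
    total = 0.0
    kept = 0
    for p_weak, p_strong in zip(weak_probs, strong_probs):
        confidence = max(p_weak)
        if confidence < threshold:
            # Low-confidence predictions contribute nothing to the loss.
            continue
        pseudo_label = p_weak.index(confidence)
        # Cross-entropy of the strong-view prediction against the hard pseudo label.
        total += -math.log(max(p_strong[pseudo_label], 1e-12))
        kept += 1
    n = len(weak_probs)
    if n == 0:
        return 0.0, 0.0
    return total / n, kept / n
```

With two unlabeled examples of which only the first exceeds the threshold, `fixmatch_unlabeled_loss([[0.97, 0.02, 0.01], [0.5, 0.3, 0.2]], [[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]])` keeps half of the batch and averages the single cross-entropy term over both examples. Lowering the threshold admits more, and potentially noisier, pseudo labels.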
Results
Compared with the average error rates reported in Table 2 in Sohn et al. [2020], our implementation achieves similar error rates: 3.77 lower on CIFAR-10 with 40 labels, 0.22 higher on CIFAR-10 with 250 labels, and 0.1 higher on CIFAR-10 with 4000 labels. These results support the claim that FixMatch outperforms semi-supervised learning benchmarks. The results of the ablation studies exhibit almost the same trends as Figure 3 (a) & (b) in the paper, which indicates that the authors' choices with respect to those ablations were experimentally sound. We also confirmed the existence of confirmation errors in pseudo labels by checking the confusion matrix of the predictions on unlabeled data at different training stages.
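The confusion-matrix check can be sketched as follows. This is a minimal illustration in plain Python; the helper names `confusion_matrix` and `most_confused_pair` are hypothetical, and the sketch assumes the true labels of the unlabeled data are available for analysis only.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count matrix where entry [t][p] is the number of examples of class t predicted as p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def most_confused_pair(matrix):
    """Return (true_class, predicted_class, count) for the largest off-diagonal entry."""
    best = (None, None, 0)
    for t, row in enumerate(matrix):
        for p, count in enumerate(row):
            if t != p and count > best[2]:
                best = (t, p, count)
    return best
```

Computing this matrix on the pseudo labels at several training checkpoints shows whether the same off-diagonal entries persist or grow, which is the pattern one would expect if the model keeps reinforcing its own wrong pseudo labels.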
What was easy
It is generally easy to re-implement FixMatch given the experimental settings in the paper: key parameters are clearly stated in each experimental section, and detailed lists of hyperparameters are given in the appendix. Compared with CTAugment, RandAugment is relatively easy to implement, since it requires no parameter tuning during training and the coefficients representing the severity of all distortions are given in the appendix. In addition, it converges faster than CTAugment.
What was difficult
The official implementation is complicated and therefore not easy to follow. Some details are also missing from the paper compared with the code: 1. the official implementation actually uses leaky ReLU instead of ReLU for the ResNet; 2. the exponential moving average is only mentioned for the experiments on ImageNet but is actually also used on CIFAR-10; 3. the details on how to update the weights of the magnitude bins of CTAugment are not given in the paper, and our implementation achieves slightly worse results than the average error rate reported (1.14 higher with 250 labels).
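The first two details can be illustrated with a minimal plain-Python sketch. The function names, the negative slope of 0.1, and the decay of 0.999 are illustrative assumptions, not values stated in this report.

```python
def leaky_relu(x, negative_slope=0.1):
    """Leaky ReLU: identity for non-negative inputs, a small linear slope otherwise."""
    return x if x >= 0 else negative_slope * x


def ema_update(ema_weights, weights, decay=0.999):
    """One exponential-moving-average step over a flat list of weights.

    The averaged weights, rather than the raw training weights, would then be
    used for evaluation.
    """
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]
```

For example, `ema_update([1.0], [0.0], decay=0.9)` moves the averaged weight from 1.0 to approximately 0.9, so a single noisy training step changes the evaluated model only slightly.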
Communication with original authors
All the confusing parts mentioned in the previous section were clarified by the original authors via email and in the issues of their GitHub repository.
Paper Url: https://openreview.net/forum?id=bleIdqV_-JY