Abstract: Semi-supervised learning (SSL) exploits unlabeled data through pseudo-labeling when labeled data is limited. Despite various auxiliary strategies to enhance SSL training, the main challenge lies in determining reliable pseudo labels with a robust thresholding algorithm based on quality indicators (\textit{e.g.}, confidence scores). However, the latest methods for distinguishing low-quality from high-quality labels require intricately designed thresholding strategies yet still fail to guarantee robust and efficient selection. Empirically, we group the quality indicators of pseudo labels into three clusters (easy, semi-hard, and hard) and show statistically that the real bottleneck of threshold selection lies in the sensitivity of separating semi-hard samples. To this end, we propose adaptive \textbf{G}rouping and \textbf{T}ransporting for \textbf{R}obust thresholding (dubbed GTR), which efficiently selects semi-hard samples with test-time augmentations and consistency constraints while saving the selection budget on easy and hard samples. GTR can effectively identify high-quality data when applied to existing SSL methods while reducing redundant selection costs. Extensive experiments on eleven SSL benchmarks across three modalities verify that GTR achieves significant performance gains and speedups over Pseudo Label, FixMatch, and FlexMatch.
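The three-way grouping of quality indicators described in the abstract can be illustrated with a minimal sketch. It is not the GTR algorithm: the abstract does not say how the clusters are formed, so this example substitutes a plain one-dimensional k-means with three centres, and it assumes the quality indicator is a per-sample confidence score in [0, 1]. The function name `group_confidences` and its parameters are hypothetical, and the test-time augmentation and transporting steps are not modelled.

```python
def group_confidences(scores, iters=50):
    """Split a non-empty list of confidence scores into three groups.

    Hypothetical helper (assumption, not the paper's method): a tiny 1-D
    k-means with three centres. Returns one label per score: 'hard' for the
    lowest-confidence cluster, 'semi-hard' for the middle, 'easy' for the top.
    """
    lo, hi = min(scores), max(scores)
    # Initialise the three centres at the minimum, midpoint, and maximum.
    centres = [lo, (lo + hi) / 2.0, hi]
    assign = [0] * len(scores)
    for _ in range(iters):
        # Assign each score to its nearest centre.
        assign = [min(range(3), key=lambda k: abs(s - centres[k])) for s in scores]
        new_centres = []
        for k in range(3):
            members = [s for s, a in zip(scores, assign) if a == k]
            # Keep the old centre if a cluster ends up empty.
            new_centres.append(sum(members) / len(members) if members else centres[k])
        if new_centres == centres:
            break  # converged: assignments no longer move the centres
        centres = new_centres
    names = ["hard", "semi-hard", "easy"]
    return [names[a] for a in assign]


# Usage: six illustrative confidence scores (made-up values).
labels = group_confidences([0.05, 0.10, 0.50, 0.55, 0.95, 0.99])
print(labels)
```

In a pipeline built on this idea, only the samples labelled `semi-hard` would be passed to a more expensive check, while `easy` and `hard` samples would be accepted or rejected directly; that split is the budget saving the abstract refers to.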
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=vLrm8tEJnI
Changes Since Last Submission: Removing identity of an author.
Assigned Action Editor: ~Lei_Wang13
Submission Number: 5199