Rethinking Confidence Scores and Thresholds in Pseudolabeling-based SSL

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We address overconfidence and miscalibration in pseudolabeling-based SSL with a framework for learning scores and thresholds with explicit error control. This boosts pseudolabel quality and quantity, enhancing accuracy and training efficiency in SSL.
Abstract: Modern semi-supervised learning (SSL) methods rely on pseudolabeling and consistency regularization. Pseudolabeling is typically performed by comparing the model's confidence scores against a predefined threshold. While several heuristics have been proposed to improve threshold selection, the underlying issues of overconfidence and miscalibration in confidence scores remain largely unaddressed, leading to inaccurate pseudolabels, degraded test accuracy, and prolonged training. We take a first-principles approach to learning confidence scores and thresholds with an explicit knob for error. This flexible framework addresses the fundamental question of optimal score and threshold selection in pseudolabeling. Moreover, it gives practitioners a principled way to control the quality and quantity of pseudolabels. Such control is vital in SSL, where the balance between pseudolabel quality and quantity directly affects model performance and training efficiency. Our experiments show that integrating this framework with modern SSL methods yields significant improvements in accuracy and training efficiency. In addition, we provide novel insights into the trade-offs between the choice of the error parameter and the end model's performance.
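To make the thresholding idea concrete, the sketch below shows one simple way to pick a threshold with an explicit error knob: on a labeled holdout set, choose the most permissive confidence threshold whose empirical pseudolabel error stays within a user-chosen budget, then accept only unlabeled points above it. This is a minimal illustration of the quality/quantity control described above, not the paper's PabLO method; the function name select_threshold, the parameter eps, and the synthetic data are all hypothetical.

```python
import numpy as np

def select_threshold(conf, correct, eps):
    """Pick the smallest (most permissive) confidence threshold such that,
    among holdout points with confidence >= threshold, the empirical
    pseudolabel error rate is at most eps (illustrative sketch)."""
    order = np.argsort(-conf)                       # holdout sorted by confidence, descending
    running_err = np.cumsum(~correct[order]) / np.arange(1, len(conf) + 1)
    feasible = np.where(running_err <= eps)[0]      # prefixes that respect the error budget
    if len(feasible) == 0:
        return np.inf                               # no threshold meets the budget: accept nothing
    return conf[order][feasible[-1]]                # largest feasible prefix -> most pseudolabels

# Toy usage with synthetic confidences (illustrative only).
rng = np.random.default_rng(0)
holdout_conf = rng.uniform(0.5, 1.0, size=1000)
holdout_correct = rng.random(1000) < holdout_conf   # toy model: accuracy tracks confidence
t = select_threshold(holdout_conf, holdout_correct, eps=0.05)

unlabeled_conf = rng.uniform(0.5, 1.0, size=5000)
accept = unlabeled_conf >= t                        # pseudolabel only high-confidence points
print(f"threshold={t:.3f}, coverage={accept.mean():.1%}")
```

Raising eps admits more pseudolabels at the cost of quality, while lowering it does the opposite; this is the quality/quantity trade-off the framework exposes to practitioners.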
Lay Summary: Modern AI systems often learn from a mix of labeled and unlabeled data. A common approach is to let the model guess labels for the unlabeled data, a process called pseudolabeling, and then train on those guesses (pseudolabels). But deciding which pseudolabels to trust is tricky. Most methods rely on the model's confidence, using a fixed rule: if the confidence is above a certain threshold, accept the guess. Unfortunately, ad hoc choices of confidence scores and thresholds can be unreliable, leading to many wrong guesses and inefficient training. In our work, we take a more principled approach. Instead of the common choices, we introduce a procedure that learns better confidence scores and thresholds, reflecting how much error you are willing to tolerate while pseudolabeling as many points as possible. This gives users direct control over the trade-off between making more guesses and making better guesses, a key challenge in this type of learning. When added to existing methods, our approach both improves accuracy and provides new insights into how this balance affects final performance.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/harit7/PabLO-SSL
Primary Area: General Machine Learning->Unsupervised and Semi-supervised Learning
Keywords: Semi-supervised Learning, Pseudolabeling, Self-Training, Confidence Functions
Submission Number: 8561