Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing

Published: 18 Sept 2025, Last Modified: 29 Oct 2025, NeurIPS 2025 poster, CC BY 4.0
Keywords: Retraining, Predicted Labels, Label Noise, Approximate Message Passing
TL;DR: We develop a principled framework using approximate message passing (AMP) to analyze iterative self-retraining of ML models and derive the optimal way to combine the given labels with model predictions at each retraining round.
Abstract: Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model’s performance. While prior works have demonstrated the benefits of specific heuristic retraining schemes, the question of how to optimally combine the model's predictions and the provided labels remains largely open. This paper addresses this fundamental question for binary classification tasks. We develop a principled framework based on approximate message passing (AMP) to analyze iterative retraining procedures for two ground truth settings: the Gaussian mixture model (GMM) and the generalized linear model (GLM). Our main contribution is the derivation of the Bayes-optimal aggregator function for combining the current model's predictions and the given labels, which, when used to retrain the same model, minimizes its prediction error. We also quantify the performance of this optimal retraining strategy over multiple rounds. We complement our theoretical results by proposing a practically usable version of the theoretically-optimal aggregator function and demonstrate its superiority over baseline methods under different label noise models.
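To make the retraining loop described in the abstract concrete, below is a minimal Python/NumPy sketch of iterative self-retraining on a Gaussian-mixture binary classification task with flipped labels. The data setup, the ridge classifier, and the simple convex-combination `aggregate` rule are illustrative assumptions; the paper's Bayes-optimal aggregator and its AMP analysis are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic GMM data with symmetric label noise (illustrative, not the paper's exact setup).
n, d, noise = 2000, 50, 0.2
mu = rng.normal(size=d) / np.sqrt(d)
y_true = rng.choice([-1.0, 1.0], size=n)
X = y_true[:, None] * mu + rng.normal(size=(n, d))
flip = rng.random(n) < noise
y_noisy = np.where(flip, -y_true, y_true)  # observed (noisy) labels

def fit_ridge(X, targets, lam=1.0):
    """Least-squares classifier trained on (possibly soft) targets."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ targets)

def aggregate(scores, y_noisy, alpha=0.5):
    """Placeholder aggregator: convex combination of model scores and given labels.
    The paper derives a Bayes-optimal aggregator; this rule only illustrates
    where such a function plugs into the retraining loop."""
    return alpha * np.tanh(scores) + (1.0 - alpha) * y_noisy

# Iterative self-retraining.
w = fit_ridge(X, y_noisy)                   # round 0: train on the given labels
for t in range(5):
    scores = X @ w                          # current model's predictions
    targets = aggregate(scores, y_noisy)    # combine predictions with given labels
    w = fit_ridge(X, targets)               # retrain on the aggregated targets
    acc = np.mean(np.sign(X @ w) == y_true)
    print(f"round {t + 1}: clean accuracy = {acc:.3f}")
```

In this sketch the aggregator is the only component that changes across retraining schemes; swapping in a better-calibrated combination of scores and labels is exactly the design question the paper answers.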
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 14687