Keywords: weak-to-strong generalization, deep learning theory, linear regression, feature learning, regularization, ridge penalty, over-parameterization
Abstract: Weak-to-strong generalization—where a student model trained on imperfect labels generated by a weaker teacher nonetheless surpasses that teacher—has been widely observed, but the mechanisms that enable it have remained poorly understood. In this paper, through a theoretical analysis of simple models, we uncover three core mechanisms that can drive this phenomenon. First, by analyzing ridge linear regression, we study the interplay between the teacher's and student's regularization parameters and prove that a student can compensate for a teacher's under-regularization and achieve lower test error. We also analyze the role of the models' parameterization regime and show that qualitatively different phenomena can arise in different regimes. Second, by analyzing weighted ridge linear regression, we show that a student model whose regularization structure is better aligned with the target function can outperform its teacher. Third, in a nonlinear multi-index learning setting, we demonstrate that a student can learn easy, task-specific features from the teacher while leveraging its own broader pre-training to learn hard-to-learn features that the teacher cannot capture.
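To make the first mechanism concrete, here is a minimal NumPy sketch of the ridge teacher-student setup; it is not from the paper, and all dimensions, the noise level, and both ridge penalties are illustrative assumptions. An under-regularized teacher overfits label noise, while a student fit to the teacher's labels with a larger ridge penalty shrinks that noise component and can achieve lower test error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem setup (parameters are assumptions, not from the paper):
# d-dimensional linear model y = <w_star, x> + noise, with n_train samples.
d, n_train, n_test, sigma = 50, 100, 10_000, 1.0
w_star = rng.normal(size=d) / np.sqrt(d)

X = rng.normal(size=(n_train, d))
y = X @ w_star + sigma * rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, d))
y_test_clean = X_test @ w_star


def ridge(X, y, lam):
    """Ridge estimator w_hat = (X^T X + lam * I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def test_error(w):
    """Mean squared error against the noiseless target on fresh inputs."""
    return np.mean((X_test @ w - y_test_clean) ** 2)


# Under-regularized teacher (lam close to 0): nearly OLS, overfits label noise.
w_teacher = ridge(X, y, lam=1e-3)

# Student trained on the teacher's labels with a larger ridge penalty;
# the composed shrinkage can compensate for the teacher's under-regularization.
# The penalty value here is an illustrative choice, not a tuned one.
y_teacher = X @ w_teacher
w_student = ridge(X, y_teacher, lam=20.0)

print(f"teacher test error: {test_error(w_teacher):.4f}")
print(f"student test error: {test_error(w_student):.4f}")
# With these settings the student's error is typically lower than the
# teacher's, illustrating weak-to-strong generalization via regularization.
```

In this sketch the student's estimator is a spectrally shrunk version of the teacher's, so it damps the noise-dominated directions the teacher overfit; the paper's analysis of the regularization interplay (and its dependence on the parameterization regime) is the rigorous counterpart of this effect.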
Supplementary Material: zip
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 9509