TL;DR: A theoretical analysis of W2S generalization from an intrinsic dimension perspective, unveiling the role of teacher-student discrepancy and the associated sample complexities.
Abstract: Weak-to-strong (W2S) generalization is a type of finetuning (FT) where a strong (large) student model is trained on pseudo-labels generated by a weak teacher. Surprisingly, W2S FT often outperforms the weak teacher. We seek to understand this phenomenon through the observation that FT often occurs in intrinsically low-dimensional spaces. Leveraging the low intrinsic dimensionality of FT, we analyze W2S in the ridgeless regression setting from a variance reduction perspective. For a strong student-weak teacher pair with sufficiently expressive low-dimensional feature subspaces $\mathcal{V}_s, \mathcal{V}_w$, we provide an exact characterization of the variance that dominates the generalization error of W2S. This unveils a virtue of discrepancy between the strong and weak models in W2S: the variance of the weak teacher is inherited by the strong student in $\mathcal{V}_s \cap \mathcal{V}_w$, while being reduced by a factor of $\mathrm{dim}(\mathcal{V}_s)/N$ in the discrepancy subspace $\mathcal{V}_w \setminus \mathcal{V}_s$, where $N$ is the number of pseudo-labels used for W2S. Our analysis further sheds light on the sample complexities and the scaling of performance gap recovery in W2S. The analysis is supported by experiments on synthetic regression problems, as well as real vision and NLP tasks.
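Schematically, and as a hedged restatement of the characterization above rather than the paper's exact theorem, the dominant variance term can be read as

$$
\mathrm{Var}_{\mathrm{W2S}} \;\approx\; \underbrace{\mathrm{Var}_{w}\big|_{\mathcal{V}_s \cap \mathcal{V}_w}}_{\text{inherited from the teacher}} \;+\; \underbrace{\frac{\mathrm{dim}(\mathcal{V}_s)}{N}\,\mathrm{Var}_{w}\big|_{\mathcal{V}_w \setminus \mathcal{V}_s}}_{\text{averaged out over $N$ pseudo-labels}},
$$

where $\mathrm{Var}_{w}\big|_{\mathcal{V}}$ denotes the portion of the weak teacher's variance lying in the subspace $\mathcal{V}$.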
Lay Summary: When fine‑tuning a strong, pretrained student on pseudo‑labels produced by a separately fine‑tuned weak teacher, the student often ends up outperforming its teacher—an effect known as weak‑to‑strong (W2S) generalization. How can this happen when both models have more than enough capacity to learn the true data distribution? We provide a precise answer from the variance reduction perspective.
Since finetuning tends to fall in the kernel regime and admit a low intrinsic dimension, we model both the weak teacher and the strong student as high-dimensional feature maps operating in their respective low‑dimensional subspaces. In the regression setting, we provide an exact characterization of the W2S variance that dominates the generalization error. Our analysis reveals that a larger discrepancy between the weak and strong feature subspaces leads to better W2S performance. Intuitively, this is because pseudo‑label errors coming from teacher features absent in the student subspace act like independent label noise, which is reduced in proportion to $1/N$, where $N$ is the number of pseudo‑labels; a minimal simulation of this mechanism is sketched below.
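The following is a minimal synthetic sketch of this mechanism, not the paper's construction or experiments: the ambient dimension, subspace dimensions, sample sizes, and noise level are illustrative assumptions, and the ground truth is placed in the shared subspace so that both models are expressive enough.

```python
# Hedged sketch: ridgeless regression with a weak teacher and a strong student
# restricted to low-dimensional feature subspaces V_w and V_s of an ambient space.
import numpy as np

rng = np.random.default_rng(0)
d, d_s, d_w, d_shared = 400, 60, 40, 20   # ambient / student / teacher / shared dims
n_w, N, n_test = 80, 20000, 5000          # teacher labels / pseudo-labels / test points
sigma = 0.5                               # label-noise std for the teacher's finetuning set

# Orthonormal directions; the first d_shared columns span V_s ∩ V_w.
Q, _ = np.linalg.qr(rng.standard_normal((d, d_s + d_w - d_shared)))
V_s = Q[:, :d_s]                                # student subspace basis
V_w = np.hstack([Q[:, :d_shared], Q[:, d_s:]])  # teacher subspace basis

# Ground truth lies in the shared subspace, so both subspaces are expressive enough.
beta_star = Q[:, :d_shared] @ rng.standard_normal(d_shared)

def excess_risk(beta_hat):
    X = rng.standard_normal((n_test, d))
    return np.mean((X @ (beta_hat - beta_star)) ** 2)

# 1) Finetune the weak teacher: ridgeless least squares in V_w on n_w noisy labels.
X_w = rng.standard_normal((n_w, d))
y_w = X_w @ beta_star + sigma * rng.standard_normal(n_w)
beta_teacher = V_w @ np.linalg.lstsq(X_w @ V_w, y_w, rcond=None)[0]

# 2) W2S finetuning: fit the strong student in V_s on N teacher pseudo-labels.
X_u = rng.standard_normal((N, d))
beta_student = V_s @ np.linalg.lstsq(X_u @ V_s, X_u @ beta_teacher, rcond=None)[0]

print(f"weak teacher excess risk: {excess_risk(beta_teacher):.4f}")
print(f"W2S student excess risk : {excess_risk(beta_student):.4f}")  # typically smaller
```

With $N$ much larger than $\mathrm{dim}(\mathcal{V}_s)$, the teacher's error component outside the student subspace behaves like independent label noise and is largely averaged out, so the printed student risk is typically well below the teacher's.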
Primary Area: Theory->Learning Theory
Keywords: Weak-to-Strong generalization, Intrinsic dimension, Finetuning
Submission Number: 4231