Keywords: scaling laws, inference scaling, test-time compute, linear models, fine-tuning
TL;DR: Our solvable model reveals that inference scaling is training-dependent, directly linking a model's generalization error to its pass@k performance.
Abstract: We analyze neural scaling laws in a solvable model of last-layer fine-tuning where targets have intrinsic, instance-heterogeneous difficulty. In our Latent Instance Difficulty (LID) model, each input's target variance is governed by a latent "precision" drawn from a heavy-tailed distribution. While the generalization loss recovers standard scaling laws, our main contribution connects these training-time laws to inference-time behavior. The pass@k failure rate decays as a power law, $k^{-\beta_\mathrm{eff}}$, but the observed exponent $\beta_\mathrm{eff}$ is training-dependent: it grows with sample size $N$ before saturating at an intrinsic limit $\beta$ set by the tail of the difficulty distribution. This coupling shows that learning shrinks the "hard tail" of the error distribution: reductions in the model's generalization error steepen the pass@k curve until irreducible target variance dominates. The LID model yields testable, closed-form predictions for this behavior, including a compute-allocation rule that favors additional training before saturation and additional inference attempts afterward. We validate these predictions in simulations and on CIFAR-10H, where human-label variance provides a realistic per-instance difficulty measure.
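To make the pass@k mechanism described above concrete, here is a minimal simulation sketch (not the submission's code) of a latent-instance-difficulty setup: per-instance noise scales are drawn from a heavy-tailed distribution, the per-attempt success probability is the chance of an attempt landing within a tolerance of the target, and the pass@k failure rate is the average over instances of the probability that all k attempts fail. The Pareto choice and all parameter names and values (alpha, eps, sigma_model) are illustrative assumptions, not the paper's exact model.

```python
# Minimal illustrative sketch: pass@k failure rate under heavy-tailed
# per-instance difficulty. All parameters here are assumptions for illustration.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

n_instances = 200_000
alpha = 1.5          # tail index of the per-instance difficulty distribution (assumed)
eps = 0.1            # tolerance defining a successful attempt (assumed)
sigma_model = 0.05   # residual model error; shrinks as training size N grows (assumed)

# Heavy-tailed per-instance target noise scale (inverse of the latent "precision").
sigma_target = 0.05 * (rng.pareto(alpha, size=n_instances) + 1.0)

# Per-attempt success probability: a Gaussian sample with the combined std
# must land within eps of the target.
total_std = np.sqrt(sigma_target**2 + sigma_model**2)
p_success = norm.cdf(eps / total_std) - norm.cdf(-eps / total_std)

# Pass@k failure rate: average over instances of (1 - p)^k.
ks = np.unique(np.logspace(0, 3, 20).astype(int))
fail_at_k = np.array([np.mean((1.0 - p_success) ** k) for k in ks])

# Effective exponent beta_eff from a log-log fit over the large-k tail;
# it is governed by the hard instances (small p_success) that remain after training.
beta_eff = -np.polyfit(np.log(ks[5:]), np.log(fail_at_k[5:]), 1)[0]
print(f"estimated beta_eff ~ {beta_eff:.2f}")
```

Under these assumptions, instances with very large target noise have near-zero success probability, so the averaged failure rate decays as a power law in k whose exponent is set by the difficulty distribution's tail, consistent with the saturation behavior described in the abstract.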
Primary Area: learning theory
Submission Number: 21098