Keywords: evaluation, correlation, language model
TL;DR: We investigate how language model capabilities transfer from pretraining to supervised fine-tuning (SFT) using a set of correlation analysis protocols, across both accuracy and confidence metrics.
Abstract: Understanding how language model capabilities transfer from pretraining to supervised fine-tuning (SFT) is fundamental to efficient model development and data curation. In this work, we investigate four core questions:
**RQ1**: To what extent do accuracy and confidence rankings established during pretraining persist after SFT?
**RQ2**: Which benchmarks serve as robust cross-stage predictors and which are unreliable?
**RQ3**: How do transfer dynamics shift with model scale?
**RQ4**: How well does model confidence align with accuracy, as a measure of calibration quality, and does this alignment pattern transfer across training stages?
Our experiments reveal that transfer reliability varies substantially across capability categories, benchmarks, and scales, with accuracy and confidence exhibiting distinct, and sometimes opposing, scaling dynamics. These findings shed light on the complex interplay between pretraining decisions and downstream outcomes, providing actionable guidance for benchmark selection, data curation, and efficient model development.
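A question like RQ1 can be illustrated with a rank correlation: score the same set of models on a benchmark before and after SFT, then check how well the two orderings agree. The sketch below assumes Spearman's rank correlation, since the specific protocols are not listed here. The helper names (`average_ranks`, `spearman`) and the scores are hypothetical and for illustration only. It uses only the standard library.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values ascending (1-based); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical accuracies of five models on one benchmark, before and after SFT.
pretrain_acc = [0.31, 0.42, 0.38, 0.55, 0.60]
sft_acc = [0.40, 0.47, 0.52, 0.61, 0.58]
rho = spearman(pretrain_acc, sft_acc)
print(f"{rho:.2f}")  # prints 0.80
```

A value near 1 means the pretraining ranking largely persists after SFT, and a value near 0 means it does not. The same function could be applied to confidence scores in place of accuracies, or to per-model confidence versus accuracy for a rough view of the alignment asked about in RQ4. Whether the authors use this statistic is an assumption.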
Submission Number: 106