When Scale is Fixed: Revisiting Pre-training Indicators for LLM Fine-tuning Performance

ICLR 2026 Conference Submission 13727 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Large language model pre-training, downstream performance prediction, prediction proxy mining
Abstract: While scaling laws show that metrics such as perplexity reliably track how a model's performance improves with scale, their predictive power at a fixed model size is far less well understood. This gap makes it difficult to run informative ablation studies on smaller models, for example when comparing pre-training objectives. Because a primary use of pre-trained models is supervised fine-tuning (SFT) on specific data or tasks, such ablations should connect post-SFT performance back to the original pre-training choices, enabling more effective pre-training research. To study this problem, we first construct a dataset of 50 1B-parameter LLM variants with systematically varied pre-training configurations (e.g., objectives or data) and evaluate them on diverse downstream tasks after SFT. We show that conventional perplexity is a highly misleading indicator in this setting. To address this gap, we formulate the task of selecting pre-training checkpoints to maximize downstream fine-tuning performance as a pairwise classification problem: predicting which of two LLMs that differ in their pre-training will perform better after SFT. We introduce novel unsupervised and supervised proxy metrics derived from pre-training that reduce the relative performance-prediction error rate by over 50% compared with existing methods. Despite the inherent difficulty of this task, we demonstrate the practical utility of the proposed proxies in specific scenarios, paving the way for more efficient design of pre-training schemes optimized for various downstream tasks.
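The pairwise formulation described in the abstract can be made concrete with a short sketch. The code below is a minimal illustration under assumed conventions, not the authors' implementation: `proxy_score` stands in for any candidate pre-training proxy (with higher taken as better; a perplexity-style proxy would need its sign flipped), and `sft_score` stands in for measured post-SFT downstream performance. The pairwise error rate is the fraction of checkpoint pairs that the proxy orders differently from the ground-truth post-SFT results.

```python
from itertools import combinations

def pairwise_error_rate(proxy_scores, sft_scores):
    """Fraction of model pairs that the proxy ranks differently from
    their measured post-SFT performance.

    proxy_scores, sft_scores: dicts mapping model id -> scalar score
    (higher is assumed to be better for both).
    """
    errors, total = 0, 0
    for a, b in combinations(proxy_scores, 2):
        # Skip pairs that the ground truth cannot order.
        if sft_scores[a] == sft_scores[b]:
            continue
        proxy_prefers_a = proxy_scores[a] > proxy_scores[b]
        truth_prefers_a = sft_scores[a] > sft_scores[b]
        errors += proxy_prefers_a != truth_prefers_a
        total += 1
    return errors / total if total else 0.0

# Toy example with three hypothetical pre-training variants.
proxy = {"obj_A": 0.62, "obj_B": 0.58, "obj_C": 0.71}  # proxy metric (higher = better)
sft   = {"obj_A": 41.3, "obj_B": 44.0, "obj_C": 46.2}  # average post-SFT task score
print(pairwise_error_rate(proxy, sft))  # 1/3: the proxy mis-orders obj_A vs. obj_B
```

A "relative error-rate reduction of over 50%", in these terms, would mean the proposed proxies halve this pairwise misordering rate relative to a baseline indicator such as perplexity.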
Primary Area: foundation or frontier models, including LLMs
Submission Number: 13727