PTPP-Aware Adaptation Scaling Laws: Predicting Domain-Adaptation Performance at Unseen Pre-Training Budgets
Keywords: CPT, D-CPT, Continual, pre-training, adaptation, domain-adaptation, language-adaptation
TL;DR: We present pre-training-compute-budget-aware domain-adaptation scaling laws, which make target-domain performance predictable by including an explicit variable for the source-domain pre-training compute budget in the functional form.
Abstract: Continual pre-training (CPT) for domain adaptation must balance target-domain gains with stability on the base domain. Existing CPT scaling laws typically assume a fixed pre-training budget, which limits their ability to forecast adaptation outcomes for models trained at different tokens-per-parameter (PTPP). We present PTPP-aware adaptation scaling laws that make the pre-training budget an explicit variable, enabling accurate prediction of adaptation loss at unseen PTPP. On a multilingual setup (English/Arabic → French), PTPP-aware formulations trained on early stages (PTPP={15,31}) predict target loss at PTPP=279 and outperform a PTPP-agnostic D-CPT transfer baseline on compact metrics (Huber-on-log, MAE_rel, calibration slope); full diagnostics (RMSE, MAPE) are in the appendix. Beyond forecasting, we show a practical use case: planning replay ratios and adaptation token budgets that satisfy target and forgetting constraints under compute limits.
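The abstract does not spell out the functional form, so the following is a minimal sketch of what a PTPP-aware fit could look like: it assumes a simple power law in adaptation tokens whose amplitude and exponent depend on PTPP, fits parameters with a Huber loss on log-loss values (matching the Huber-on-log metric mentioned above), and extrapolates to an unseen pre-training budget. The parameterization, variable names, and data are illustrative placeholders, not the paper's actual law or results.

```python
# Hypothetical PTPP-aware adaptation scaling law (illustrative, not the paper's form):
#   L(D, p) = E + A(p) / D**alpha(p)
# where D = adaptation tokens, p = pre-training tokens-per-parameter (PTPP),
# and A(p), alpha(p) are power laws in p so the pre-training budget enters explicitly.
import numpy as np
from scipy.optimize import least_squares

def predict_loss(theta, D, p):
    """Predicted target-domain adaptation loss at D adaptation tokens and PTPP p."""
    E, a0, a1, b0, b1 = theta
    A = np.exp(a0) * p ** a1       # PTPP-dependent amplitude
    alpha = np.exp(b0) * p ** b1   # PTPP-dependent decay exponent
    return E + A / D ** alpha

def residuals(theta, D, p, loss):
    # Residuals in log space; least_squares applies the Huber loss to them.
    return np.log(predict_loss(theta, D, p)) - np.log(loss)

# Toy observations from early PTPP stages (placeholder values, not paper data).
D = np.array([1e9, 5e9, 2e10, 1e9, 5e9, 2e10])
p = np.array([15, 15, 15, 31, 31, 31], dtype=float)
loss = np.array([2.9, 2.6, 2.4, 2.8, 2.5, 2.3])

theta0 = np.array([2.0, 0.0, 0.0, -1.0, 0.0])
bounds = ([0.0, -10, -10, -10, -10], [10, 10, 10, 10, 10])  # keep E >= 0
fit = least_squares(residuals, theta0, loss="huber", f_scale=0.1,
                    bounds=bounds, args=(D, p, loss))

# Extrapolate to an unseen pre-training budget, e.g. PTPP = 279.
print(predict_loss(fit.x, 2e10, 279.0))
```

The same fitted surface could then be scanned over replay ratios and adaptation token budgets to find settings that meet target-loss and forgetting constraints, in the spirit of the planning use case described in the abstract.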
Serve As Reviewer: ~Etienne_Goffinet3
Submission Number: 47