TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining

Published: 12 Jun 2025, Last Modified: 12 Jun 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: Large-scale general-domain pretraining followed by downstream task-specific finetuning has become a predominant paradigm in machine learning. However, discrepancies between the pretraining and target domains can still cause performance degradation in certain cases, underscoring the need for task-adaptive continued pretraining (TAP). TAP methods typically continue pretraining on task-specific unlabeled datasets or introduce additional unsupervised learning objectives to enhance model capabilities. While many TAP methods perform continued pretraining with multiple pretraining objectives, they often set the tradeoff parameters between objectives manually, resulting in suboptimal outcomes and higher computational costs. In this paper, we propose TapWeight, a task-adaptive pretraining framework that automatically determines the optimal importance of each pretraining objective based on downstream feedback. TapWeight reweights each pretraining objective by solving a multi-level optimization problem. We applied TapWeight to both molecular property prediction and natural language processing tasks, significantly surpassing baseline methods. Experimental results validate the effectiveness and generalizability of TapWeight.
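The reweighting idea can be illustrated with a minimal bilevel sketch; this is an assumption-based toy example, not the authors' implementation. A lower level takes one differentiable pretraining step on a weighted sum of placeholder objectives, and an upper level updates the objective weights $\lambda$ from downstream validation feedback via the resulting hypergradient. All names here (`pretrain_losses`, `x_pre`, `x_dn`, the toy objectives, the single-step unroll) are hypothetical stand-ins; TapWeight's actual multi-level formulation is given in the paper.

```python
# Toy bilevel reweighting sketch (hypothetical; not the TapWeight implementation).
import torch

torch.manual_seed(0)

# Stand-ins for the backbone parameters and the two data domains.
w = torch.randn(8, 1, requires_grad=True)               # model parameters
x_pre = torch.randn(32, 8)                               # unlabeled task-domain data
x_dn, y_dn = torch.randn(16, 8), torch.randn(16, 1)      # labeled downstream data

log_lam = torch.zeros(2, requires_grad=True)             # one weight per pretraining objective
lam_opt = torch.optim.Adam([log_lam], lr=1e-2)
lr_inner = 1e-2

def pretrain_losses(params):
    # Placeholder objectives standing in for e.g. MLM / contrastive losses.
    out = x_pre @ params
    return torch.stack([out.pow(2).mean(), out.abs().mean()])

for step in range(200):
    lam = torch.softmax(log_lam, dim=0)                   # positive weights summing to 1

    # Lower level: one differentiable SGD step on the weighted pretraining loss.
    inner_loss = (lam * pretrain_losses(w)).sum()
    grad_w = torch.autograd.grad(inner_loss, w, create_graph=True)[0]
    w_new = w - lr_inner * grad_w                         # depends on lam

    # Upper level: downstream loss at the updated parameters gives the
    # feedback signal (hypergradient) for the objective weights.
    down_loss = ((x_dn @ w_new) - y_dn).pow(2).mean()
    lam_opt.zero_grad()
    down_loss.backward()
    lam_opt.step()
    w.grad = None                                         # discard unused model gradient

    # Commit the (detached) model update before the next iteration.
    with torch.no_grad():
        w.copy_(w_new.detach())
```

A faithful multi-level treatment would unroll more inner steps and include the finetuning level as well; the single-step unroll above is only meant to show where downstream feedback enters the weight update.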
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
  1. Added Algorithm 1 (“TapWeight-optimization”) in Section 3.3.
  2. Discussed the related work on multi-task learning in Section 2.3.
  3. Added description of constraints on the sum of $\lambda$ in Section 3.2.
  4. Included results for DeBERTa-v3-large in Table 5 and Section 4.2.3.
  5. Reported CP performance with learned weights in Table 7 and Section 4.3.
  6. Added training-cost comparisons for additional baselines in Table 8.
  7. Provided a complexity analysis of TapWeight in Appendix B.
  8. Discussed proximal-regularization loss selection in Appendix E.
  9. Clarified terminology and naming throughout the manuscript, including:
    • Changed “stage” to “level”.
    • Explained “scaffold splitting”.
    • Standardized the name of the direct finetuning baseline.
Supplementary Material: zip
Assigned Action Editor: Simon Kornblith
Submission Number: 4554