FTFT: efficient and robust Fine-Tuning by transFerring Training Dynamics

ICLR 2024 Workshop DMLR Submission 95 Authors

Published: 04 Mar 2024, Last Modified: 02 May 2024 · DMLR @ ICLR 2024 · CC BY 4.0
Keywords: robustness, data pruning, efficiency
Abstract: Despite the massive success of fine-tuning large Pre-trained Language Models (PLMs), they remain susceptible to out-of-distribution and adversarial input. Data Map (DM) is a simple yet effective dual-model approach that improves the robustness of fine-tuned PLMs. It involves fine-tuning a model on the original training set (i.e. reference model), selecting a subset of important training examples based on the training dynamics of the reference model, and fine-tuning the same model only on these selected examples (i.e. main model). However, this approach requires fine-tuning the same model twice, which is computationally expensive forlarge PLMs. In this paper, we show that 1) training dynamics are highly transferable across model sizes and pre-training methods, and that 2) main models fine-tuned using DM learn faster than when using conventional Empirical Risk Minimization (ERM). Building on these observations, we propose a novel fine-tuning approach based on the DM approach: Fine-Tuning by transFerring Training dynamics (FTFT). Compared with DM, FTFT uses more efficient reference models and fewer training steps. FTFT achieves better generalization robustness than ERM while spending less than half of the training cost.
Primary Subject Area: Active learning, Data cleaning, acquisition for ML
Paper Type: Research paper: up to 8 pages
Participation Mode: In-person
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Submission Number: 95