Keywords: robustness, data pruning, efficiency
Abstract: Despite the massive success of fine-tuning large Pre-trained Language Models (PLMs), they remain susceptible to out-of-distribution and adversarial inputs. Data Map (DM) is a simple yet effective dual-model approach that improves the robustness of fine-tuned PLMs. It involves fine-tuning a model on the original training set (i.e., the reference model), selecting a subset of important training examples based on the training dynamics of the reference model, and fine-tuning the same model only on these selected examples (i.e., the main model). However, this approach requires fine-tuning the same model twice, which is computationally expensive for large PLMs. In this paper, we show that 1) training dynamics are highly transferable across model sizes and pre-training methods, and that 2) main models fine-tuned using DM learn faster than when using conventional Empirical Risk Minimization (ERM). Building on these observations, we propose a novel fine-tuning approach based on DM: Fine-Tuning by transFerring Training dynamics (FTFT). Compared with DM, FTFT uses more efficient reference models and fewer training steps. FTFT achieves better generalization robustness than ERM at less than half the training cost.
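The abstract describes selecting a subset of important examples from a reference model's training dynamics. As a rough illustration only, the sketch below shows one common way such a selection can be done in the style of Data Maps (mean gold-label confidence and variability across epochs, keeping the most ambiguous examples); the function name, the `fraction` parameter, and the choice to rank by variability are illustrative assumptions, not necessarily the exact procedure used in the paper.

```python
import numpy as np

def select_by_training_dynamics(gold_probs, fraction=0.33):
    """Illustrative Data-Map-style example selection.

    gold_probs: array of shape (num_epochs, num_examples) holding the
        reference model's probability of the gold label for each training
        example at the end of each epoch.
    fraction: share of the training set to keep for the main model
        (hypothetical value).
    """
    # Per-example training dynamics: mean confidence and variability
    # of the gold-label probability across epochs.
    confidence = gold_probs.mean(axis=0)
    variability = gold_probs.std(axis=0)

    # Keep the highest-variability ("ambiguous") examples, the region
    # Data Maps associates with improved robustness (assumed criterion).
    num_keep = int(fraction * gold_probs.shape[1])
    keep_idx = np.argsort(-variability)[:num_keep]
    return keep_idx, confidence, variability
```

The selected indices would then define the reduced training set on which the main model is fine-tuned.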
Primary Subject Area: Active learning, Data cleaning, acquisition for ML
Paper Type: Research paper: up to 8 pages
Participation Mode: In-person
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Submission Number: 95