Trajectory ensembling for fine tuning - performance gains without modifying training

Published: 20 Oct 2022, Last Modified: 05 May 2023, HITY Workshop NeurIPS 2022
Keywords: Ensemble learning, Transfer learning
Abstract: In this work, we present a simple algorithm for ensembling checkpoints from a single training trajectory (trajectory ensembling), which yields significant gains on several fine-tuning tasks. We compare against classical ensembles and perform ablation studies showing that the important checkpoints are not necessarily the best-performing models in terms of accuracy. Rather, models with relatively poor accuracy but low loss are vital for the observed performance gains. We also investigate various mixtures of checkpoints from several independent training trajectories and make the surprising observation that this leads to only marginal gains in this setup. We study how calibrating the constituent models with simple temperature scaling affects the results, and find that the most important region of training is still the one with the lowest loss, despite potentially poor accuracy.
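The abstract does not specify how checkpoint predictions are combined. The following is a minimal sketch, assuming the common choice of averaging each checkpoint's softmax probabilities, with optional per-checkpoint temperature scaling fit by a grid search over validation negative log-likelihood. The function names (`trajectory_ensemble`, `fit_temperature`) and the grid range are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature, numerically stabilised."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    eps = 1e-12  # avoid log(0)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.05, 5.0, 100)):
    """Hypothetical helper: pick the temperature minimising validation NLL."""
    losses = [nll(softmax(val_logits, t), val_labels) for t in grid]
    return float(grid[int(np.argmin(losses))])

def trajectory_ensemble(checkpoint_logits, temperatures=None):
    """Average the (optionally temperature-scaled) softmax outputs of checkpoints.

    checkpoint_logits: list of (n_examples, n_classes) arrays, one per
    checkpoint saved along a single fine-tuning run (assumed input format).
    """
    if temperatures is None:
        temperatures = [1.0] * len(checkpoint_logits)
    probs = [softmax(l, t) for l, t in zip(checkpoint_logits, temperatures)]
    return np.mean(probs, axis=0)

# Usage on synthetic logits standing in for four checkpoints of one run.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=50)
ckpts = [rng.normal(size=(50, 3)) + 2.0 * np.eye(3)[labels] for _ in range(4)]
# For brevity the same data is used to fit temperatures; in practice use a held-out split.
temps = [fit_temperature(l, labels) for l in ckpts]
ens = trajectory_ensemble(ckpts, temps)
acc = float((ens.argmax(axis=1) == labels).mean())
```

Averaging probabilities, rather than weights, keeps each checkpoint's predictions intact, so low-loss checkpoints with modest accuracy can still contribute well-calibrated probability mass to the ensemble.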