Trajectory ensembling for fine tuning - performance gains without modifying training

Louise Anderson-Conway; Vighnesh Birodkar; Saurabh Singh; Hossein Mobahi; Alexander A Alemi

Published: 20 Oct 2022; Last Modified: 05 May 2023; HITY Workshop NeurIPS 2022; Readers: Everyone

Keywords: Ensemble learning, Transfer learning

Abstract: In this work, we present a simple algorithm for ensembling checkpoints from a single training trajectory (trajectory ensembling), which yields significant gains on several fine-tuning tasks. We compare against classical ensembles and perform ablation studies showing that the important checkpoints are not necessarily the best-performing models in terms of accuracy. Rather, relatively poor models with low loss are vital for the observed performance gains. We also investigate various mixtures of checkpoints from several independent training trajectories and make the surprising observation that this leads to only marginal gains in this setup. We study how calibrating constituent models with simple temperature scaling impacts results, and find that the most important region of training is still that of the lowest loss, in spite of potentially poor accuracy.
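The abstract does not spell out how checkpoint predictions are combined, so the following is only a minimal sketch under stated assumptions: each saved checkpoint has already produced a logits array on the same evaluation set, the ensemble averages the temperature-scaled softmax probabilities, and each temperature is chosen by a small grid search that minimizes negative log-likelihood on held-out labels. The function names (`softmax`, `nll`, `fit_temperature`, `trajectory_ensemble`) and the temperature grid are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def nll(probs, labels):
    """Mean negative log-likelihood of integer labels under probs."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def fit_temperature(logits, labels, grid=np.linspace(0.25, 4.0, 16)):
    """Assumed calibration step: pick the grid temperature with lowest NLL."""
    losses = [nll(softmax(logits, t), labels) for t in grid]
    return float(grid[int(np.argmin(losses))])


def trajectory_ensemble(checkpoint_logits, temperatures=None):
    """Average class probabilities over checkpoints of one training run.

    checkpoint_logits: list of arrays, each of shape (num_examples, num_classes).
    temperatures: optional per-checkpoint temperatures (defaults to 1.0 each).
    """
    if temperatures is None:
        temperatures = [1.0] * len(checkpoint_logits)
    probs = [softmax(l, t) for l, t in zip(checkpoint_logits, temperatures)]
    return np.mean(probs, axis=0)
```

Under these assumptions, one would call `fit_temperature` on held-out logits for each checkpoint, pass the resulting temperatures to `trajectory_ensemble`, and take the `argmax` of the returned probabilities as the ensemble prediction. Selecting which checkpoints to include (for example, those from the low-loss region of training) is left to the caller.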
