Linear Weight Interpolation Leads to Transient Performance Gains

Published: 16 Jun 2024, Last Modified: 15 Jul 2024
Venue: HiLD at ICML 2024 Poster
License: CC BY 4.0
Keywords: Weight Interpolation, Linear Mode Connectivity, Example Importance, Loss Landscapes, Efficient Training
TL;DR: We empirically show that linearly interpolating neural network weights can lead to immediate but temporary performance gains that are lost upon further training and analyze this phenomenon through the lens of example importance.
Abstract: We train copies of a neural network under different samples of SGD noise and find that linearly interpolating their weights can, remarkably, produce networks that perform significantly better than the original networks. However, such interpolated networks consistently end up in unfavorable regions of the optimization landscape: with further training, their performance fails to improve or degrades, effectively undoing the gains from interpolation. We identify two quantities that impact an interpolated network's performance and relate our observations to linear mode connectivity. Finally, we investigate this phenomenon through the lens of example importance and find that performance improves and degrades almost exclusively on the harder subsets of the training data, while performance is stable on the easier subsets. Our work represents a step towards a better understanding of neural network loss landscapes and weight interpolation.
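The core operation studied in the paper is elementwise linear interpolation between the weights of two trained networks. The following is a minimal sketch of that operation, assuming PyTorch; it is not the authors' code, and the function name, the alpha sweep, and the handling of integer buffers are illustrative assumptions.

```python
# Minimal sketch (assumed PyTorch, not the authors' code) of linearly
# interpolating the weights of two networks trained from a shared setup
# but exposed to different SGD noise (e.g., different data orderings).
import copy
import torch

def interpolate_weights(model_a: torch.nn.Module,
                        model_b: torch.nn.Module,
                        alpha: float = 0.5) -> torch.nn.Module:
    """Return a new model whose parameters are (1 - alpha) * A + alpha * B."""
    interpolated = copy.deepcopy(model_a)
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    new_state = {}
    for key in state_a:
        if state_a[key].is_floating_point():
            # Interpolate floating-point weights and buffers elementwise.
            new_state[key] = (1.0 - alpha) * state_a[key] + alpha * state_b[key]
        else:
            # Integer buffers (e.g., BatchNorm's num_batches_tracked) are not
            # interpolated; copy them from model A unchanged.
            new_state[key] = state_a[key]
    interpolated.load_state_dict(new_state)
    return interpolated

# Hypothetical usage: sweep alpha over [0, 1], evaluate each interpolated
# network, and optionally continue training it to test whether the gains persist.
# for alpha in torch.linspace(0.0, 1.0, 11):
#     model = interpolate_weights(model_a, model_b, alpha.item())
#     evaluate(model)  # placeholder evaluation function
```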
Student Paper: Yes
Submission Number: 50