Learning from setbacks: the impact of adversarial initialization on generalization performance

Published: 07 Nov 2023, Last Modified: 13 Dec 2023M3L 2023 PosterEveryoneRevisionsBibTeX
Keywords: adversarial initialization, generalization measures, loss landscape
TL;DR: SGD can apparently reach bad minima, but how? We posit some explanations and study them.
Abstract: The loss landscape of state-of-the-art neural networks is far from simple. Understanding how optimization algorithms initialized differently navigate such high-dimensional non-convex profiles is a key problem in machine learning. [Liu et al. 2020] use pre-training on random labels to produce adversarial initializations that lead stochastic gradient descent into global minima with poor generalization. This result contrasts with other literature arguing that pre-training on random labels produces positive effects (see, e.g., [Maennel et al. (2020)]). We ask under which conditions this initialization results in solutions that generalize poorly. Our goal is to build a theoretical understanding of the properties of good solutions by isolating this phenomenon in some minimal models. To this end, we posit and study several hypotheses for why the phenomenon might arise in models of varying levels of simplicity, including representation quality and complex structure in data.
Submission Number: 73