Faking Interpolation Until You Make It

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: Deep Learning, Optimisation, Step-size selection
Abstract: Deep over-parameterized neural networks exhibit the interpolation property on many data sets. That is, these models are able to achieve approximately zero loss on all training samples simultaneously. Recently, this property has been exploited to develop novel optimisation algorithms for this setting. These algorithms use the fact that the optimal loss value is known and employ a variation of the Polyak step-size calculated on a stochastic batch of data. We introduce a novel extension of this idea to tasks where the interpolation property does not hold. As we no longer have access to the optimal loss values a priori, we instead estimate them for each sample online. To realise this, we introduce a simple but highly effective heuristic for approximating the optimal value based on previous loss evaluations. The heuristic starts by setting the approximate optimal values to a known lower bound on the loss function, typically zero, and then updates them at fixed intervals throughout training in the direction of the best iterate visited so far. We provide rigorous experimentation on a wide range of problems, including two natural language processing tasks, popular vision benchmarks and the challenging ImageNet classification data set. Our empirical analysis demonstrates the effectiveness of our approach, which, in the non-interpolating setting, outperforms state-of-the-art baselines, namely adaptive gradient and line search methods.
One-sentence Summary: We present an optimisation method that uses a Polyak Step-size in combination with a heuristic to approximate online the optimal function value for each example in the training set.
Supplementary Material: zip
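
To make the description in the abstract concrete, below is a minimal sketch of a stochastic Polyak step whose per-sample optimal-loss targets are estimated online. It is an illustrative reading of the abstract only, not the paper's implementation: the target-update rule (moving each estimate toward the lowest loss observed so far at fixed intervals), the class name `TargetedPolyakSGD`, and hyperparameters such as `update_every` and `momentum` are all assumptions.

```python
import torch


class TargetedPolyakSGD:
    """Sketch of a stochastic Polyak step with online-estimated per-sample
    loss targets. The target-update rule below is an assumption based on the
    abstract, not the paper's exact algorithm."""

    def __init__(self, params, n_samples, max_lr=1.0,
                 update_every=1000, momentum=0.5, eps=1e-8):
        self.params = [p for p in params if p.requires_grad]
        # Estimated optimal loss per training sample, initialised to a known
        # lower bound on the loss (zero for standard losses).
        self.f_star = torch.zeros(n_samples)
        # Lowest loss observed so far for each sample.
        self.best_loss = torch.full((n_samples,), float("inf"))
        self.max_lr = max_lr
        self.update_every = update_every
        self.momentum = momentum
        self.eps = eps
        self.t = 0

    @torch.no_grad()
    def step(self, batch_idx, sample_losses):
        """batch_idx: LongTensor of sample indices in the batch;
        sample_losses: per-sample losses (gradients already backpropagated)."""
        idx = batch_idx.cpu()
        losses = sample_losses.detach().cpu()

        # Record the lowest loss visited so far for these samples.
        self.best_loss[idx] = torch.minimum(self.best_loss[idx], losses)

        # Stochastic Polyak step size on the batch:
        #   lr = (f_B(x) - f_B*) / ||grad f_B(x)||^2, clipped to [0, max_lr].
        grad_sq = torch.zeros(())
        for p in self.params:
            if p.grad is not None:
                grad_sq += (p.grad.detach() ** 2).sum().cpu()
        gap = losses.mean() - self.f_star[idx].mean()
        lr = float(torch.clamp(gap / (grad_sq + self.eps), 0.0, self.max_lr))

        # Plain SGD update with the Polyak step size.
        for p in self.params:
            if p.grad is not None:
                p.add_(p.grad, alpha=-lr)

        # At fixed intervals, move each target estimate toward the best loss
        # seen so far ("faking interpolation" for samples with non-zero loss).
        self.t += 1
        if self.t % self.update_every == 0:
            seen = torch.isfinite(self.best_loss)
            self.f_star[seen] += self.momentum * (
                self.best_loss[seen] - self.f_star[seen])
```

In a training loop this would be driven by per-sample losses (e.g. `reduction="none"`), calling `losses.mean().backward()` before `opt.step(idx, losses)` and zeroing gradients afterwards; the data loader is assumed to yield sample indices alongside each batch.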