Stochastic algorithms under single spiked models

Emile Richard

17 May 2019 (modified: 05 May 2023). Submitted to ICML Deep Phenomena 2019. Readers: Everyone

Keywords: stochastic gradient, tensor factorization, principal component analysis, PCA, tensor, SGD, Adam, single spiked model

TL;DR: SGD and Adam under the single spiked model for tensor PCA

Abstract: We study SGD and Adam for estimating a rank-one signal planted in matrix or tensor noise. The extreme simplicity of the problem setup allows us to isolate the effects of several factors: signal-to-noise ratio (SNR), density of critical points, stochasticity, and initialization. We observe a surprising phenomenon: Adam seems to get stuck in local minima as soon as polynomially many critical points appear (matrix case), while SGD escapes them. However, when the number of critical points becomes exponential (tensor case), both algorithms get trapped. Theory tells us that at fixed SNR the problem becomes intractable for large $d$, and in our experiments SGD does not escape this regime. We exhibit the benefits of warm starting in those situations. We conclude that, in this class of problems, stochasticity in the gradients cannot replace warm starting for finding the basin of attraction.
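
To make the setup concrete, here is a minimal sketch of a single spiked model in the matrix and order-3 tensor cases, with a noisy projected gradient ascent on the unit sphere started either at random or from a warm start. It is an illustration under stated assumptions, not the experimental code of the submission: the noise scalings, the objective (maximizing the quadratic or cubic form of the observation), the Gaussian perturbation used as a stand-in for gradient stochasticity, and every function name and parameter value below are hypothetical choices.

```python
import numpy as np


def unit(x):
    """Project a nonzero vector onto the unit sphere."""
    return x / np.linalg.norm(x)


def spiked_matrix(d, snr, rng):
    """Y = snr * v v^T + symmetric Gaussian noise (assumed scaling: variance 1/d)."""
    v = unit(rng.standard_normal(d))
    g = rng.standard_normal((d, d))
    noise = (g + g.T) / np.sqrt(2.0 * d)
    return snr * np.outer(v, v) + noise, v


def spiked_tensor(d, snr, rng):
    """T = snr * v (x) v (x) v + i.i.d. Gaussian noise (assumed scaling: variance 1/d)."""
    v = unit(rng.standard_normal(d))
    noise = rng.standard_normal((d, d, d)) / np.sqrt(d)
    return snr * np.einsum("i,j,k->ijk", v, v, v) + noise, v


def grad_matrix(y, x):
    """Gradient of x^T Y x for symmetric Y."""
    return 2.0 * (y @ x)


def grad_tensor(t, x):
    """Gradient of sum_ijk T_ijk x_i x_j x_k; T is not assumed symmetric."""
    return (np.einsum("ijk,j,k->i", t, x, x)
            + np.einsum("jik,j,k->i", t, x, x)
            + np.einsum("jki,j,k->i", t, x, x))


def noisy_ascent(grad, x0, rng, lr=0.1, sigma=0.01, steps=300):
    """Projected gradient ascent on the sphere with additive Gaussian gradient noise.

    The additive noise is only a crude proxy for SGD stochasticity (an assumption).
    """
    x = unit(x0)
    for _ in range(steps):
        g = grad(x) + sigma * rng.standard_normal(x.shape)
        x = unit(x + lr * g)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 50

    # Matrix case, random initialization.
    y, v = spiked_matrix(d, 3.0, rng)
    x = noisy_ascent(lambda z: grad_matrix(y, z), rng.standard_normal(d), rng)
    print("matrix, random init, overlap:", abs(x @ v))

    # Tensor case, random initialization versus warm start near the signal.
    t, u = spiked_tensor(d, 8.0, rng)
    cold = noisy_ascent(lambda z: grad_tensor(t, z), rng.standard_normal(d), rng)
    warm_x0 = u + 0.5 * unit(rng.standard_normal(d))
    warm = noisy_ascent(lambda z: grad_tensor(t, z), warm_x0, rng)
    print("tensor, random init, overlap:", abs(cold @ u))
    print("tensor, warm start, overlap:", abs(warm @ u))
```

The overlap $|\langle x, v \rangle|$ between the iterate and the planted direction is the quantity to compare across initializations; varying `d`, `snr`, and `sigma` in this sketch gives a rough feel for how signal-to-noise ratio, dimension, and gradient noise interact, without reproducing the Adam comparison studied in the submission.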