Keywords: stochastic gradient, tensor factorization, principal component analysis, PCA, tensor, SGD, Adam, single spiked model
TL;DR: SGD and Adam under single spiked model for tensor PCA
Abstract: We study SGD and Adam for estimating a rank one signal planted in matrix or tensor noise. The extreme simplicity of the problem setup allows us to isolate the effects of various factors: signal to noise ratio, density of critical points, stochasticity and initialization. We observe a surprising phenomenon: Adam seems to get stuck in local minima as soon as polynomially many critical points appear (matrix case), while SGD escapes those. However, when the number of critical points degenerates to exponentials (tensor case), then both algorithms get trapped. Theory tells us that at fixed SNR the problem becomes intractable for large $d$ and in our experiments SGD does not escape this. We exhibit the benefits of warm starting in those situations. We conclude that in this class of problems, warm starting cannot be replaced by stochasticity in gradients to find the basin of attraction.
1 Reply
Loading