Abstract: In this work we analyze stochastic gradient descent (SGD) for learning deep linear neural networks. We use an
analytical approach, based on stochastic approximation theory, that relates SGD iterates to gradient flow trajectories. We then
establish the almost sure boundedness of the SGD iterates and a convergence guarantee for learning deep linear neural networks.
Most analyses of SGD for nonconvex problems have focused entirely on convergence properties which only indicate that
the second moment of the gradient of the loss function tends to zero. Our study demonstrates that, for learning deep linear neural networks, SGD converges almost surely to a critical point of the square loss.
Submission Type: Abstract