Deep Learning Optimization Theory - Trajectory Analysis of Gradient Descent
Keywords: deep learning theory, trajectory analysis, neural tangent kernel
Abstract: In recent years, a familiar yet mysterious observation has held across a wide range of experiments: gradient descent, a relatively simple first-order optimization method, can optimize an enormous number of parameters on highly non-convex loss functions. In some sense, this practical observation stands in contrast to classical statistical learning theory. This post discusses the significant progress researchers are making in bridging the gap between theory and practice and in demystifying gradient descent.
Submission Full: zip
Blogpost Url: yml
ICLR Paper: https://arxiv.org/pdf/1810.02281.pdf, https://arxiv.org/pdf/1810.02054.pdf