Deep Learning Optimization Theory - Trajectory Analysis of Gradient Descent

Anonymous

17 Jan 2022 (modified: 05 May 2023)Submitted to BT@ICLR2022Readers: Everyone
Keywords: deep learning theory, trajectory analysis, neural tangent kernel
Abstract: In recent years, an obvious yet mysterious fact observed across a wide range of experiments is the ability of gradient descent, a relatively simple first-order optimization method, to optimize an enormous number of parameters on highly non-convex loss functions. In some sense, this practical observation stands in contrast to classical statistical learning theory. This post discusses the significant progress researchers have made in bridging this theory gap and demystifying gradient descent.
ICLR Paper: https://arxiv.org/pdf/1810.02281.pdf, https://arxiv.org/pdf/1810.02054.pdf