Gradient descent with identity initialization efficiently learns positive definite linear transformations

Abstract: We analyze algorithms for approximating a function $f(x) = \Phi x$ mapping $\Re^d$ to $\Re^d$ using deep linear neural networks, i.e., algorithms that learn a function $h$ parameterized by matrices $\Theta_1, \dots$
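The setting in the abstract can be illustrated with a minimal NumPy sketch, under the following assumptions (which go beyond the truncated text shown here): the network is a product of $L$ square matrices, each layer is initialized to the identity, the target $\Phi$ is positive definite, and plain gradient descent is run on the squared Frobenius loss $\tfrac12\|\Theta_L \cdots \Theta_1 - \Phi\|_F^2$. The dimensions, depth, learning rate, and step count below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, L = 4, 3  # dimension and depth (illustrative, not from the paper)

# Target: a positive definite matrix Phi (A A^T is PSD; adding d*I makes it PD).
A = rng.standard_normal((d, d))
Phi = A @ A.T + d * np.eye(d)

# Deep linear network h(x) = Theta_L ... Theta_1 x, identity-initialized.
Thetas = [np.eye(d) for _ in range(L)]

def product(mats):
    """Multiply a list [M_1, ..., M_k] into M_k ... M_1."""
    P = np.eye(d)
    for M in mats:
        P = M @ P
    return P

lr = 1e-3
for step in range(5000):
    E = product(Thetas) - Phi  # end-to-end error
    # Gradient of 0.5 * ||Theta_L...Theta_1 - Phi||_F^2 w.r.t. Theta_i is
    # (Theta_L...Theta_{i+1})^T E (Theta_{i-1}...Theta_1)^T.
    grads = [product(Thetas[i + 1:]).T @ E @ product(Thetas[:i]).T
             for i in range(L)]
    # Apply all updates simultaneously (computing grads first, then stepping).
    for i in range(L):
        Thetas[i] = Thetas[i] - lr * grads[i]

rel_err = np.linalg.norm(product(Thetas) - Phi) / np.linalg.norm(Phi)
print(rel_err)
```

Because every layer starts at the identity and $\Phi$ is symmetric, each $\Theta_i$ remains a polynomial in $\Phi$ throughout training, so the dynamics decouple along the eigenbasis of $\Phi$; with a small enough step size the product converges to $\Phi$, which is the phenomenon the title refers to.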