A Coordinate-Free Construction of Scalable Natural Gradient

Kevin Luk; Roger Grosse

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Blind Submission | Readers: Everyone

TL;DR: We explicitly construct a Riemannian metric under which the natural gradient matches the K-FAC update; exact affine invariance follows immediately.

Abstract: Most neural networks are trained using first-order optimization methods, which are sensitive to the parameterization of the model. Natural gradient descent is invariant to smooth reparameterizations because it is defined in a coordinate-free way, but tractable approximations are typically defined in terms of coordinate systems, and hence may lose the invariance properties. We analyze the invariance properties of the Kronecker-Factored Approximate Curvature (K-FAC) algorithm by constructing the algorithm in a coordinate-free way. We explicitly construct a Riemannian metric under which the natural gradient matches the K-FAC update; invariance to affine transformations of the activations follows immediately. We extend our framework to analyze the invariance properties of K-FAC applied to convolutional networks and recurrent neural networks, as well as metrics other than the usual Fisher metric.
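
For readers unfamiliar with K-FAC, the following is a minimal NumPy sketch of the Kronecker-factored update for a single fully connected layer. It is an illustration under stated assumptions, not the authors' code: the function name `kfac_update`, the random stand-in data, and the choice to add damping to each factor separately are all hypothetical. It checks that the cheap factored form `S^{-1} G A^{-1}` agrees with inverting the full Kronecker product `A ⊗ S` applied to the column-major vectorized gradient.

```python
import numpy as np


def kfac_update(grad_W, A, S, damping=1e-3):
    """Return S_d^{-1} @ grad_W @ A_d^{-1}, where A_d and S_d are damped factors.

    grad_W: (n_out, n_in) gradient of the loss w.r.t. the layer weights.
    A:      (n_in, n_in) second moment of the layer inputs (activations).
    S:      (n_out, n_out) second moment of the pre-activation gradients.
    Damping each factor separately is an assumption made for this sketch.
    """
    A_d = A + damping * np.eye(A.shape[0])
    S_d = S + damping * np.eye(S.shape[0])
    X = np.linalg.solve(S_d, grad_W)      # S_d^{-1} @ grad_W
    return np.linalg.solve(A_d.T, X.T).T  # X @ A_d^{-1}


# Random stand-in data (hypothetical; not from the paper).
rng = np.random.default_rng(0)
n_in, n_out, batch = 4, 3, 64
a = rng.normal(size=(batch, n_in))   # layer inputs
g = rng.normal(size=(batch, n_out))  # gradients w.r.t. pre-activations

A = a.T @ a / batch       # input second-moment factor
S = g.T @ g / batch       # gradient second-moment factor
grad_W = g.T @ a / batch  # average weight gradient, shape (n_out, n_in)

damping = 1e-3
update = kfac_update(grad_W, A, S, damping)

# Dense reference: solve (A_d kron S_d) vec(U) = vec(grad_W), column-major vec.
A_d = A + damping * np.eye(n_in)
S_d = S + damping * np.eye(n_out)
F = np.kron(A_d, S_d)
dense = np.linalg.solve(F, grad_W.flatten(order="F")).reshape(
    (n_out, n_in), order="F"
)

agree = np.allclose(update, dense)
```

The factored form only inverts an `n_in × n_in` and an `n_out × n_out` matrix, whereas the dense reference inverts an `(n_in · n_out) × (n_in · n_out)` matrix; this is what makes the approximation tractable for large layers.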

Keywords: Natural gradient, second-order optimization, K-FAC, parameterization invariance, deep learning
