- TL;DR: We explicitly construct a Riemannian metric under which the natural gradient matches the K-FAC update; exact affine invariances follows immediately.
- Abstract: Most neural networks are trained using first-order optimization methods, which are sensitive to the parameterization of the model. Natural gradient descent is invariant to smooth reparameterizations because it is defined in a coordinate-free way, but tractable approximations are typically defined in terms of coordinate systems, and hence may lose the invariance properties. We analyze the invariance properties of the Kronecker-Factored Approximate Curvature (K-FAC) algorithm by constructing the algorithm in a coordinate-free way. We explicitly construct a Riemannian metric under which the natural gradient matches the K-FAC update; invariance to affine transformations of the activations follows immediately. We extend our framework to analyze the invariance properties of K-FAC appied to convolutional networks and recurrent neural networks, as well as metrics other than the usual Fisher metric.
- Keywords: Natural gradient, second-order optimization, K-FAC, parameterization invariance, deep learning