Keywords: Fisher Information, Deep Learning, Information Geometry, Neuromanifold
TL;DR: We bound the variances of random estimators of the diagonal Fisher information matrix.
Abstract: The Fisher information matrix can be used to characterize the local geometry of
the parameter space of neural networks. It underpins theoretical insights and
practical tools for understanding and optimizing neural networks. Given its high
computational cost, practitioners often use random estimators and evaluate only
the diagonal entries. We examine two popular estimators whose accuracy and sample
complexity depend on their associated variances. We derive bounds on these
variances and instantiate them in neural networks for regression and
classification. We analyze the trade-offs between the two estimators through
analytical and numerical studies. We find that the variances depend on the
non-linearity with respect to different parameter groups and should not be
neglected when estimating the Fisher information.
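
The abstract does not name the two estimators. As a rough illustration only, the sketch below assumes they are the two standard Monte Carlo forms of the diagonal Fisher information: the average squared score, and the average negative second derivative of the log-likelihood, both with targets sampled from the model. The toy model (a scalar regressor f(x) = w2 * tanh(w1 * x) with unit-variance Gaussian noise) and the function name `diag_fisher_estimates` are hypothetical and not taken from the paper.

```python
import numpy as np

def diag_fisher_estimates(w1, w2, x, n_samples, rng):
    """Diagonal Fisher information of a toy model at a single input x.

    Assumed model (hypothetical): y ~ N(f(x), 1), f(x) = w2 * tanh(w1 * x).
    Returns (exact, squared-score estimate, negative-Hessian estimate),
    each ordered as [w1, w2].
    """
    t = np.tanh(w1 * x)
    f = w2 * t
    # First derivatives of f with respect to (w1, w2).
    grad = np.array([w2 * x * (1.0 - t**2), t])
    # Diagonal second derivatives of f; f is linear in w2, so that entry is 0.
    hess_diag = np.array([-2.0 * w2 * x**2 * t * (1.0 - t**2), 0.0])

    # Sample targets from the model itself, not from a dataset.
    y = f + rng.standard_normal(n_samples)
    r = (y - f)[:, None]  # residuals, shape (n_samples, 1)

    # Estimator 1: mean of squared score entries, (r * df/dtheta_i)^2.
    est_score = np.mean((r * grad) ** 2, axis=0)
    # Estimator 2: mean of the negative diagonal Hessian of log p(y | x),
    # which equals (df/dtheta_i)^2 - r * d^2 f / dtheta_i^2.
    est_hess = np.mean(grad**2 - r * hess_diag, axis=0)

    # Closed form for this model: E[r^2] = 1, so the diagonal is grad^2.
    exact = grad**2
    return exact, est_score, est_hess

rng = np.random.default_rng(0)
exact, est_score, est_hess = diag_fisher_estimates(0.5, 1.5, 2.0, 1000, rng)
print("exact        ", exact)
print("squared score", est_score)
print("neg. Hessian ", est_hess)
```

In this toy setting the negative-Hessian estimate for w2 has no sampling noise because f is linear in w2, while for w1 both estimates fluctuate with the sampled targets. This is only meant to make the notion of a parameter-dependent estimator variance concrete under the stated assumptions.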
Primary Area: Learning theory
Submission Number: 6578