Trade-Offs of Diagonal Fisher Information Matrix Estimators

Published: 25 Sept 2024, Last Modified: 06 Nov 2024, NeurIPS 2024 poster, CC BY 4.0
Keywords: Fisher Information, Deep Learning, Information Geometry, Neuromanifold
TL;DR: We bound the variances of random estimators of the diagonal Fisher information matrix.
Abstract: The Fisher information matrix can be used to characterize the local geometry of the parameter space of neural networks. It underpins insightful theory and useful tools for understanding and optimizing neural networks. Given its high computational cost, practitioners often use random estimators and evaluate only the diagonal entries. We examine two popular estimators whose accuracy and sample complexity depend on their associated variances. We derive bounds on the variances and instantiate them in neural networks for regression and classification. Based on analytical and numerical studies, we navigate the trade-offs between the two estimators. We find that the variances depend on the non-linearity of the network with respect to different parameter groups and should not be neglected when estimating the Fisher information.
Primary Area: Learning theory
Submission Number: 6578
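
To make the setting concrete, below is a minimal sketch (not the paper's code) of the two standard Monte Carlo estimators of the diagonal Fisher information for a toy softmax classifier: one averages squared score-function gradients, the other averages the negative Hessian diagonal, both with labels sampled from the model. The model, function names, and sample size here are illustrative assumptions; the paper's exact estimators and variance bounds are in the full text.

```python
# A hedged sketch of two Monte Carlo estimators of the diagonal Fisher
# information, assuming the standard forms:
#   F1_k = E_y [ (d/d theta_k log p(y|x, theta))^2 ]
#   F2_k = E_y [ -(d^2/d theta_k^2 log p(y|x, theta)) ]
# with y drawn from the model distribution p(y|x, theta).
import jax
import jax.numpy as jnp

def log_prob(theta, x, y):
    """Log-likelihood of a toy softmax classifier with logits = theta @ x."""
    logits = theta @ x                       # theta: (C, D), x: (D,)
    return jax.nn.log_softmax(logits)[y]

def diag_fisher_estimates(theta, x, key, n_samples=1000):
    """Monte Carlo diagonal Fisher estimators with y ~ p(y | x, theta)."""
    logits = theta @ x
    ys = jax.random.categorical(key, logits, shape=(n_samples,))

    grad_fn = jax.grad(log_prob)             # score function d log p / d theta

    def hess_diag_fn(t, x, y):
        # Diagonal of the Hessian of log p w.r.t. the flattened parameters.
        h = jax.hessian(lambda tt: log_prob(tt, x, y))(t)
        return jnp.diagonal(h.reshape(t.size, t.size)).reshape(t.shape)

    grads = jax.vmap(lambda y: grad_fn(theta, x, y))(ys)
    hess = jax.vmap(lambda y: hess_diag_fn(theta, x, y))(ys)

    f1 = jnp.mean(grads ** 2, axis=0)        # squared-gradient estimator
    f2 = -jnp.mean(hess, axis=0)             # negative-Hessian estimator
    return f1, f2

theta = jax.random.normal(jax.random.PRNGKey(0), (3, 5)) * 0.1  # 3 classes, 5 features
x = jnp.ones(5)
f1, f2 = diag_fisher_estimates(theta, x, jax.random.PRNGKey(1))
print(f1.shape, f2.shape)                    # both (3, 5)
```

Both estimators are unbiased for the diagonal Fisher information when labels are sampled from the model, so they agree in expectation; they generally differ in variance, which is the quantity the paper bounds and compares.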