Keywords: Second-Order Optimization, Fisher Information, Kronecker-factored Approximate Curvature, Deep Learning, Computer Vision, Natural Language Processing
TL;DR: This manuscript introduces a robust second-order optimization algorithm for training deep neural networks that approximates the Fisher information of the network's layers and incorporates it into an adaptive optimization framework.
Abstract: First-order optimization methods are currently the mainstream in training deep neural networks (DNNs). Optimizers like Adam incorporate limited curvature information by preconditioning the stochastic gradient with a diagonal matrix during training. Although first-order methods are widely adopted, second-order optimization algorithms exhibit superior convergence properties compared to first-order counterparts such as Adam and SGD. However, their practicality for training DNNs remains limited by the increased per-iteration computation they require relative to first-order methods. We present *AdaFisher*--an adaptive second-order optimizer that leverages a *diagonal block-Kronecker* approximation of the Fisher information matrix for adaptive gradient preconditioning. AdaFisher aims to bridge the gap between enhanced *convergence/generalization* capabilities and computational efficiency within a second-order optimization framework for training DNNs. Despite the traditionally slow speed of second-order optimizers, we show that AdaFisher can be reliably adopted for image classification and language modeling, and that it stands out for its stability and robustness to hyper-parameter tuning. We demonstrate that AdaFisher **outperforms the SOTA optimizers** in terms of both accuracy and convergence speed. Code is available at https://github.com/AtlasAnalyticsLab/AdaFisher.
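To make the abstract's core idea concrete, here is a minimal NumPy sketch of preconditioning a single linear layer's gradient with the diagonal of a Kronecker-factored Fisher approximation inside an Adam-style update. This is an illustrative assumption based on the abstract, not the authors' implementation: the square-root preconditioning, the momentum constant, the omitted bias correction, and all shapes are assumptions made here for clarity.

```python
# Minimal sketch (not the authors' code): diagonal block-Kronecker Fisher
# preconditioning for one linear layer y = W a, in an Adam-like update.
import numpy as np

rng = np.random.default_rng(0)

# Toy batch of layer inputs `a` and back-propagated output gradients `g`.
batch, d_in, d_out = 32, 8, 4
a = rng.normal(size=(batch, d_in))        # layer activations (inputs)
g = rng.normal(size=(batch, d_out))       # gradients w.r.t. layer outputs
grad_W = g.T @ a / batch                  # stochastic gradient of W, (d_out, d_in)

# Kronecker factors of the layer's Fisher block (K-FAC style), kept diagonal:
# F ≈ diag(S) ⊗ diag(A), with A = E[a aᵀ] and S = E[g gᵀ].
diag_A = np.mean(a * a, axis=0)           # (d_in,)  diagonal of activation covariance
diag_S = np.mean(g * g, axis=0)           # (d_out,) diagonal of gradient covariance
fisher_diag = np.outer(diag_S, diag_A)    # (d_out, d_in): entry [i, j] = S_ii * A_jj

# Adam-like step using the Fisher diagonal as the second-moment term
# (taking the square root mirrors Adam; an illustrative choice here).
lr, beta1, eps = 1e-3, 0.9, 1e-8
W = rng.normal(size=(d_out, d_in))
m = np.zeros_like(W)                      # first-moment (momentum) buffer

m = beta1 * m + (1 - beta1) * grad_W
W -= lr * m / (np.sqrt(fisher_diag) + eps)
```

Under these assumptions, the diagonal Kronecker factors keep per-layer preconditioner storage at O(d_in + d_out) rather than the O((d_in * d_out)^2) of the full Fisher block, which is how such a scheme can stay close to first-order per-iteration cost.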
Supplementary Material: zip
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4992