Gradients of Functions of Large Matrices

Published: 25 Sept 2024, Last Modified: 06 Nov 2024, NeurIPS 2024 (spotlight), CC BY 4.0
Keywords: Automatic differentiation, numerical methods, linear algebra, implicit differentiation, adjoint methods, differential equations, Bayesian neural networks, Gaussian processes
TL;DR: We derive previously unknown gradients of Lanczos and Arnoldi iterations and use them for PDEs, Gaussian processes, and Bayesian neural networks.
Abstract: Tuning scientific and probabilistic machine learning models (for example, partial differential equations, Gaussian processes, or Bayesian neural networks) often relies on evaluating functions of matrices whose size grows with the data set or the number of parameters. While the state of the art for _evaluating_ these quantities is almost always based on Lanczos and Arnoldi iterations, the present work is the first to explain how to _differentiate_ these workhorses of numerical linear algebra efficiently. To get there, we derive previously unknown adjoint systems for Lanczos and Arnoldi iterations, implement them in JAX, and show that the resulting code competes with Diffrax for differentiating PDEs and with GPyTorch for selecting Gaussian process models, and beats standard factorisation methods for calibrating Bayesian neural networks. All of this is achieved without any problem-specific code optimisation. Find the code at [link redacted] and install the library with *pip install [redacted]*. (A minimal sketch of the problem setting follows below.)
Supplementary Material: zip
Primary Area: Machine learning for physical sciences (for example: climate, physics)
Submission Number: 8674
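
The code link and package name are redacted above, so the following is only a minimal, hypothetical JAX sketch of the setting the abstract describes: approximating a matrix-function-vector product f(A)v with the Lanczos iteration and differentiating the result with reverse-mode autodiff. It backpropagates naively through the unrolled iteration (the baseline the paper improves on with dedicated adjoint systems); the names `lanczos` and `matfun_vec` are illustrative, not the library's interface.

```python
# Minimal sketch: evaluate f(A) v via Lanczos and differentiate through it with JAX.
# This is naive backprop through the iteration, not the paper's adjoint method.
import jax
import jax.numpy as jnp


def lanczos(matvec, v0, num_iters):
    """Lanczos tridiagonalisation: returns basis Q and tridiagonal T."""
    v = v0 / jnp.linalg.norm(v0)
    qs, alphas, betas = [v], [], []
    beta, v_prev = 0.0, jnp.zeros_like(v)
    for _ in range(num_iters):
        w = matvec(v) - beta * v_prev
        alpha = jnp.dot(w, v)
        w = w - alpha * v
        beta = jnp.linalg.norm(w)
        v_prev, v = v, w / beta
        alphas.append(alpha)
        betas.append(beta)
        qs.append(v)
    Q = jnp.stack(qs[:-1], axis=1)                 # (n, k) Lanczos vectors
    T = (jnp.diag(jnp.asarray(alphas))
         + jnp.diag(jnp.asarray(betas[:-1]), 1)
         + jnp.diag(jnp.asarray(betas[:-1]), -1))  # (k, k) tridiagonal matrix
    return Q, T


def matfun_vec(A, v, fun=jnp.exp, num_iters=10):
    """Approximate f(A) v as ||v|| * Q f(T) e1 for symmetric A."""
    Q, T = lanczos(lambda x: A @ x, v, num_iters)
    eigvals, eigvecs = jnp.linalg.eigh(T)
    fT = eigvecs @ jnp.diag(fun(eigvals)) @ eigvecs.T
    return jnp.linalg.norm(v) * Q @ fT[:, 0]


# Example: gradient of a scalar loss with respect to the matrix entries.
key = jax.random.PRNGKey(0)
M = jax.random.normal(key, (50, 50))
A = M @ M.T / 50.0                                 # symmetric test matrix
v = jnp.ones(50)
loss = lambda A: jnp.sum(matfun_vec(A, v) ** 2)
grad_A = jax.grad(loss)(A)                         # differentiates through Lanczos
print(grad_A.shape)
```

Backpropagating through the unrolled loop stores every intermediate Lanczos vector for the reverse pass; the adjoint systems the abstract refers to aim to make exactly this differentiation efficient.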