Approximations to the Fisher Information Metric of Deep Generative Models for Out-Of-Distribution Detection

TMLR Paper 2156 Authors

08 Feb 2024 (modified: 28 Jun 2024) · Under review for TMLR
Abstract: Likelihood-based deep generative models such as score-based diffusion models and variational autoencoders are state-of-the-art machine learning models for approximating high-dimensional distributions of data such as images, text, or audio. One of many downstream tasks to which they can naturally be applied is out-of-distribution (OOD) detection. However, seminal work by Nalisnick et al., which we reproduce, showed that deep generative models consistently infer higher log-likelihoods for OOD data than for the data they were trained on, marking an open problem. In this work, we analyse the use of the gradient of a data point's log-likelihood with respect to the parameters of the deep generative model for OOD detection, based on the simple intuition that OOD data should have larger gradient norms than training data. We formalise measuring the size of the gradient as approximating the Fisher information metric. We show that the Fisher information matrix (FIM) has large absolute diagonal values, motivating the use of chi-square distributed, layer-wise gradient norms as features. We combine these features to make a simple, model-agnostic and hyperparameter-free method for OOD detection that estimates the joint density of the layer-wise gradient norms for a given data point. We find that these layer-wise gradient norms are weakly correlated, rendering their combined usage informative, and prove that the layer-wise gradient norms satisfy the principle of (data representation) invariance. Our empirical results indicate that this method outperforms the Typicality test for most deep generative models and image dataset pairings.
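To make the feature construction in the abstract concrete, below is a minimal sketch (not the paper's implementation) of computing layer-wise gradient norms of the log-likelihood. It assumes a PyTorch model exposing a hypothetical `log_prob(x)` method returning the log-likelihood of an input batch; actual likelihood-based models (e.g. VAEs or diffusion models) would substitute their own likelihood or ELBO computation.

```python
# Minimal sketch of layer-wise gradient-norm features (assumed API, not the authors' code).
import torch


def layerwise_grad_norms(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Return one squared L2 gradient norm per parameter tensor ("layer")."""
    params = [p for p in model.parameters() if p.requires_grad]
    log_px = model.log_prob(x).sum()             # assumed API: log p_theta(x)
    grads = torch.autograd.grad(log_px, params)  # d log p_theta(x) / d theta, per layer
    return torch.stack([g.pow(2).sum() for g in grads])
```

On top of such per-layer features, the abstract describes estimating their joint density on in-distribution data and scoring test points by that density; in this sketch any standard density estimator over the (log) norms could play that role.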
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:
- $\S 1$: fixed typo "...In $\S 2.2$ we will analyse..." $\mapsto$ "...In $\S 4$ we will analyse...".
- $\S 1$: added citations Nguyen et al. (2019); Kwon et al. (2020); Choi et al. (2021); Bergamin et al. (2022).
- $\S 2.2$: fixed typo "negative leaned likelihood".
- $\S 2.2$: added explanations of the Gaussian annulus result and the link between arithmetic coding length and likelihood.
- $\S 2.2$: revised the explanation of the difficulty of using likelihood ratios.
- $\S 3.1$: added an explanation of the intuition of the $L^2$ norm of the gradient as a directional derivative.
- $\S 4$: revised the description of Nalisnick et al. (2019b), Appendix D.
- $\S 4$: revised the discussion surrounding squared gradients and FIM prior nullification.
- $\S 4$: amended the statement "diffusion models do not allow for exact inference of the log-likelihood".
- $\S 5$: revised wording surrounding the $\texttt{CIFAR-10}$ and $\texttt{ImageNet-32}$ baselines.
- Appendix A.2: fixed typos "variatonal" and "then, ELBO is invariant".
- Appendix A.3: added an accompanying "snake pattern" figure.
- Appendix A.3: fixed typo of missing "$< \alpha$".
- Added Appendix A.5, discussing the difference to classical invariance results.
Assigned Action Editor: ~Daniel_M_Roy1
Submission Number: 2156