NPEFF: Non-Negative Per-Example Fisher Factorization

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Interpretability, factorization, Fisher information
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: As deep learning models are deployed in more and more settings, it becomes increasingly important to be able to understand why they produce a given prediction, but interpretation of these models remains a challenge. In this paper, we introduce a novel interpretability method called NPEFF that is readily applicable to any end-to-end differentiable model. It operates on the principle that processing of a characteristic shared across different examples involves a specific subset of model parameters. We perform NPEFF by decomposing each example's Fisher information matrix as a non-negative sum of components. These components take the form of either non-negative vectors or rank-1 positive semi-definite matrices, depending on whether we use diagonal or low-rank Fisher representations, respectively. For the latter form, we introduce a novel and highly scalable algorithm. We demonstrate that components recovered by NPEFF have interpretable tunings through experiments on language and vision models. Using unique properties of NPEFF's parameter-space representations, we ran extensive experiments to verify that the connections between directions in parameter space and examples recovered by NPEFF actually reflect the model's processing. We further demonstrate NPEFF's ability to uncover processing strategies actually used by a model by creating a TRACR-compiled model with known ground truth. We also explore a potential application of NPEFF in uncovering and correcting flawed heuristics used by a model. We release our code to facilitate research using NPEFF.
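For the diagonal-Fisher case described in the abstract, the decomposition reduces to non-negative matrix factorization over per-example diagonal Fisher vectors (which are non-negative, being expectations of squared gradients). The snippet below is a minimal sketch under that reading only; it uses scikit-learn's NMF and illustrative names (`fishers`, `n_components`), not the authors' released code or the low-rank algorithm.

```python
# Minimal sketch of the diagonal-Fisher variant: factorize a matrix of
# per-example diagonal Fisher values into non-negative components.
# `fishers` is a placeholder standing in for the real per-example
# diagonal Fisher matrices, flattened to shape (num_examples, num_params).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
fishers = rng.random((256, 1024))  # synthetic stand-in data, non-negative

# Decompose each example's Fisher as a non-negative combination of shared
# non-negative component vectors:  fishers ≈ coefficients @ components.
nmf = NMF(n_components=16, init="nndsvd", max_iter=500)
coefficients = nmf.fit_transform(fishers)  # (num_examples, n_components)
components = nmf.components_               # (n_components, num_params)

# Examples with the largest coefficient on a component are the ones whose
# processing that component is taken to describe (its "tuning").
top_examples_for_component_0 = np.argsort(-coefficients[:, 0])[:10]
```

The low-rank variant, which represents each component as a rank-1 positive semi-definite matrix, requires the paper's dedicated scalable algorithm and is not captured by this sketch.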
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4277