Formalizing Spuriousness of Biased Datasets using Partial Information Decomposition

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Explainability Framework, Spuriousness, Partial Information Decomposition, Blackwell Sufficiency, Auto-encoder, Worst-group Accuracy
TL;DR: This work proposes a novel measure of dataset spuriousness leveraging partial information decomposition.
Abstract: Spuriousness arises when there is an association between two or more variables in a dataset that are not causally related. Left unchecked, such associations can mislead a machine learning model into relying on the undesirable spurious features over the core features in decision-making, hindering generalization. In this work, we propose a novel explainability framework to disentangle the nature of such spurious associations, i.e., how the information about a target variable is distributed among the spurious and core features. Our framework leverages a body of work in information theory called Partial Information Decomposition (PID) to first decompose the total information about the target into four non-negative quantities, namely, unique information (in the core and spurious features, respectively), redundant information, and synergistic information. Next, we leverage this decomposition to propose a novel measure of the spuriousness of a dataset that steers models into choosing the spurious features over the core. We arrive at this measure systematically by examining several candidate measures and demonstrating what they capture and miss through intuitive canonical examples and counterexamples. Our proposed explainability framework, Spurious Disentangler, consists of segmentation, dimensionality reduction, and estimation modules, with capabilities to handle high-dimensional image data efficiently. Finally, we conduct an empirical evaluation to demonstrate the trends of unique, redundant, and synergistic information, as well as our proposed spuriousness measure, across several benchmark datasets under various settings. Interestingly, we observe a novel tradeoff between our measure of dataset spuriousness and empirical model generalization metrics such as worst-group accuracy, further supporting our proposition.
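To make the four-way decomposition concrete, here is a minimal toy sketch of PID for two discrete "feature" variables and a binary target, using the classical Williams–Beer I_min redundancy measure (an assumption for illustration; the abstract does not specify which PID definition the paper adopts, and the paper's actual framework additionally handles high-dimensional images via segmentation and dimensionality reduction):

```python
# Toy PID sketch (Williams-Beer I_min): decomposes I(Y; X1, X2) into
# redundant, unique (per source), and synergistic parts, in bits.
# Illustrative only; not the paper's estimator.
import numpy as np

def pid_imin(p):
    """p[x1, x2, y]: joint pmf. Returns (redundant, unique1, unique2, synergy)."""
    p = np.asarray(p, dtype=float)
    p /= p.sum()
    py = p.sum(axis=(0, 1))                      # p(y)
    marg = [p.sum(axis=1), p.sum(axis=0)]        # p(x1, y) and p(x2, y)

    def mi(pxy):                                 # mutual information I(X; Y)
        px = pxy.sum(axis=1, keepdims=True)
        py_ = pxy.sum(axis=0, keepdims=True)
        nz = pxy > 0
        return float((pxy[nz] * np.log2(pxy[nz] / (px @ py_)[nz])).sum())

    def spec(pxy, y):                            # specific information I_spec(Y=y; X)
        px_given_y = pxy[:, y] / py[y]
        px = pxy.sum(axis=1)
        py_given_x = np.divide(pxy[:, y], px, out=np.zeros_like(px), where=px > 0)
        nz = px_given_y > 0
        return float((px_given_y[nz] * np.log2(py_given_x[nz] / py[y])).sum())

    # Redundancy: expected minimum specific information across sources.
    redundant = sum(py[y] * min(spec(m, y) for m in marg)
                    for y in range(len(py)) if py[y] > 0)
    mi1, mi2 = mi(marg[0]), mi(marg[1])
    mi_joint = mi(p.reshape(-1, p.shape[2]))     # treat (x1, x2) as one source
    u1, u2 = mi1 - redundant, mi2 - redundant
    synergy = mi_joint - u1 - u2 - redundant
    return redundant, u1, u2, synergy

# Canonical example: Y = X1 XOR X2 with uniform inputs.
# Neither feature alone is informative; the information is purely synergistic.
xor = np.zeros((2, 2, 2))
for a in range(2):
    for b in range(2):
        xor[a, b, a ^ b] = 0.25
print(pid_imin(xor))  # ~ (0, 0, 0, 1)
```

The XOR example mirrors the "canonical examples" strategy described in the abstract: each candidate spuriousness measure can be probed against distributions whose unique, redundant, and synergistic contents are known in closed form.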
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10472