Evaluating Shortcut Utilization in Deep Learning Disease Classification through Counterfactual Analysis
Keywords: Shortcuts, counterfactual, Parkinson's disease, bias mitigation
TL;DR: Measuring the extent to which shortcuts are utilized by deep learning disease classification models by counterfactually removing potential shortcut attributes from their layer activations.
Abstract: Although deep learning models can surpass human performance in many medical image analysis tasks, they remain vulnerable to algorithmic shortcuts, exploiting spurious correlations in the data in ways that can erode trust in their predictions. This issue is especially concerning when models rely on protected attributes (e.g., sex, race, or site) as shortcuts. Such shortcut reliance not only impairs generalization to unseen datasets but also raises fairness concerns, ultimately undermining the models' utility for computer-aided diagnosis. Previous techniques for analyzing protected attributes, such as supervised prediction layer information tests, only highlight the presence of protected attributes in the feature space; they do not confirm that these attributes are used to solve the primary task. Determining the impact of protected attributes as shortcuts is particularly challenging because it requires knowing how a model would perform without those attributes, a counterfactual scenario typically unattainable in real-world data. As a workaround, prior studies have generated synthetic datasets with and without protected attributes. In this study, we propose a novel approach to evaluate real-world datasets and quantify the extent to which each protected attribute is used as a shortcut in a classification task. To this end, we define and train a causal generative model that produces causally grounded counterfactuals, removing protected attributes from activations and allowing us to measure their impact on model performance. Using T1-weighted MRI data from 9 sites (835 subjects: 426 with Parkinson's disease (PD) and 409 healthy controls), we demonstrate that counterfactually removing the 'site' attribute from the penultimate layer of a trained classification model reduced the AUROC for PD classification from 0.74 to 0.65, a 0.09 AUROC gain (9 percentage points) attributable to using 'site' as a shortcut. In contrast, counterfactually removing the 'sex' attribute had minimal impact, changing the AUROC by only 0.004, indicating that 'sex' was not utilized as a shortcut by the classification model. The proposed method offers a robust framework for assessing shortcut utilization in medical image classification, paving the way for improved bias detection and mitigation in medical imaging tasks. The code for this work is available at https://github.com/vibujithan/shortcut-analysis.
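For intuition, the evaluation reduces to comparing AUROC before and after a protected attribute is removed from the penultimate-layer activations feeding the classifier's prediction head. The sketch below is a minimal illustration of that comparison, not the released implementation: counterfactual_remove, head, shortcut_effect, and attr_values are hypothetical names, and the linear mean-difference erasure used here is a simplified stand-in for the paper's causal generative model (shown for a binary attribute; 'site' takes 9 values in the study).

import numpy as np
from sklearn.metrics import roc_auc_score

def counterfactual_remove(activations, attr_values):
    # Stand-in for the paper's causal generative model: linear erasure of a
    # binary attribute by projecting out the mean-difference direction between
    # the two attribute groups. This is NOT the causal counterfactual method
    # proposed in the paper, only a simple illustrative substitute.
    d = (activations[attr_values == 1].mean(axis=0)
         - activations[attr_values == 0].mean(axis=0))
    d /= np.linalg.norm(d) + 1e-12
    return activations - np.outer(activations @ d, d)

def shortcut_effect(head, activations, labels, attr_values):
    # AUROC drop after removing the attribute from the penultimate-layer
    # activations; `head` is a callable mapping activations to class-1 scores.
    # A large drop suggests the attribute acts as a shortcut.
    auroc_orig = roc_auc_score(labels, head(activations))
    auroc_cf = roc_auc_score(labels, head(counterfactual_remove(activations, attr_values)))
    return auroc_orig - auroc_cf  # e.g., ~0.09 for 'site', ~0.004 for 'sex' in the paper

Under this framing, a near-zero effect (as reported for 'sex') indicates the attribute is present in the feature space without being exploited, while a large effect (as for 'site') quantifies the performance gained through the shortcut.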
Primary Subject Area: Fairness and Bias
Secondary Subject Area: Causality
Paper Type: Both
Registration Requirement: Yes
Reproducibility: https://github.com/vibujithan/shortcut-analysis
Visa & Travel: Yes
Submission Number: 127