Keywords: Fairness, Demographic Bias, Chest X-ray, Shortcut Learning, Distribution Shift
Abstract: As machine learning models reach human-level performance on many real-world medical imaging tasks, it is crucial to consider the mechanisms they may be using to make such predictions. Prior work has demonstrated the surprising ability of deep learning models to recover demographic information from chest X-rays. This suggests that disease classification models could be using these demographics as shortcuts, leading to the previously observed performance gaps between demographic groups. In this work, we first investigate whether chest X-ray models indeed use demographic information as shortcuts when classifying four different diseases. We then apply five existing methods for mitigating spurious correlations and examine performance and fairness on both the original dataset and data from five external hospitals. Our results indicate that correcting shortcut learning can remedy in-distribution fairness gaps, although this improvement often does not transfer under domain shift. We also find trade-offs between fairness and other important metrics, raising the question of whether it is beneficial to remove such shortcuts in the first place.
Submission Number: 35