Keywords: Spurious Correlations, Interpretability
Abstract: Deep neural models often learn to use spurious features in image datasets, which raises concerns when the models are deployed in critical applications such as medical imaging. Identifying spurious features is essential for developing robust models. Existing methods for finding spurious features do not assign semantic meaning to the features and rely on human interpretation to decide whether they are spurious. In this paper, we propose to find spurious visual attributes in the dataset. We first linearly transform the latent features into visual attributes and then learn correlations between the attributes and object classes by training a simple linear classifier. Correlated visual attributes are easy to interpret because they are expressed in natural language with well-defined meanings, which makes it easier to determine whether they are spurious. Through visualizations and experiments, we show how to find spurious visual attributes, measure their extent in existing datasets, and present failure-mode examples demonstrating the negative impact of learned spurious correlations on out-of-distribution generalization.
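The pipeline described in the abstract can be sketched as follows. This is a minimal synthetic illustration, not the paper's implementation: the latent features, the attribute projection matrix, and the planted class-attribute correlation are all hypothetical stand-ins, and the linear classifier is a plain logistic regression trained by full-batch gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a vision backbone's latent features.
n, d_latent, n_attr = 400, 16, 8
z = rng.normal(size=(n, d_latent))

# Hypothetical linear map from latent space to visual attributes.
# In the described method this projection is learned so that each output
# dimension corresponds to a named attribute; here it is random.
W = rng.normal(size=(d_latent, n_attr))
attrs = z @ W

# Plant a correlation: the class label depends only on attribute 0,
# simulating a dataset-level (possibly spurious) association.
y = (attrs[:, 0] > 0).astype(float)

# Simple linear (logistic) classifier over the attributes.
w = np.zeros(n_attr)
b = 0.0
lr = 0.5
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(attrs @ w + b)))
    w -= lr * (attrs.T @ (p - y)) / n
    b -= lr * np.mean(p - y)

# Attributes with the largest classifier weights are the ones most
# correlated with the class; because attributes carry natural-language
# names, a human can then judge whether each correlation is spurious.
ranked = np.argsort(-np.abs(w))
print("attribute ranking by |weight|:", ranked)
```

In this toy setup the classifier's largest weight should land on the planted attribute, mirroring how the method surfaces class-correlated attributes for human inspection.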