No Shortcuts: Removing Spurious Correlations with Invertible Neural Networks

Thomas Kehrenberg, Myles Bartlett, Oliver Thomas, Novi Quadrianto

24 Sept 2019 | OpenReview Archive Direct Upload | Readers: Everyone

Abstract: Training data often contains spurious correlations that are not reflected in the real world. Machine learning systems can nevertheless become over-reliant on these correlations, basing their classifications on them rather than on semantically meaningful features. This is undesirable for many reasons, among them limited generalisability. To address this problem, we propose a straightforward and general two-step training procedure. In the first step, the model is trained to produce invariant representations from an unlabelled pre-training set in which spurious correlations are minimal. In the second step, a classifier is trained on the encodings generated for the biased training set. We leverage recent developments in flow-based modelling to ensure that all class-relevant information is preserved. We evaluate on two datasets that exhibit spurious correlations, coloured MNIST and the UCI Adult dataset, and show that the proposed scheme succeeds in decorrelating the spurious and semantic features on both.
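
The information-preservation property that the abstract attributes to flow-based models can be illustrated with a minimal sketch of an additive coupling layer, a common building block of invertible networks. This is not the authors' architecture: the function names (`shift_net`, `coupling_forward`, `coupling_inverse`), the toy weights, and the even-length input are all assumptions made for illustration. The point is only that the encoding can be inverted, so no information about the input is discarded.

```python
import math
import random


def shift_net(xs, weights):
    """Toy stand-in for a learned network (hypothetical).

    It can be any function, because the coupling layer never inverts it.
    """
    return [math.tanh(sum(w * x for w, x in zip(row, xs))) for row in weights]


def coupling_forward(x, weights):
    """Additive coupling: keep the first half, shift the second half.

    Assumes len(x) is even and weights is a (half x half) matrix.
    """
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    shift = shift_net(x_a, weights)
    return x_a + [b + s for b, s in zip(x_b, shift)]


def coupling_inverse(z, weights):
    """Undo the coupling: the first half is unchanged, so the same shift
    can be recomputed and subtracted from the second half."""
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    shift = shift_net(z_a, weights)
    return z_a + [b - s for b, s in zip(z_b, shift)]


random.seed(0)
weights = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]

x = [0.5, -1.2, 3.0, 0.7]
z = coupling_forward(x, weights)       # encoding
x_rec = coupling_inverse(z, weights)   # reconstruction from the encoding

# The input is recovered up to floating-point rounding.
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(x, x_rec))
```

Because the mapping is bijective, any class-relevant information present in the input is still present in the encoding. An ordinary encoder with a bottleneck offers no such guarantee.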
