Abstract: Training data often contains spurious correlations that do not hold in the real world.
Machine learning systems can nevertheless become over-reliant on these correlations,
basing their classifications on them rather than on features that are semantically meaningful.
This is undesirable for many reasons, among them limited generalisability.
We consider two datasets that exhibit spurious correlations: coloured MNIST
and the UCI Adult dataset.
To address this over-reliance, we propose a straightforward and general two-step training procedure. In the first step, the model is trained to produce
invariant representations from an unlabelled pre-training set in which spurious correlations are minimal. In the second step, a classifier is trained on the encodings generated for the biased training set. We leverage recent developments in flow-based modelling
to ensure preservation of all class-relevant information. Using our proposed scheme, we show success in decorrelating the spurious and semantic features on both datasets.
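
To illustrate why a flow-based encoder can preserve information, the following minimal sketch implements a single additive coupling layer, a common invertible building block in flow-based models. It is an illustrative assumption and not the architecture used in this work: the names `make_coupling`, `shift`, `forward`, and `inverse`, the random weights, and the toy input are all hypothetical. Because the layer can be inverted exactly, the encoding `z` retains everything needed to recover the input `x`.

```python
import math
import random


def make_coupling(seed, dim, hidden=8):
    """Build one additive coupling layer (hypothetical toy example).

    The first half of the input passes through unchanged; the second half
    is shifted by a function of the first half. The weights are random and
    untrained, since only invertibility is being demonstrated here.
    """
    rng = random.Random(seed)
    half = dim // 2
    w1 = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(half)]
    w2 = [[rng.gauss(0, 1) for _ in range(dim - half)] for _ in range(hidden)]

    def shift(x1):
        # Small two-layer network; it does not need to be invertible itself.
        h = [math.tanh(sum(x1[i] * w1[i][j] for i in range(half)))
             for j in range(hidden)]
        return [sum(h[j] * w2[j][k] for j in range(hidden))
                for k in range(dim - half)]

    def forward(x):
        x1, x2 = x[:half], x[half:]
        return x1 + [a + b for a, b in zip(x2, shift(x1))]

    def inverse(z):
        z1, z2 = z[:half], z[half:]
        # z1 equals x1, so the same shift can be recomputed and subtracted.
        return z1 + [a - b for a, b in zip(z2, shift(z1))]

    return forward, inverse


forward, inverse = make_coupling(seed=0, dim=6)
x = [0.5, -1.2, 0.3, 2.0, -0.7, 1.1]
z = forward(x)        # encoding of x
x_rec = inverse(z)    # reconstruction from the encoding alone
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
```

Here `max_err` is zero up to floating-point rounding, which is the sense in which an invertible encoder discards no information. Making part of the encoding invariant to the spurious feature is a separate training objective that this sketch does not cover.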