Addressing Attribute Bias with Adversarial Support-Matching

Thomas Kehrenberg; Myles Bartlett; Viktoriia Sharmanska; Novi Quadrianto

Published: 14 Mar 2024, Last Modified: 17 Sept 2024. Accepted by TMLR. License: CC BY 4.0

Abstract: When trained on diverse labelled data, machine learning models have proven themselves to be powerful tools in all facets of society. However, due to budget limitations, deliberate or non-deliberate censorship, and other problems during data collection, certain groups may be under-represented in the labelled training set. We investigate a scenario in which the absence of certain data is linked to the second level of a two-level hierarchy in the data. Inspired by the idea of protected attributes from algorithmic fairness, we consider generalised secondary "attributes" which subdivide the classes into smaller partitions. We refer to the partitions defined by the combination of an attribute and a class label, or leaf nodes in the aforementioned hierarchy, as groups. To characterise the problem, we introduce the concept of classes with incomplete attribute support. The representational bias in the training set can give rise to spurious correlations between the classes and the attributes, which cause standard classification models to generalise poorly to unseen groups. To overcome this bias, we make use of an additional, diverse but unlabelled dataset, called the deployment set, to learn a representation that is invariant to the attributes. This is done by adversarially matching the support of the training and deployment sets in representation space using a set discriminator that operates on sets, or bags, of samples. In order to learn the desired invariance, it is paramount that the bags are balanced by class; this is easily achieved for the training set, but requires semi-supervised clustering for the deployment set. We demonstrate the effectiveness of our method on several datasets and realisations of the problem.
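Two notions from the abstract can be illustrated on toy data: a class with incomplete attribute support (some class/attribute group never appears in the labelled training set) and a class-balanced bag drawn from that training set. The following is a minimal sketch under stated assumptions, not the authors' implementation (that lives in the linked repository). The function names `missing_groups` and `sample_class_balanced_bag`, the `(sample id, class, attribute)` tuple layout, and the toy data are all hypothetical, and the sketch covers only the labelled training set; the clustering needed to balance deployment-set bags is not shown.

```python
import random
from collections import defaultdict


def missing_groups(samples, classes, attributes):
    """Return the (class, attribute) groups that never occur in `samples`.

    A class that appears in any returned pair has incomplete attribute support.
    """
    observed = {(y, s) for _, y, s in samples}
    return sorted(
        (y, s) for y in classes for s in attributes if (y, s) not in observed
    )


def sample_class_balanced_bag(samples, per_class, rng):
    """Draw a bag containing `per_class` items from every class.

    Sampling is with replacement, so small classes can still fill their quota.
    """
    by_class = defaultdict(list)
    for x, y, _ in samples:
        by_class[y].append(x)
    bag = []
    for y in sorted(by_class):
        bag.extend((x, y) for x in rng.choices(by_class[y], k=per_class))
    return bag


# Hypothetical toy training set of (sample id, class y, attribute s) tuples.
# The group (y=1, s=1) is never observed, so class 1 has incomplete support.
train = [(0, 0, 0), (1, 0, 1), (2, 0, 0), (3, 1, 0), (4, 1, 0)]

missing = missing_groups(train, classes=[0, 1], attributes=[0, 1])
bag = sample_class_balanced_bag(train, per_class=3, rng=random.Random(0))
```

Here `missing` lists the unobserved groups, and `bag` holds three items per class regardless of how many training samples each class has. Balancing by class is possible for the training set because its class labels are known; the deployment set is unlabelled, which is why the abstract calls for semi-supervised clustering there.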

Submission Length: Regular submission (no more than 12 pages of main content)

Previous TMLR Submission Url: https://openreview.net/forum?id=2UL1Dj2wtK

Changes Since Last Submission: Fixed font problem. We had a left-over `\usepackage{times}` from a previous template that has now been removed. We apologize for the mistake.

Code: https://github.com/wearepal/support-matching

Supplementary Material: zip

Assigned Action Editor: ~Shiyu_Chang2

Submission Number: 1794
