Does Data Repair Lead to Fair Models? Curating Contextually Fair Data To Reduce Model Bias

Sharat Agarwal, Sumanyu Muku, Saket Anand, Chetan Arora

2022 (modified: 03 Nov 2022) | WACV 2022 | Readers: Everyone

Abstract: Contextual information is a valuable cue for Deep Neural Networks (DNNs) to learn better representations and improve accuracy. However, co-occurrence bias in the training dataset may hamper a DNN model’s generalizability to unseen scenarios in the real world. For example, in COCO [26], many object categories have a much higher co-occurrence with men compared to women, which can bias a DNN’s prediction in favor of men. Recent works have focused on task-specific training strategies to handle bias in such scenarios, but fixing the available data is often ignored. In this paper, we propose a novel and more generic solution to address the contextual bias in the datasets by selecting a subset of the samples, which is fair in terms of the co-occurrence with various classes for a protected attribute. We introduce a data repair algorithm using the coefficient of variation ($c_v$), which can curate fair and contextually balanced data for a protected class(es). This helps in training a fair model irrespective of the task, architecture or training methodology. Our proposed solution is simple, effective and can even be used in an active learning setting where the data labels are not present or are being generated incrementally. We demonstrate the effectiveness of our algorithm for the task of object detection and multi-label image classification across different datasets. Through a series of experiments, we validate that curating contextually fair data helps make model predictions fair by balancing the true positive rate for the protected class across groups without compromising on the model’s overall performance. Code: https://github.com/sumanyumuku98/contextual-bias
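
To make the role of the coefficient of variation concrete, the sketch below computes $c_v$ (standard deviation divided by mean) over per-class co-occurrence counts with a protected attribute, then greedily picks samples so that $c_v$ of the selected subset stays low. This is only an illustration of the idea stated in the abstract and is not the authors' algorithm, which is available in the linked repository. The function names `coefficient_of_variation` and `greedy_fair_subset`, the 0/1 co-occurrence matrix layout, and the greedy selection rule are all assumptions made for this example.

```python
import numpy as np


def coefficient_of_variation(counts):
    """Return std / mean of the counts (0.0 when the mean is zero)."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    return 0.0 if mean == 0 else float(counts.std() / mean)


def greedy_fair_subset(cooccurrence, budget):
    """Hypothetical greedy selection, not the paper's procedure.

    cooccurrence: (n_samples, n_classes) 0/1 array; entry [i, k] is 1 when
    sample i shows class k together with the protected attribute.
    Returns the chosen sample indices and the per-class totals of the subset.
    """
    cooccurrence = np.asarray(cooccurrence)
    totals = np.zeros(cooccurrence.shape[1])
    remaining = set(range(len(cooccurrence)))
    selected = []
    for _ in range(min(budget, len(cooccurrence))):
        # Pick the sample whose addition gives the lowest c_v;
        # ties are broken by the smallest index.
        best = min(
            sorted(remaining),
            key=lambda i: coefficient_of_variation(totals + cooccurrence[i]),
        )
        selected.append(best)
        totals += cooccurrence[best]
        remaining.remove(best)
    return selected, totals


# Toy data (made up): three samples show class 0, one shows class 1.
toy = [[1, 0], [1, 0], [1, 0], [0, 1]]
chosen, class_totals = greedy_fair_subset(toy, budget=2)
```

On this toy input the greedy rule ends with one sample per class, so the per-class totals are equal and $c_v$ of the subset is zero. A lower $c_v$ means the protected attribute co-occurs more evenly across classes, which is the sense of "contextually balanced" used above.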
