FairCrowd: Fair Human Face Dataset Sampling via Batch-Level Crowdsourcing Bias Inference

Ziyi Kou, Yang Zhang, Lanyu Shang, Dong Wang

2021 (modified: 21 Jan 2023), IWQoS 2021

Abstract: Human face images are a large category of visual information used by various human facial data services (e.g., face recognition, face generation, face attribute prediction). However, the quality of data services (QoDS) on human face datasets is usually biased towards the majority demographic group due to data imbalance. In this paper, we focus on a fair human face dataset sampling problem, where the goal is to sample a sub-dataset from the original dataset to reduce its bias by leveraging crowd intelligence to infer the demographic labels of face images (e.g., male or female, old or young). Our problem is motivated by the limitations of current fair data sampling solutions, which require pre-annotated demographic labels to sample a fair dataset. Two important challenges exist in solving our problem: 1) it is extremely time-consuming and expensive to assign crowd workers to annotate the demographic labels of all images in a large-scale facial dataset; 2) it is not a trivial task to improve the fairness of the sampled sub-dataset (with fewer data samples) without sacrificing the accuracy of data services on such a dataset. To address the above challenges, we develop FairCrowd, a fair crowdsourcing-based data sampling framework that leverages an efficient batch-level demographic label inference model and a joint fair-accuracy-aware data shuffling method. We evaluate FairCrowd on a large-scale real-world face image dataset that consists of celebrity faces from a diversified set of demographic groups. The results show that FairCrowd not only reduces demographic bias but also improves the accuracy of data services trained on the sub-dataset generated by FairCrowd, leading to a more desirable QoDS of the application.
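
The sampling goal stated in the abstract (drawing a less biased sub-dataset once demographic labels are available) can be illustrated with a minimal sketch. This is not FairCrowd's batch-level inference model or its shuffling method, neither of which the abstract specifies; it only shows plain equal-per-group sampling. The function name `sample_balanced`, the `per_group` parameter, and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_balanced(labels, per_group, seed=0):
    """Return sorted indices with at most `per_group` items from each group.

    `labels` holds one (hypothetical) inferred demographic label per image.
    """
    rng = random.Random(seed)

    # Bucket image indices by their demographic label.
    by_group = defaultdict(list)
    for idx, group in enumerate(labels):
        by_group[group].append(idx)

    # Draw the same number of images from every group, capped by group size.
    chosen = []
    for group in sorted(by_group):
        members = by_group[group]
        k = min(per_group, len(members))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


# Toy, made-up labels: 8 images in the majority group, 2 in the minority group.
labels = ["young"] * 8 + ["old"] * 2
subset = sample_balanced(labels, per_group=2)
counts = {g: sum(labels[i] == g for i in subset) for g in set(labels)}
```

In this toy case the sub-dataset has the same number of images from each group, at the cost of discarding most majority-group images. That loss of data is the fairness versus accuracy tension the abstract names as its second challenge.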
