Abstract: As automatic decision-making systems advance and are deployed in many high-stakes areas, ensuring fairness is becoming crucial. Although many fairness-aware algorithms have been proposed, most assume that sufficient sensitive labels are available, an assumption that does not hold in many practical scenarios. To this end, we focus on a practical setting where the sensitive labels are incomplete and propose a Confidence-Based Randomization (CBR) framework. CBR can be integrated with existing fairness-aware learning algorithms, enabling them to achieve enhanced fairness by effectively utilizing samples without sensitive labels. Specifically, we employ a sensitive-attribute estimator to predict the missing sensitive labels and randomize the low-confidence predictions (those whose confidence falls below a threshold). Furthermore, to find the optimal thresholds for facilitating fair training, we propose a theory-based method that minimizes an upper bound on the estimation error of the fairness metric. Theoretically, we demonstrate that CBR outperforms the vanilla method that fully trusts the estimator's predictions without randomization. Extensive experiments on real-world datasets confirm CBR's effectiveness in promoting fairness while maintaining high accuracy under incomplete sensitive labels.
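To make the core mechanism concrete, below is a minimal sketch of the confidence-based randomization step described in the abstract; it is an illustration, not the authors' implementation. It assumes a sensitive-attribute estimator has already produced class probabilities for the samples with missing labels, and it uses a single fixed threshold for simplicity, whereas the paper selects thresholds by minimizing an upper bound on the fairness-metric estimation error. All names here (`confidence_based_randomization`, `probs`, `threshold`) are hypothetical.

```python
import numpy as np

def confidence_based_randomization(probs, threshold=0.7, rng=None):
    """Assign sensitive labels from estimator probabilities.

    probs: (n, k) array of predicted class probabilities for the
    sensitive attribute on samples whose sensitive label is missing.
    Predictions with top probability >= threshold are trusted;
    the rest are replaced with uniformly random labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs)
    n, k = probs.shape
    labels = probs.argmax(axis=1)       # estimator's hard predictions
    confidence = probs.max(axis=1)      # confidence = top predicted probability
    low = confidence < threshold        # mask of low-confidence predictions
    labels[low] = rng.integers(0, k, size=low.sum())  # randomize those labels
    return labels

# Example: the first prediction is kept, the second is randomized.
probs = np.array([[0.9, 0.1], [0.55, 0.45]])
print(confidence_based_randomization(probs, threshold=0.7))
```

The resulting labels would then stand in for the missing sensitive attributes when an existing fairness-aware learner is trained on the full dataset.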