Task-Free Fairness-Aware Bias Mitigation for Black-Box Deployed Models

Published: 01 Jan 2024, Last Modified: 19 Feb 2025 · IEEE Trans. Dependable Secur. Comput. 2024 · CC BY-SA 4.0
Abstract: As AI systems are widely deployed in societal applications, the fairness of these models is of increasing concern: for instance, hiring systems should recommend applicants impartially across demographic groups, and risk assessment systems must eliminate racial inequity in the criminal justice system. Ensuring fairness in these models is therefore crucial. In this paper, we propose Task-Free Fairness-Aware Adversarial Perturbation (TF-FAAP), a flexible approach that improves the fairness of black-box deployed models by adding perturbations to input samples that conceal fairness-related attribute information, without modifying the model's parameters or structure. TF-FAAP consists of a discriminator and a generator that jointly create universal fairness-aware perturbations for a variety of tasks: the discriminator learns to distinguish fairness-related attributes, while the generator produces perturbations that drive the discriminator's prediction distribution over those attributes toward uniform. To preserve the utility of perturbed samples, we maximize the mutual information between their representations and those of the corresponding original samples, retaining more of the original samples' information. In addition, the perturbations generated by TF-FAAP are highly transferable: perturbations learned on one dataset can also alleviate the unfairness of a model trained on a different dataset. Extensive experimental evaluation demonstrates the effectiveness and superior performance of our method.
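To make the generator-discriminator setup concrete, below is a minimal PyTorch sketch of the adversarial training loop the abstract describes. All module architectures, shapes, loss weights, and the perturbation budget are illustrative assumptions, and the cosine-similarity term stands in as a simple surrogate for the paper's mutual-information objective; this is not the authors' implementation.

```python
# Minimal sketch of the adversarial scheme from the abstract (assumed details).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbationGenerator(nn.Module):
    """Maps an input image to a bounded perturbation added to the input."""
    def __init__(self, channels=3, eps=0.05):  # eps: assumed L_inf-style budget
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return x + self.eps * self.net(x)  # perturbed sample

class AttributeDiscriminator(nn.Module):
    """Predicts the fairness-related attribute (e.g., a binary group label)."""
    def __init__(self, channels=3, n_groups=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, n_groups)

    def forward(self, x):
        z = self.features(x)       # representation reused for the utility term
        return self.head(z), z

gen, disc = PerturbationGenerator(), AttributeDiscriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

def train_step(x, group):
    # 1) Discriminator step: learn to recognize the protected attribute
    #    on perturbed samples (generator frozen via detach).
    logits, _ = disc(gen(x).detach())
    loss_d = F.cross_entropy(logits, group)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Generator step: push the discriminator's predicted attribute
    #    distribution toward uniform (fairness term) while keeping the
    #    perturbed representation close to the clean one (utility term,
    #    a cosine-similarity surrogate for the mutual-information objective).
    x_adv = gen(x)
    logits_adv, z_adv = disc(x_adv)
    uniform = torch.full_like(logits_adv, 1.0 / logits_adv.size(1))
    loss_fair = F.kl_div(F.log_softmax(logits_adv, dim=1), uniform,
                         reduction="batchmean")
    with torch.no_grad():
        _, z_clean = disc(x)
    loss_util = -F.cosine_similarity(z_adv, z_clean, dim=1).mean()
    loss_g = loss_fair + 1.0 * loss_util  # weight 1.0 is an assumption
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

# Toy usage: random images and binary group labels.
x = torch.randn(8, 3, 32, 32)
group = torch.randint(0, 2, (8,))
print(train_step(x, group))
```

Because the generator acts only on inputs, this loop never touches the deployed model's parameters, which is what makes the black-box setting in the abstract possible; the perturbation network learned here is also what would be reused across datasets in the transferability claim.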