Abstract: Addressing bias in machine learning models is essential to minimizing discriminatory outcomes and working toward fairness. This paper presents a novel data-centric framework for bias analysis grounded in counterfactual reasoning. We detail a process for generating plausible counterfactuals suited to group evaluation, using probabilistic distributions and optionally incorporating domain knowledge, as a more efficient alternative to computationally intensive generative models.
Additionally, we introduce the Counterfactual Confusion Matrix, from which we derive a suite of metrics that provide a comprehensive view of a model's behaviour under counterfactual conditions. These metrics offer unique insights into the model's resilience and susceptibility to changes in sensitive attributes such as sex or race. We demonstrate their utility, and their complementarity with standard group fairness metrics, through experiments on real-world datasets. Our results show that domain knowledge is key to generating plausible counterfactuals, and that our metrics can reveal subtle biases that traditional bias evaluation strategies may overlook, providing a more nuanced understanding of potential model bias.
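To make the core idea concrete, below is a minimal sketch of what a Counterfactual Confusion Matrix could look like, assuming a binary classifier and pre-generated counterfactual "twins" in which the sensitive attribute has been flipped. The function names, the 2x2 layout, and the flip-rate metric are illustrative assumptions for this sketch, not the paper's exact definitions.

```python
import numpy as np

def counterfactual_confusion_matrix(model, X, X_cf):
    """Cross-tabulate binary predictions on original instances (rows)
    against predictions on their counterfactual twins (columns).
    Assumes model.predict returns 0/1 labels (hypothetical interface)."""
    y_orig = model.predict(X)     # predictions on the original instances
    y_cf = model.predict(X_cf)    # predictions on the counterfactual twins
    m = np.zeros((2, 2), dtype=int)
    for o, c in zip(y_orig, y_cf):
        m[o, c] += 1              # count each (original, counterfactual) pair
    return m

def flip_rate(m):
    """Illustrative metric: fraction of instances whose prediction
    changes when only the sensitive attribute is altered."""
    return (m[0, 1] + m[1, 0]) / m.sum()
```

The off-diagonal cells capture instances where the model's decision depends on the sensitive attribute; metrics derived from them would complement standard group fairness metrics, which aggregate over groups rather than matched instance pairs.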
Keywords: Bias, Fairness, Counterfactual, Confusion Matrix, Data Augmentation, Machine Learning
Changes Since Last Submission: This is the camera-ready version. We made minor changes to improve readability and clarity.
Changes Since Previous Publication: N/A
Assigned Action Editor: ~Yang_Liu3
Submission Number: 19