Counterfactual Fairness from Partially Directed Acyclic Graphs: A General Min-Max Optimization Framework

24 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: causal reasoning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: fairness, counterfactual fairness, DAG, partially directed acyclic graph (PDAG)
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: This paper proposes a general min-max optimization framework that can effectively achieve counterfactual fairness when the true causal graph is unknown or partially known.
Abstract: Developing fair automated machine learning algorithms is critical for making safe and trustworthy decisions. Many causality-based fairness notions have been proposed to address this issue by quantifying the causal connections between sensitive attributes and decisions, and when the true causal graph is fully known, algorithms that achieve counterfactual fairness are available. However, when the true causal graph is unknown, it remains challenging to effectively exploit partially directed acyclic graphs (PDAGs) to achieve counterfactual fairness. To tackle this issue, a recent work suggests using only the non-descendants of the sensitive attribute for fair prediction. Interestingly, in this paper, we show that it is actually possible to achieve counterfactual fairness even when using descendants of the sensitive attribute for prediction, by carefully controlling the possible counterfactual effects of the sensitive attribute. We propose a general min-max optimization framework that effectively achieves counterfactual fairness with promising prediction accuracy, and that can be extended to maximally oriented PDAGs (MPDAGs) with added background knowledge. Specifically, we first estimate all possible counterfactual treatment effects of the sensitive attribute on a given prediction model from all possible adjustment sets of the sensitive attribute. Next, we alternately update the prediction model and the corresponding possible estimated causal effects, where the prediction model is trained via a min-max loss to control the worst-case fairness violation. Extensive experiments on synthetic and real-world datasets verify the effectiveness of our methods.
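The abstract describes an alternating scheme: an inner maximization picks the worst-case estimated causal effect of the sensitive attribute across candidate adjustment sets, and an outer minimization trains the predictor against that worst case. The sketch below is NOT the paper's algorithm; it is a minimal illustration of the min-max idea under strong simplifying assumptions (a linear predictor, regression adjustment as the effect estimator, a two-element list of adjustment sets, and fully synthetic data — all names and constants are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative synthetic data (not the paper's benchmark) ---
# A: sensitive attribute; X1: a descendant of A; X2: a non-descendant; Y: label.
n = 4000
a = rng.binomial(1, 0.5, n).astype(float)
x1 = a + rng.normal(0.0, 1.0, n)
x2 = rng.normal(0.0, 1.0, n)
y = x1 + x2 + rng.normal(0.0, 0.5, n)
feats = np.column_stack([a, x1, x2])   # predictor may use descendants of A

def effect_coef(col, adj):
    """Regression-adjustment estimate of the effect of A on `col`,
    adjusting for the feature columns listed in `adj`."""
    design = np.column_stack([np.ones(n), a] + [feats[:, j] for j in adj])
    beta, *_ = np.linalg.lstsq(design, col, rcond=None)
    return beta[1]  # coefficient on A

# Candidate adjustment sets compatible with a hypothetical PDAG.
adj_sets = [[], [2]]

# The model is linear, so the estimated effect under set s is c[s] @ w.
c = np.array([[effect_coef(feats[:, j], s) for j in range(3)] for s in adj_sets])

w = np.zeros(3)
lam, lr = 50.0, 0.002
for _ in range(3000):
    pred = feats @ w
    # Inner max: the adjustment set with the largest estimated effect.
    k = int(np.argmax(np.abs(c @ w)))
    e = c[k] @ w
    # Outer min: gradient step on MSE + lam * (worst-case effect)^2.
    grad = 2 * feats.T @ (pred - y) / n + 2 * lam * e * c[k]
    w -= lr * grad

worst_effect = float(np.max(np.abs(c @ w)))
```

After training, `worst_effect` is driven close to zero even though the predictor still uses `x1`, a descendant of `a` — the penalty forces the model to offset the counterfactual influence of the sensitive attribute rather than discard its descendants.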
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8703