Evaluating Disparities in the Quality of Post-hoc Explanations when the Explained Black Boxes are Subjected to Fairness Constraints

TMLR Paper5299 Authors

04 Jul 2025 (modified: 16 Jul 2025) · Under review for TMLR · CC BY 4.0
Abstract: In recent years, the application of machine learning models in critical domains has raised significant concerns regarding the fairness and interpretability of their predictions. This study investigates the disparities in the quality of post-hoc explanations generated for complex black-box models, specifically focusing on the influence of fairness constraints on these explanations across diverse demographic groups. Utilizing datasets from ACSIncome, ACSEmployment, and COMPAS, we employ explanation methods such as LIME and KernelSHAP to evaluate metrics including Maximum Fidelity Gap from Average (MFGA), Consistency, and Stability. Our findings reveal that the imposition of fairness constraints impacts the fidelity and consistency of explanations, with notable variations observed between demographic groups. While some datasets demonstrate equitable explanation quality across genders, significant biases persist in others, particularly affecting younger individuals and racial minorities. The research highlights the necessity for robust fairness-preserving techniques in post-hoc explanations and underscores the critical need for transparency in AI-driven decision-making processes. By correlating model unfairness with disparities in explanation quality, this work aims to contribute to the ongoing discourse on ethical AI, emphasizing the importance of both accuracy and fairness in machine learning applications.
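For readers unfamiliar with the group-wise fidelity gap named in the abstract, below is a minimal sketch of one plausible way such a quantity could be computed. The function names, the agreement-based notion of fidelity, and the grouping interface are illustrative assumptions for this sketch, not the paper's exact definitions of MFGA or its evaluation protocol.

```python
import numpy as np

def explanation_fidelity(bb_predict, surrogate_predict, X):
    """Agreement rate between a black box and the local surrogates built
    to explain it (e.g. the local linear models fit by LIME or KernelSHAP).

    bb_predict(X) -> array of black-box labels
    surrogate_predict(X) -> array of labels implied by the local explanations
    """
    return float(np.mean(bb_predict(X) == surrogate_predict(X)))

def max_fidelity_gap_from_average(bb_predict, surrogate_predict, X, group_ids):
    """Hypothetical MFGA: largest absolute gap between any demographic
    group's explanation fidelity and the fidelity over the full sample."""
    X = np.asarray(X)
    group_ids = np.asarray(group_ids)
    overall = explanation_fidelity(bb_predict, surrogate_predict, X)
    gaps = [
        abs(explanation_fidelity(bb_predict, surrogate_predict,
                                 X[group_ids == g]) - overall)
        for g in np.unique(group_ids)
    ]
    return max(gaps)

# Toy usage with stand-in predictors (not the paper's models or data):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    groups = rng.integers(0, 2, size=200)  # e.g. a binary sensitive attribute
    bb = lambda X: (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # "black box"
    sg = lambda X: (X[:, 0] > 0).astype(int)                  # "local surrogate"
    print(max_fidelity_gap_from_average(bb, sg, X, groups))
```

Under this reading, an MFGA of zero would mean every demographic group receives explanations that match the black box equally well; larger values indicate that some group's explanations are systematically less faithful than average.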
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=xP6uXelckf&noteId=xP6uXelckf
Changes Since Last Submission: Anonymized the GitHub link
Assigned Action Editor: ~Sivan_Sabato1
Submission Number: 5299