Evaluating Disparities in the Quality of Post-hoc Explanations when the Explained Black Boxes are Subjected to Fairness Constraints
Abstract: In recent years, the application of machine learning models in critical domains has raised significant concerns regarding the fairness and interpretability of their predictions. This study investigates disparities in the quality of post-hoc explanations generated for complex black-box models, focusing specifically on the influence of fairness constraints on these explanations across diverse demographic groups. Using the ACSIncome, ACSEmployment, and COMPAS datasets, we employ explanation methods such as LIME and KernelSHAP and evaluate metrics including Maximum Fidelity Gap from Average (MFGA), Consistency, and Stability. Our findings reveal that the imposition of fairness constraints affects the fidelity and consistency of explanations, with notable variations between demographic groups. While some datasets demonstrate equitable explanation quality across genders, significant biases persist in others, particularly affecting younger individuals and racial minorities. The research highlights the necessity of robust fairness-preserving techniques in post-hoc explanations and underscores the critical need for transparency in AI-driven decision-making processes. By correlating model unfairness with disparities in explanation quality, this work aims to contribute to the ongoing discourse on ethical AI, emphasizing the importance of both accuracy and fairness in machine learning applications.
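As a rough illustration of the fidelity-gap metric named in the abstract, the sketch below computes a group-wise Maximum Fidelity Gap from Average from per-instance fidelity scores. The function name `mfga`, its inputs, and the assumption that MFGA is the largest absolute gap between a demographic group's mean fidelity and the overall mean fidelity are illustrative; the paper's exact definition is not reproduced here.

```python
import numpy as np

def mfga(fidelity, group):
    """Illustrative Maximum Fidelity Gap from Average (MFGA): the largest
    absolute difference between a demographic group's mean explanation
    fidelity and the mean fidelity over the whole evaluation set.

    fidelity : per-instance fidelity scores of the post-hoc explanations
               (e.g. agreement between a LIME/KernelSHAP local surrogate
               and the black-box prediction); how these scores are obtained
               is left to the explainer.
    group    : demographic group label of each instance.
    """
    fidelity = np.asarray(fidelity, dtype=float)
    group = np.asarray(group)
    overall = fidelity.mean()
    # Largest deviation of any group's mean fidelity from the overall mean.
    return max(abs(fidelity[group == g].mean() - overall)
               for g in np.unique(group))
```

In this sketch, the per-instance fidelity scores would come from comparing the local surrogate's prediction with the (fairness-constrained) black-box output on each evaluation instance, and `group` would hold the sensitive attribute (e.g. gender, race, or age bucket) used to audit disparities.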
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=xP6uXelckf&noteId=xP6uXelckf
Changes Since Last Submission: Anonymized the GitHub link
Assigned Action Editor: ~Sivan_Sabato1
Submission Number: 5299