Keywords: Fairness, Privacy, Robustness, Explainability, Uncertainty Quantification, Trustworthy AI, Intersectionality
TL;DR: Various aspects of Trustworthy AI often interact negatively, so efforts to build better-aligned models must account for the intersectionality of multiple aspects.
Abstract: Trustworthy AI encompasses many aspirational aspects for aligning AI systems with human values, including fairness, privacy, robustness, explainability, and uncertainty quantification. However, efforts to enhance one aspect often introduce unintended trade-offs that negatively impact others, making it challenging to improve all aspects simultaneously. In this paper, we review notable approaches to five aspects and systematically consider every pair, detailing the negative interactions that can arise. For example, applying differential privacy to model training can amplify biases in the data, undermining fairness. Drawing on these findings, we take the position that addressing trustworthiness along each axis in isolation is insufficient. Instead, to achieve better alignment between humans and AI, efforts in Trustworthy AI must account for intersectionality between aspects and adopt a holistic view across all relevant axes at once. To illustrate our perspective, we provide guidance on how researchers can work towards integrated trustworthiness, and a case study on how intersectionality applies to the financial industry.
Submission Type: Long Paper (9 Pages)
Archival Option: This is a non-archival submission
Presentation Venue Preference: ICLR 2025
Submission Number: 19