Embracing Contradiction: Theoretical Inconsistency Will Not Impede the Road of Building Responsible AI Systems
Keywords: Responsible AI, Evaluation, Generalization, Robustness, Pluralistic Alignment
TL;DR: Theoretical inconsistencies among Responsible AI metrics are not a flaw but a benefit: they enable pluralistic approaches to alignment that respect diverse values, deepen conceptual understanding, and yield more robust, adaptable models.
Abstract: This position paper argues that the theoretical inconsistency often observed among Responsible AI (RAI) metrics, such as differing fairness definitions or trade-offs between accuracy and privacy, should be embraced as a valuable feature rather than a flaw to be eliminated. We contend that navigating these inconsistencies, by treating metrics as divergent objectives, yields three key benefits: (1) Normative Pluralism: maintaining a full suite of potentially contradictory metrics ensures that the diverse moral stances and stakeholder values inherent in RAI are adequately represented; (2) Epistemological Completeness: using multiple, sometimes conflicting, metrics captures multifaceted ethical concepts more fully and preserves greater informational fidelity than any single, simplified definition; (3) Implicit Regularization: jointly optimizing for theoretically conflicting objectives discourages overfitting to any one metric, steering models toward solutions with better generalization and robustness under real-world complexities. In contrast, enforcing theoretical consistency by simplifying or pruning metrics risks narrowing value diversity, losing conceptual depth, and degrading model performance. We therefore advocate a shift in RAI theory and practice: away from treating metric inconsistencies as obstacles, and toward establishing practice-focused theories, documenting the normative provenance and degree of inconsistency of each metric, and elucidating the mechanisms that permit robust, approximate consistency in practice.
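To make the joint-optimization idea in point (3) concrete, the following is a minimal sketch, not the paper's method: it combines a task loss with one theoretically conflicting fairness objective (a demographic-parity gap penalty). The model, the choice of penalty, and the weight `lambda_fair` are all illustrative assumptions.

```python
# Minimal sketch: jointly optimizing a task loss with a conflicting
# fairness objective, so neither metric is overfit in isolation.
# All names here (joint_loss, lambda_fair) are hypothetical.
import torch
import torch.nn.functional as F

def joint_loss(model, x, y, group, lambda_fair=0.5):
    """Task loss plus a demographic-parity gap penalty.

    Assumes a binary classifier, binary labels y, a binary group
    indicator, and batches containing samples from both groups.
    """
    logits = model(x).squeeze(-1)
    task_loss = F.binary_cross_entropy_with_logits(logits, y.float())

    # Demographic-parity gap: difference in mean predicted positive
    # rate between the two groups. This term can pull against accuracy
    # or calibration, which is exactly the tension being embraced.
    probs = torch.sigmoid(logits)
    gap = (probs[group == 0].mean() - probs[group == 1].mean()).abs()

    return task_loss + lambda_fair * gap
```

In this sketch, `lambda_fair` sets how strongly the conflicting objective regularizes the task objective; sweeping it traces out the trade-off frontier rather than collapsing the metrics into one.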
Lay Summary: This paper argues that many Responsible AI metrics, such as fairness, privacy, robustness, and accuracy, inevitably pull in different directions, and that keeping several partly conflicting metrics is a strength, not a flaw. Using theory and examples, the authors show why no single model can satisfy all common definitions at once, and why a portfolio of metrics better reflects diverse values and often improves real-world reliability. They propose a practical recipe: set hard minimums for safety-critical metrics, define small tolerance bands for the rest, document assumptions and trade-offs, and use low-overhead techniques such as data reweighting, post-hoc thresholding, or light multi-objective training to hit a balanced profile. The goal is plural, transparent evaluation that acknowledges tensions while remaining workable for auditors, builders, and policymakers.
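One way to operationalize the recipe above is a simple portfolio audit with hard floors and tolerance bands, sketched below under illustrative assumptions; every metric name and threshold is a hypothetical placeholder, not a value from the paper.

```python
# Minimal sketch of a metric-portfolio audit: hard minimums for
# safety-critical metrics, tolerance bands for the rest.
# All metric names and thresholds are illustrative placeholders.

HARD_FLOORS = {"safety": 0.99}        # must-pass minimums
TOLERANCE_BANDS = {                   # acceptable (lo, hi) ranges
    "accuracy": (0.85, 1.00),
    "fairness_gap": (0.00, 0.05),
    "privacy_epsilon": (0.0, 8.0),
}

def audit(scores: dict) -> list[str]:
    """Return a list of violations; an empty list means acceptable."""
    issues = []
    for name, floor in HARD_FLOORS.items():
        if scores.get(name, 0.0) < floor:
            issues.append(f"hard floor violated: {name}={scores.get(name)} < {floor}")
    for name, (lo, hi) in TOLERANCE_BANDS.items():
        value = scores.get(name)
        if value is None or not (lo <= value <= hi):
            issues.append(f"outside tolerance band: {name}={value}, expected [{lo}, {hi}]")
    return issues

# Example: a balanced profile passes with no violations.
print(audit({"safety": 0.995, "accuracy": 0.88,
             "fairness_gap": 0.03, "privacy_epsilon": 4.0}))  # -> []
```

The design choice here mirrors the paper's stance: no metric is pruned to enforce consistency; instead, conflicts are made explicit as documented floors and bands that auditors and builders can inspect.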
Submission Number: 279