VTruST: Controllable value-function-based subset selection for Data-Centric Trustworthy AI

19 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: data-centric trustworthy AI, value function, data valuation, online sparse approximation, fairness, robustness, explainability
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Trustworthy AI is crucial to the widespread adoption of AI in high-stakes applications, with explainability, fairness, and robustness being some of the key trustworthiness metrics. Data-Centric AI (DCAI) aims to construct high-quality datasets for efficient training of trustworthy models. In this work, we propose VTruST, a controllable framework for data-centric trustworthy AI (DCTAI) that allows users to control the trade-offs between the different trustworthiness metrics of the constructed training datasets. A key challenge in implementing an efficient DCTAI framework is designing an online, value-function-based algorithm for selecting a subset of the training data. We pose training data valuation and subset selection as an online sparse approximation problem, in which the $\textit{features}$ of each training datapoint are obtained in an online manner through an iterative training algorithm. We propose a novel online version of the Orthogonal Matching Pursuit (OMP) algorithm for solving this problem. We also derive conditions on the data matrix that guarantee exact recovery of the sparse solution. We demonstrate the generality and effectiveness of our approach by designing data-driven value functions for the above trustworthiness metrics. Experimental results show that VTruST outperforms state-of-the-art baselines for fair learning as well as robust training on standard fairness and robustness datasets. We also demonstrate that VTruST can provide effective trade-offs between different trustworthiness metrics through Pareto-optimal fronts. Finally, we show that the data valuation generated by VTruST can provide effective data-centric explanations for different trustworthiness metrics.
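To make the sparse-approximation framing more concrete, the sketch below shows the classical batch Orthogonal Matching Pursuit loop that an online variant would build on. It is not VTruST's online algorithm. It assumes that each training point contributes one feature column, that a single target vector `y` is to be approximated, and that the selected columns form the chosen subset while their fitted weights act as rough data values. The names `omp`, `least_squares`, and `solve` are hypothetical and are not taken from the paper or its code.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve the square system a @ x = b (a is row-major) by Gaussian elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def least_squares(cols: List[Vector], y: Vector) -> Vector:
    """Weights w minimizing ||y - sum_i w_i * cols[i]||, via the normal equations."""
    gram = [[dot(ci, cj) for cj in cols] for ci in cols]
    rhs = [dot(ci, y) for ci in cols]
    return solve(gram, rhs)


def omp(columns: List[Vector], y: Vector, k: int, tol: float = 1e-10) -> Tuple[List[int], Vector]:
    """Batch OMP (assumption: one column per training point, one target vector y).

    Greedily selects up to k columns and returns (selected indices, fitted weights).
    """
    support: List[int] = []
    weights: Vector = []
    residual = list(y)
    for _ in range(min(k, len(columns))):
        # Pick the unselected column most correlated with the residual (norm-adjusted).
        best, best_score = -1, -1.0
        for j, col in enumerate(columns):
            norm = dot(col, col) ** 0.5
            if j in support or norm == 0.0:
                continue
            score = abs(dot(col, residual)) / norm
            if score > best_score:
                best, best_score = j, score
        if best < 0:
            break
        support.append(best)
        # "Orthogonal" step: re-fit all selected columns jointly, then update the residual.
        weights = least_squares([columns[j] for j in support], y)
        approx = [sum(w * columns[j][i] for w, j in zip(weights, support)) for i in range(len(y))]
        residual = [yi - ai for yi, ai in zip(y, approx)]
        if dot(residual, residual) ** 0.5 < tol:
            break
    return support, weights


# Toy example: 5 training points with 4-dimensional (hypothetical) feature columns.
columns = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0]]
y = [0, 2, 0, -1]  # equals 2 * columns[1] - 1 * columns[3]
print(omp(columns, y, k=2))  # ([1, 3], [2.0, -1.0])
```

The fifth column is a distractor that is partly aligned with `y`. OMP still selects columns 1 and 3 and recovers the weights 2 and -1 exactly. In the online setting described in the abstract, the feature columns would change as training proceeds, which this batch sketch does not model.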
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2042