Steady and Fair Robustness Evaluation Based on Model Interpretation

Soyoun Won; Hyeon Bae Kim; Yong Hyun Ahn; Hong Joo Lee; Seong Tae Kim

23 Sept 2024 (modified: 13 Nov 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: Adversarial robustness, robustness evaluation, Shapley value

Abstract: Adversarial robustness has become a major concern as machine learning models are increasingly deployed in security-sensitive applications. Evaluating adversarial robustness remains challenging because current metrics are heavily affected by factors such as the attack method, the attack intensity, and the model architecture. In this paper, we propose Steady and Fair Robustness Evaluation, a novel framework designed to mitigate the impact of these factors and provide a more stable evaluation of a model’s robustness. Our key insight is a strong correlation between adversarial robustness and the standard deviation (SD) of Shapley values, where each Shapley value measures the importance of an individual neuron. We demonstrate that models with a lower SD of Shapley values are more robust to adversarial attacks, regardless of the attack method or model architecture. Extensive experiments across various models, training objectives, and attack scenarios show that our approach offers more consistent and interpretable robustness evaluation. We further introduce a new training strategy that minimizes the SD of Shapley values to improve the robustness of the model. Our findings suggest that Shapley-value-based analysis can provide a principled and efficient alternative to conventional robustness evaluation techniques.
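To make the quantity in the abstract concrete, the following is a minimal sketch of computing per-neuron Shapley values and their standard deviation. It is an illustration under stated assumptions, not the authors' estimator: the abstract does not specify the value function, the layer, or the approximation scheme, so the zero-masking value function, the linear readout, and the names `shapley_values` and `make_value_fn` are all hypothetical. The exact enumeration used here is only feasible for a handful of neurons.

```python
from itertools import combinations
from math import factorial
from statistics import pstdev


def shapley_values(value_fn, n_players):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    phi = [0.0] * n_players
    for i in range(n_players):
        others = [j for j in range(n_players) if j != i]
        for size in range(n_players):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = (factorial(size) * factorial(n_players - size - 1)
                      / factorial(n_players))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of neuron i when added to coalition S.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def make_value_fn(hidden, out_weights):
    """Assumed toy value function: a linear readout in which neurons
    outside the coalition are masked to zero."""
    def value_fn(coalition):
        return sum(out_weights[j] * hidden[j] for j in coalition)
    return value_fn


# Hypothetical hidden activations and readout weights for two toy models.
hidden = [1.0, 1.0, 1.0, 1.0]
concentrated = [4.0, 0.0, 0.0, 0.0]  # one neuron carries the whole output
distributed = [1.0, 1.0, 1.0, 1.0]   # importance is spread evenly

phi_conc = shapley_values(make_value_fn(hidden, concentrated), len(hidden))
phi_dist = shapley_values(make_value_fn(hidden, distributed), len(hidden))

# Population standard deviation of the per-neuron Shapley values.
sd_conc = pstdev(phi_conc)
sd_dist = pstdev(phi_dist)
print("SD (concentrated):", sd_conc)
print("SD (distributed): ", sd_dist)
```

In this toy setup both models produce the same output, but the concentrated model has a larger SD because a single neuron holds all the importance. The abstract's claim is that a lower SD is associated with greater robustness; the sketch only shows how the statistic is formed and does not test that claim.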

Supplementary Material: zip

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 2838
