Regulating Model Reliance on Non-Robust Features by Smoothing Marginal Density of Input

22 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Desk Rejected Submission
Keywords: model robustness, interpretability, feature attribution
TL;DR: We propose a regularizer that limits model reliance on non-robust features by smoothing the marginal density of the input.
Abstract: Trustworthy machine learning necessitates careful regulation of model reliance on non-robust features. We propose a framework to delineate such features by attributing model predictions to the input. Within our framework, robust feature attributions remain consistent, while non-robust feature attributions are susceptible to fluctuations. This behavior reveals a correlation between model reliance on non-robust features and the smoothness of the marginal density of the input samples. Hence, we propose to regularize the gradients of the marginal density w.r.t. the input features for robustness. We also devise an efficient implementation of our regularization to address the potential numerical instability of the underlying optimization process. Moreover, we show analytically that, in contrast to our marginal density smoothing, the commonly adopted input gradient regularization smooths the conditional or joint density of the input, resulting in limited robustness. Our experiments validate the effectiveness of the proposed method, providing clear evidence that it mitigates spurious correlations learned by the model and addresses the feature leakage problem. We demonstrate that our regularization enables the model to exhibit robustness against perturbations in pixel values, input gradients, and density.
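The distinction the abstract draws between smoothing the marginal density and smoothing the conditional or joint density can be illustrated with a small sketch. It is not the authors' implementation. It assumes an energy-based reading of a classifier that the abstract does not state, in which the logits f(x) define an unnormalized joint density, so that log p(x) equals logsumexp(f(x)) up to a constant and log p(y|x) is the usual log-softmax. The linear model, the squared L2 penalty, the weight `lam`, and all function names are hypothetical choices made only for illustration.

```python
import math

def logsumexp(values):
    # Numerically stable log(sum(exp(v))) by shifting with the maximum.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def softmax(values):
    lse = logsumexp(values)
    return [math.exp(v - lse) for v in values]

def logits(W, b, x):
    # Hypothetical linear classifier: f(x) = W x + b.
    return [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]

def marginal_score(W, b, x):
    # Assumption: log p(x) = logsumexp(f(x)) + const (energy-based reading).
    # For a linear model its input gradient is W^T softmax(f(x)).
    p = softmax(logits(W, b, x))
    return [sum(p[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def conditional_score(W, b, x, y):
    # Input gradient of log p(y|x) = f_y(x) - logsumexp(f(x)).
    m = marginal_score(W, b, x)
    return [W[y][j] - m[j] for j in range(len(x))]

def sq_norm(v):
    return sum(c * c for c in v)

def regularized_loss(W, b, x, y, lam):
    # Cross-entropy plus a penalty on the marginal-density input gradient.
    # The squared L2 form of the penalty is an illustrative assumption.
    z = logits(W, b, x)
    cross_entropy = logsumexp(z) - z[y]
    return cross_entropy + lam * sq_norm(marginal_score(W, b, x))
```

Under this assumed reading, the input gradient of the cross-entropy loss is the negative of `conditional_score`, so penalizing it constrains the conditional density, whereas the penalty in `regularized_loss` acts on `marginal_score`. The two gradients sum to the input gradient of the joint log-density, which for this linear model is simply the row `W[y]`.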
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 4409