Integrated Model Explanations by Independent and Collaborative Feature Influence via Linear-Nonlinear Perspectives

24 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Explanation method, Linear simplification, Feature interactions, Independent influence, Collaborative influence, Linear-Nonlinear Explanation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: In machine learning, model-agnostic explanation methods attempt to explain model predictions by assessing the importance of input features. While linear simplification methods guarantee desirable properties, they must fold nonlinear feature interactions into linear coefficients. Feature influence analysis methods, on the other hand, examine feature relevance but do not consistently preserve the properties required for robust explanations. Our approach seeks to inherit the properties of linear simplification methods while systematically capturing feature interactions. To achieve this, we consider the explained model from two aspects: a linear aspect, which focuses on the independent influence of features on model predictions, and a nonlinear aspect, which models feature interactions and their collaborative impact on model predictions. In practice, our method first investigates both the linear and nonlinear aspects of the model being explained. It then extracts the independent and collaborative importance of features for model predictions and consistently combines them so that the resulting feature importance preserves the desirable properties for robust explanations. Consequently, our Linear-Nonlinear Explanation (LNE) method provides a comprehensive understanding of how features influence model predictions. Experiments validate its effectiveness, demonstrating that the linear, the nonlinear, and the combined feature importance all offer valuable insights for explaining model predictions. We also compare LNE with other methods on explaining well-trained classifiers and find that our explanations align more closely with human intuition. Additionally, a user study shows that our method can alert users to potential biases in classifiers.
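The following is a minimal, illustrative sketch of the independent/collaborative decomposition the abstract describes, assuming a black-box model f, an instance x, and a baseline vector representing "absent" feature values. The function name lne_sketch, the single-feature and pairwise perturbation scheme, and the even split of interaction terms are assumptions made for illustration only, not the paper's actual LNE algorithm.

```python
import numpy as np

def lne_sketch(f, x, baseline):
    """Toy decomposition of feature importance into an 'independent' (linear)
    part and a 'collaborative' (interaction) part for a black-box model f.

    f        : callable mapping a (n_features,) array to a scalar prediction
    x        : the instance to explain, shape (n_features,)
    baseline : reference values for "absent" features, shape (n_features,)
    """
    d = len(x)
    base_pred = f(baseline)

    # Independent (linear) influence: effect of switching on one feature alone.
    independent = np.zeros(d)
    for i in range(d):
        z = baseline.copy()
        z[i] = x[i]
        independent[i] = f(z) - base_pred

    # Collaborative (nonlinear) influence: pairwise interaction terms,
    # i.e. the part of a joint effect not explained by the two single effects.
    collaborative = np.zeros(d)
    for i in range(d):
        for j in range(i + 1, d):
            z = baseline.copy()
            z[i], z[j] = x[i], x[j]
            joint = f(z) - base_pred
            interaction = joint - independent[i] - independent[j]
            # Split the interaction evenly between the two features involved
            # (an illustrative choice, not necessarily the paper's rule).
            collaborative[i] += 0.5 * interaction
            collaborative[j] += 0.5 * interaction

    return independent, collaborative, independent + collaborative


if __name__ == "__main__":
    # Example: a model with linear terms and one explicit interaction term.
    f = lambda v: 2.0 * v[0] + 1.0 * v[1] + 3.0 * v[0] * v[2]
    x = np.array([1.0, 1.0, 1.0])
    baseline = np.zeros(3)
    lin, nonlin, combined = lne_sketch(f, x, baseline)
    print("independent:", lin)        # [2. 1. 0.]
    print("collaborative:", nonlin)   # [1.5 0.  1.5]
    print("combined:", combined)      # [3.5 1.  1.5]
```

In this toy example the interaction term 3*x0*x2 contributes nothing to the independent scores of features 0 and 2 but shows up entirely in their collaborative scores, which is the kind of separation between independent and collaborative influence the abstract motivates.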
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9072