Optimizing Explanations: Nuances Matter When Evaluation Metrics Become Loss Functions

Published: 10 Jun 2025, Last Modified: 17 Jul 2025
Venue: MOSS@ICML2025
License: CC BY 4.0
Keywords: Explanations, Properties, Optimization, Feature Attribution Explanations
Abstract: Recent work has introduced a framework that allows users to directly optimize explanations for desired properties and their trade-offs. While powerful in principle, this method repurposes evaluation metrics as loss functions – an approach whose implications are not yet well understood. In this paper, we study how different robustness metrics influence the outcome of explanation optimization, holding faithfulness constant. We do this in the transductive setting, in which all points are available in advance. Contrary to our expectations, we observe that the choice of robustness metric can lead to highly divergent explanations, particularly in higher-dimensional settings. We trace this behavior to the use of metrics that evaluate the explanation set as a whole, rather than imposing constraints on individual points, and to how these “global” metrics interact with other optimization objectives. These interactions can allow the optimizer to produce locally inconsistent, unintuitive, and even undesirable explanations, despite satisfying the desired trade-offs. Our findings highlight the need for metrics whose mathematical structure more closely aligns with their intended use in optimization, and we advocate for future work that rigorously investigates metrics that incorporate a pointwise evaluation and their influence on the optimization landscape.
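Illustration: the sketch below is not the paper's framework or metrics; it is a minimal, hypothetical example of the general idea the abstract describes, with all names (faithfulness_loss, global_robustness, pointwise_robustness, the toy linear model) chosen for illustration only. It contrasts a set-level ("global") robustness term, averaged over the whole transductive set, with a pointwise variant that constrains each explanation individually, when either is used as a loss term alongside a faithfulness term.

```python
# Minimal sketch (not the authors' implementation) of repurposing robustness
# metrics as loss terms when optimizing a set of feature-attribution
# explanations in the transductive setting. All function names and the toy
# model are illustrative assumptions.
import torch

torch.manual_seed(0)

n, d = 64, 10                               # transductive set: all n points known in advance
X = torch.randn(n, d)                       # inputs
w = torch.randn(d)                          # toy linear model f(x) = x @ w
E0 = torch.randn(n, d)                      # initial attribution vectors, one per point

def faithfulness_loss(E, X, w):
    # Toy surrogate for faithfulness: attributions should reflect the model's
    # local behaviour; for a linear model, input-times-gradient (x_i * w) is a
    # natural target.
    return ((E - X * w) ** 2).mean()

def explanation_gaps(E, X):
    # Ratio of explanation distance to input distance for every pair of points:
    # robust explanations change little when the input changes little.
    with torch.no_grad():
        dX = torch.cdist(X, X) + 1e-3       # inputs are fixed, no gradient needed
    sq = (E ** 2).sum(dim=1)
    dE_sq = (sq[:, None] + sq[None, :] - 2 * E @ E.T).clamp_min(0.0)
    dE = torch.sqrt(dE_sq + 1e-12)          # eps keeps the gradient finite at zero
    return dE / dX

def global_robustness(E, X):
    # "Global" metric: one average over the whole set. Large gaps at a few
    # points can be offset by very small gaps elsewhere.
    return explanation_gaps(E, X).mean()

def pointwise_robustness(E, X):
    # Pointwise variant: penalise the worst gap per point, so no individual
    # explanation may be locally inconsistent.
    return explanation_gaps(E, X).max(dim=1).values.mean()

def optimize(robustness_fn, steps=500, lam=1.0):
    # Optimize the explanation set for a faithfulness/robustness trade-off.
    E = E0.clone().requires_grad_(True)
    opt = torch.optim.Adam([E], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = faithfulness_loss(E, X, w) + lam * robustness_fn(E, X)
        loss.backward()
        opt.step()
    return E.detach()

# Compare worst per-point gaps under each objective: a good set-level average
# need not control how unstable any individual explanation is allowed to be.
for name, fn in [("global", global_robustness), ("pointwise", pointwise_robustness)]:
    E_opt = optimize(fn)
    gaps = explanation_gaps(E_opt, X).max(dim=1).values
    print(f"{name:9s}  mean gap={gaps.mean():.3f}  worst gap={gaps.max():.3f}")
```

This toy setup only illustrates the structural difference the abstract points to: the averaged metric evaluates the explanation set as a whole, while the pointwise variant imposes a constraint on every individual point.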
Code: ipynb
Submission Number: 63