Keywords: Explanations, Properties, Optimization, Feature Attribution Explanations
Abstract: Recent work has introduced a framework that allows users to directly optimize explanations for desired properties
and their trade-offs. While powerful in principle, this method repurposes evaluation metrics as loss functions – an
approach whose implications are not yet well understood. In this paper, we study how different robustness metrics
influence the outcome of explanation optimization, holding faithfulness constant. We do this in the transductive
setting, in which all points are available in advance. Contrary to our expectations, we observe that the choice of
robustness metric can lead to highly divergent explanations, particularly in higher-dimensional settings. We trace
this behavior to the use of metrics that evaluate the explanation set as a whole, rather than imposing constraints on
individual points, and to how these “global” metrics interact with other optimization objectives. These interactions
can allow the optimizer to produce locally inconsistent, unintuitive, and even undesirable explanations, despite
satisfying the desired trade-offs. Our findings highlight the need for metrics whose mathematical structure more
closely aligns with their intended use in optimization, and we advocate for future work that rigorously investigates
metrics that incorporate pointwise evaluation and their influence on the optimization landscape.
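To make the distinction between "global" and pointwise metrics concrete, the following minimal sketch (not the paper's implementation; all names such as `explanations` and `neighbor_explanations` are illustrative assumptions) contrasts a set-level average robustness loss with a per-point penalized variant.

```python
# Hypothetical sketch: a "global" robustness loss averaged over the whole
# explanation set vs. a pointwise variant that penalizes each point's own
# violation of a robustness budget. Data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_points, n_features = 100, 20

# Explanations for each input and for a perturbed neighbor of that input.
explanations = rng.normal(size=(n_points, n_features))
neighbor_explanations = explanations + rng.normal(scale=0.1, size=(n_points, n_features))

# Per-point robustness: distance between an explanation and its neighbor's.
per_point = np.linalg.norm(explanations - neighbor_explanations, axis=1)

# "Global" metric: a single average over the set. A low value can mask a few
# points with highly unstable explanations, which an optimizer may exploit.
global_robustness_loss = per_point.mean()

# Pointwise alternative: penalize every point individually, e.g. a hinge on a
# per-point budget eps, or track the worst case directly.
eps = 0.5
pointwise_robustness_loss = np.maximum(per_point - eps, 0.0).mean()
worst_case = per_point.max()

print(global_robustness_loss, pointwise_robustness_loss, worst_case)
```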
Code: ipynb
Submission Number: 63