Abstract: Traditionally, the evaluation of explanations falls into one of two camps: proxy metrics (algorithmic evaluations based on desirable properties) or human user studies (experiments that put explanations to the test with real users in real use cases). For the purpose of determining suitable explanations for a desired real-world use case, the former is efficient to compute but disconnected from the use case itself, while the latter is time-consuming to organize and often difficult to get right. We argue for including a new type of evaluation in the evaluation workflow, Simulated User Evaluations: an algorithmic evaluation grounded in real use cases that capitalizes on the strengths of both. We provide a two-phase framework for conducting Simulated User Evaluations and demonstrate that, by instantiating this framework for local explanations, we can recreate findings from existing user studies for two use cases (identifying data bugs and performing forward simulation). Additionally, we demonstrate that Simulated User Evaluations can provide insight into the design of new studies.