Abstract: Machine learning models that offer excellent predictive performance often lack the interpretability necessary to support integrated human-machine decision-making. In clinical medicine and other high-risk settings, domain experts may be unwilling to trust model predictions without explanations. Work in explainable AI must balance competing objectives along two different axes: 1) Models should ideally be both accurate and simple. 2) Explanations must balance faithfulness to the model's decision-making with their plausibility to a domain expert. We propose to train a proxy model that mimics the behavior of a trained model and provides control over these trade-offs. We evaluate our approach on the task of assigning ICD codes to clinical notes to demonstrate that the proxy model is faithful to the trained model's behavior and produces quality explanations.
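The abstract does not specify how the proxy model is fit, so the following is only a minimal, generic sketch of the distillation-style setup it implies: a simpler proxy (here a hypothetical bag-of-words linear model, `ProxyModel`) is trained to reproduce the soft multi-label ICD predictions of a frozen, already-trained teacher. All class, function, and parameter names are illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn as nn


class ProxyModel(nn.Module):
    """Hypothetical interpretable proxy: a linear layer over note token counts."""

    def __init__(self, vocab_size: int, num_codes: int):
        super().__init__()
        self.linear = nn.Linear(vocab_size, num_codes)

    def forward(self, bow_counts: torch.Tensor) -> torch.Tensor:
        # Logits over ICD codes for each clinical note.
        return self.linear(bow_counts)


def train_proxy(teacher: nn.Module, proxy: ProxyModel, loader, epochs: int = 3):
    """Fit the proxy to the teacher's output probabilities (soft targets)."""
    teacher.eval()
    opt = torch.optim.Adam(proxy.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()  # multi-label ICD coding
    for _ in range(epochs):
        for teacher_inputs, bow_counts in loader:
            with torch.no_grad():
                teacher_probs = torch.sigmoid(teacher(teacher_inputs))
            opt.zero_grad()
            loss = loss_fn(proxy(bow_counts), teacher_probs)
            loss.backward()
            opt.step()
    return proxy
```

In a sketch like this, the weights of the linear proxy can be read directly as per-token evidence for each code, which is one plausible way a proxy could trade accuracy for simplicity and yield explanations; the paper's actual explanation mechanism may differ.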
Paper Type: long