Abstract: Interpreting and debugging machine learning models is necessary to ensure the robustness of the machine learning models. Explaining mispredictions can help significantly in doing so. While recent works on misprediction explanation have proven promising in generating interpretable explanations for mispredictions, the state-of-the-art techniques “blindly” deduce misprediction explanation rules from all data features, which may not be scalable depending on the number of features. To alleviate this problem, we propose an efficient misprediction explanation technique named Bias Guided Misprediction Diagnoser (BGMD), which leverages two prior knowledge about data: a) data often exhibit highly-skewed feature distributions and b) trained models in many cases perform poorly on subdataset with under-represented features. Next, we propose a technique named MAPS (Mispredicted Area UPweight Sampling). MAPS increases the weights of subdataset during model retraining that belong to the group that is prone to be mispredicted because of containing under-represented features. Thus, MAPS make retrained model pay more attention to the under-represented features. Our empirical study shows that our proposed BGMD outperformed the state-of-the-art misprediction diagnoser and reduces diagnosis time by 92%. Furthermore, MAPS outperformed two state-of-the-art techniques on fixing the machine learning model's performance on mispredicted data without compromising performance on all data. All the research artifacts (i.e., tools, scripts, and data) of this study are available in the accompanying website [1].
0 Replies
Loading