Characterizing Misclassifications of Deep NLP Models

Anonymous

12 Mar 2021 (modified: 12 Mar 2021) · OpenReview Anonymous Preprint Blind Submission
Keywords: pattern mining, explainability, misclassification, MDL
Abstract: Understanding the reasons for misclassification is critical for improving a black-box classifier's performance. We propose a method to characterize these classification errors while accounting for the challenges that NLP applications pose, such as sparse, high-dimensional, discrete input spaces. Our approach discovers patterns over the input of a model that strongly correlate with the correctness of the classification, which makes it possible to identify the systematic errors a model makes. We formalize the problem in terms of the Minimum Description Length (MDL) principle to obtain non-redundant and easily interpretable results, and we propose the Premise algorithm to find good patterns in practice. The discovered patterns allow the user to take action and improve the model, e.g., through changes to the training data or the model definition. On synthetic data and two real-world NLP tasks, we show that Premise performs well in practice. For two Visual Question Answering classifiers, we discover that they struggle with aspects such as counting, location, and reading, and for a Named Entity Recognition model, we leverage the found patterns to improve F1 performance by almost 10% through targeted fine-tuning.
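The core idea of correlating input patterns with classification correctness can be illustrated with a toy sketch. The code below is a hypothetical simplification for intuition only: it ranks single tokens by how much more frequently they occur in misclassified examples than in correctly classified ones. It is not the Premise algorithm, which mines richer patterns and selects them via the MDL principle; all names (`error_correlated_tokens`, `min_count`) are illustrative.

```python
from collections import Counter

def error_correlated_tokens(examples, min_count=2):
    """Rank tokens by how much more often they appear in misclassified
    examples than in correctly classified ones.

    `examples` is a list of (tokens, is_correct) pairs, where `tokens`
    is an iterable of input tokens and `is_correct` is a bool.
    This is a crude single-token proxy for the pattern mining that
    Premise formalizes via MDL.
    """
    err_counts, ok_counts = Counter(), Counter()
    n_err = n_ok = 0
    for tokens, is_correct in examples:
        bucket = ok_counts if is_correct else err_counts
        for t in set(tokens):  # count each token once per example
            bucket[t] += 1
        if is_correct:
            n_ok += 1
        else:
            n_err += 1
    scores = {}
    for t, c in err_counts.items():
        if c + ok_counts[t] < min_count:  # skip rare tokens
            continue
        # Fraction of errors containing t minus fraction of correct
        # predictions containing t: positive => error-associated.
        scores[t] = c / max(n_err, 1) - ok_counts[t] / max(n_ok, 1)
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy VQA-style example: counting questions are always misclassified.
examples = [
    ({"how", "many", "cats"}, False),
    ({"how", "many", "chairs"}, False),
    ({"how", "many", "dogs"}, False),
    ({"what", "color"}, True),
    ({"where", "is"}, True),
]
ranking = error_correlated_tokens(examples)
```

Here `"how"` and `"many"` receive the top score (they occur in all errors and no correct predictions), matching the kind of "counting" weakness the abstract reports for VQA models.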