OOD detection on text classification

19 Mar 2023 (modified: 19 Mar 2023) · OpenReview Archive Direct Upload
Abstract: In natural language processing (NLP) tasks, it is crucial to detect whether a given input is out-of-distribution (OOD), meaning it falls outside the model's training distribution. This matters because when a model encounters text unlike anything it was trained on, it may perform poorly and make incorrect predictions. The issue is particularly significant in scenarios where the model's predictions have real-world consequences, such as medical diagnosis or financial fraud detection. To address this problem, researchers have developed baseline approaches for detecting OOD samples; we present three basic approaches in our project, available on GitHub. We evaluate the performance of these approaches on binary sentiment classification. Although they are effective at identifying OOD samples, there is still room for improvement in OOD detection methods, especially in correctly identifying true positive OOD samples. These techniques can serve as a starting point for practitioners who aim to apply OOD detection in NLP applications.
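
The abstract does not name the three baseline approaches, so the following is only a minimal sketch of one widely used OOD-detection baseline for text classifiers: thresholding the maximum softmax (predicted-class) probability. The toy TF-IDF + logistic-regression sentiment classifier, the example sentences, and the threshold value are all illustrative assumptions, not the authors' implementation.

```python
# Sketch of a maximum-softmax-probability (MSP) OOD baseline on top of a
# binary sentiment classifier. Everything here (data, model, threshold) is
# illustrative; the paper's actual three approaches are not specified above.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy in-distribution training data: binary sentiment labels.
train_texts = [
    "I loved this movie, it was fantastic",
    "Great acting and a wonderful story",
    "Terrible film, a complete waste of time",
    "The plot was boring and the dialogue awful",
]
train_labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
clf = LogisticRegression()
clf.fit(vectorizer.fit_transform(train_texts), train_labels)

def msp_score(texts):
    """Return the maximum predicted-class probability for each input.
    Low scores suggest the input may be out-of-distribution."""
    probs = clf.predict_proba(vectorizer.transform(texts))
    return probs.max(axis=1)

# One in-distribution input and one OOD-looking input (a non-review sentence).
test_texts = [
    "What a wonderful, fantastic movie",
    "Quarterly revenue grew by 4.2 percent",
]
scores = msp_score(test_texts)

THRESHOLD = 0.6  # illustrative; in practice tuned on a held-out validation set
for text, score in zip(test_texts, scores):
    flag = "OOD" if score < THRESHOLD else "in-distribution"
    print(f"{score:.3f}  {flag}  {text}")
```

In this setup the classifier's confidence is used as an OOD score: inputs whose highest class probability falls below the threshold are flagged as OOD, which matches the abstract's framing of detecting text that falls outside the training data before trusting the sentiment prediction.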