Abstract: Identifying anomalous documents in a text corpus is an important problem with wide applications. Because text data are high-dimensional and sparse, traditional outlier detection methods fail to identify the features that distinguish outliers. Inspired by the effectiveness of Nonnegative Matrix Factorization (NMF) for text clustering, we explore its use for text outlier detection. In this paper, we introduce a novel NMF-based method, Nonnegative Orthogonal Constraint Outlier Learning (NOCOL), that learns the outliers during the factorization process. Experimental results show that NOCOL identifies text outliers more accurately than state-of-the-art methods.
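The abstract does not give NOCOL's objective or update rules, so the following is only a minimal sketch of the general idea it builds on, assuming a plain NMF baseline. It factorizes a term-document matrix X ≈ WH with nonnegative W (terms × topics) and H (topics × documents), then scores each document by the norm of its reconstruction residual, so documents the learned topics explain poorly rank as outliers. It does not implement NOCOL's orthogonal constraint or its in-factorization outlier learning. The function names `nmf_mu` and `outlier_scores`, the Lee-Seung multiplicative-update solver, the SVD-based initialization heuristic, and the toy matrix are all illustrative assumptions.

```python
import numpy as np

# Hypothetical helpers: a residual-scoring NMF baseline, NOT the NOCOL method.


def nmf_mu(X, k, n_iter=200, eps=1e-10):
    """Rank-k NMF X ~= W @ H with Lee-Seung multiplicative updates
    (Frobenius loss). Initialization takes absolute values of the top-k
    singular triplets, a simplified heuristic loosely in the spirit of
    NNDSVD (an assumption, not the paper's scheme)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    W = np.abs(U[:, :k]) * np.sqrt(s[:k])          # terms x k, nonnegative
    H = np.sqrt(s[:k])[:, None] * np.abs(Vt[:k])   # k x docs, nonnegative
    for _ in range(n_iter):
        # eps keeps the denominators away from zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def outlier_scores(X, k):
    """Score each document (column of X) by the L2 norm of its
    reconstruction residual; higher means less explained by the k topics."""
    W, H = nmf_mu(X, k)
    return np.linalg.norm(X - W @ H, axis=0)


# Toy term-document matrix (12 terms x 21 docs), made up for illustration:
# docs 0-9 use terms 0-4, docs 10-19 use terms 5-9, doc 20 uses only terms 10-11.
X = np.zeros((12, 21))
for j in range(10):
    X[0:5, j] = j + 1
    X[5:10, 10 + j] = 2 * (j + 1)
X[10:12, 20] = 5.0

scores = outlier_scores(X, k=2)
print(int(np.argmax(scores)))  # → 20
```

In the toy matrix the two topic blocks are reconstructed almost exactly at rank 2, while the document built from unrelated terms keeps a large residual. Residual scoring is a post hoc baseline; the abstract instead describes NOCOL as learning the outliers during the factorization itself.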
External IDs: dblp:conf/wise/Balasubramaniam21