A Lightweight yet Robust Approach to Textual Anomaly Detection

Anonymous

05 Jun 2022 (modified: 05 May 2023), ACL ARR 2022 June Blind Submission, Readers: Everyone

Abstract: Highly imbalanced textual datasets continue to pose a challenge for supervised learning models, especially when the minority class is multi-topical. Viewing such imbalanced text data as an anomaly detection (AD) problem, however, has advantages for certain tasks, such as detecting hate speech or inappropriate and/or offensive language in large social media feeds. There, the unwanted content tends to be both rare and thematically non-uniform, and it better fits the definition of an anomaly than that of a class. Several recent approaches to textual AD use transformer models, achieving good results but with trade-offs in pre-training cost and inflexibility to new domains. In this paper, we compare two linear models within the NMF family, which also have a recent history in textual AD. We introduce a new approach based on an alternative regularization of the NMF objective. Our results surpass those of other linear AD models and are on par with those of deep models, and the approach performs comparably well even at small anomaly concentrations.

Paper Type: short
