Abstract: Anomaly detection (AD) is an important machine learning task with applications in fraud detection, content moderation, and user behavior analysis. However, AD remains relatively understudied in natural language processing (NLP), limiting its effectiveness in detecting harmful content, phishing attempts, and spam reviews. We introduce NLP-ADBench, the most comprehensive NLP anomaly detection (NLP-AD) benchmark to date, which includes eight curated datasets and nineteen state-of-the-art algorithms: three end-to-end methods and sixteen two-step methods that apply classical AD algorithms to language embeddings from BERT and OpenAI models. Our empirical results show that no single model dominates across all datasets, indicating a need for automated model selection. Moreover, two-step methods with transformer-based embeddings consistently outperform specialized end-to-end approaches, and OpenAI embeddings outperform those of BERT. We release NLP-ADBench at https://anonymous.4open.science/r/NLP-ADBench-E84C, providing a unified framework for NLP-AD and supporting future investigations.
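To make the "two-step" paradigm in the abstract concrete, here is a minimal sketch of the pipeline: embed texts with a pretrained transformer, then score them with a classical anomaly detector. The specific choices below (bert-base-uncased with mean pooling, scikit-learn's IsolationForest) are illustrative assumptions, not necessarily the configurations evaluated in NLP-ADBench.

```python
# Sketch of a two-step NLP-AD pipeline: (1) embed, (2) detect.
# bert-base-uncased and IsolationForest are illustrative choices.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.ensemble import IsolationForest

texts = ["a normal review", "another ordinary sentence", "BUY NOW!!! free $$$"]

# Step 1: mean-pooled BERT embeddings (padding tokens masked out).
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state        # (n, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1)     # (n, seq_len, 1)
    emb = (hidden * mask).sum(1) / mask.sum(1)       # (n, 768)

# Step 2: fit a classical detector on the embeddings and score anomalies.
det = IsolationForest(random_state=0).fit(emb.numpy())
scores = -det.score_samples(emb.numpy())             # higher = more anomalous
print(np.round(scores, 3))
```

Any unsupervised detector operating on fixed-length vectors (e.g., LOF or a one-class SVM) can be swapped into step 2, which is what allows the benchmark to pair many classical AD algorithms with different embedding models.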
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, corpus creation, NLP datasets, evaluation methodologies, reproducibility, representation learning, word embeddings
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 1663