Abstract: Two-step approaches that combine pre-trained large language model embeddings with anomaly detectors perform well in text anomaly detection because they leverage rich semantic representations. However, the high-dimensional dense embeddings extracted by large language models impose substantial memory requirements and high computation time. To address this challenge, we introduce the Simplified Isolation Kernel (SIK), which maps high-dimensional dense embeddings to lower-dimensional sparse representations while preserving crucial anomaly characteristics. SIK has linear time complexity and significantly reduces space complexity through its boundary-focused feature mapping. Experiments across 7 datasets demonstrate that SIK achieves better detection performance than 11 state-of-the-art (SOTA) anomaly detection algorithms while maintaining computational efficiency and low memory cost. All code and demonstrations are available at https://anonymous.4open.science/r/SIK-6577/.
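To make the dense-to-sparse idea concrete, the following is a minimal sketch of a generic isolation-kernel-style feature map built from nearest-neighbor hyperspheres. It is not the SIK method itself, whose details are not given in the abstract. The function names (`fit_spheres`, `sparse_map`, `anomaly_scores`), the parameters `t` and `psi`, and the scoring rule (one minus the similarity to the training mean map) are all assumptions made for illustration. Each embedding is reduced to one small integer per partition, which is a compact encoding of a sparse vector with at most `t` nonzero entries.

```python
import math
import random


def fit_spheres(data, t=50, psi=8, seed=0):
    """Build t partitions. Each partition samples psi centers from the data;
    a center's radius is the distance to its nearest fellow center."""
    rng = random.Random(seed)
    models = []
    for _ in range(t):
        centers = rng.sample(data, psi)
        radii = [
            min(math.dist(c, o) for j, o in enumerate(centers) if j != i)
            for i, c in enumerate(centers)
        ]
        models.append((centers, radii))
    return models


def sparse_map(x, models):
    """Map a dense vector to one cell index per partition.
    The index is -1 when x lies outside the sphere of its nearest center,
    i.e. the corresponding block of the sparse vector is all zeros."""
    cells = []
    for centers, radii in models:
        dists = [math.dist(x, c) for c in centers]
        i = min(range(len(centers)), key=dists.__getitem__)
        cells.append(i if dists[i] <= radii[i] else -1)
    return cells


def anomaly_scores(train_cells, test_cells, psi):
    """Score = 1 - average training mass of the cells a test point falls in
    (an assumed scoring rule; higher means more anomalous)."""
    t = len(train_cells[0])
    n = len(train_cells)
    mass = [[0.0] * psi for _ in range(t)]
    for cells in train_cells:
        for p, c in enumerate(cells):
            if c >= 0:
                mass[p][c] += 1.0 / n
    return [
        1.0 - sum(mass[p][c] for p, c in enumerate(cells) if c >= 0) / t
        for cells in test_cells
    ]
```

In a two-step pipeline, `data` would be a list of embedding tuples produced by a language model. Fitting costs a number of distance computations independent of the dataset size, and mapping each point costs `t * psi` distance computations, so the mapping as a whole is linear in the number of points.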
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: Text Anomaly Detection
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 2793