REATO: Robust Ensemble Autoencoders for Textual Outliers

Anonymous

16 Oct 2023 | ACL ARR 2023 October Blind Submission | Readers: Everyone
Abstract: Outlier detection is a recurring challenge in machine learning, actively researched across domains including computer vision, time series analysis, and high-dimensional data. Recently, interest in textual outlier detection and textual anomaly detection has grown, bringing unique challenges. Unfortunately, existing approaches often overlook a critical consideration: the specific type of textual outlier they aim to detect. We found that the experimental protocols used in the literature do not distinguish between different kinds of textual outliers. To address this issue, we present a novel approach to textual outlier detection using robust ensemble autoencoders that succeed in retrieving difficult anomalies. To enhance the robustness of our autoencoders, we introduce a novel robust subspace recovery loss function that takes into account locality in the latent space. Our ensemble learning strategy involves randomly connected autoencoders. Additionally, we address the issue of limited corpus availability by preparing two types of outliers: independent and contextual. An intriguing aspect of our work is the distinction between these two outlier types, which we formalize and demonstrate to require fundamentally different handling within a corpus. Notably, our approach not only delivers competitive results compared to existing methods but also excels at handling contextual outliers.
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment, Position papers
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.