Abstract: We propose a novel unsupervised outlier detection method based on Randomized Principal Component Analysis (RPCA). Motivated by the performance of the RPCA Forest in approximate K-Nearest Neighbor (KNN) search, the method derives an outlier score from the intrinsic properties of the forest. Experimental results show that the proposed approach outperforms classical and state-of-the-art methods on several datasets and performs competitively on the rest. An extensive analysis of the method indicates that it is robust and computationally efficient, making it a good choice for unsupervised outlier detection.
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: In response to the constructive feedback from the reviewers, the following major updates and improvements have been made to the manuscript:
1. New Sections & Analysis
Ablation Study (Section 5.5): Added a comprehensive ablation study across all datasets to justify the design choices for the outlier score, including statistical analysis.
Theoretical Framework (Section 4): Enhanced the theoretical discussion regarding the Randomized PCA tree construction and expectations of the method.
Limitations (Section 6): Added a dedicated discussion on the specific conditions under which the Randomized PCA (RPCA) forest may fail and the inherent limitations of the approach.
2. Methodological Clarifications
Randomized PCA (Sections 3 & 4): Included a full theoretical overview of Randomized PCA and a more granular description of how the Randomized PCA trees are fitted.
Computational Complexity (Section 4.3): Expanded the complexity analysis with detailed breakdowns of the fitting and scoring processes.
3. Structural & Stylistic Improvements
Related Work: Restructured and condensed the related work section to improve readability and flow.
Refined Claims: Revised the manuscript’s language to be more moderate and precise, ensuring that the performance claims accurately reflect the competitive nature of the results rather than overstating dominance.
Error Analysis: Provided a detailed analysis of four specific failure cases to offer deeper insight into the algorithm's behavior in edge cases.
Assigned Action Editor: ~Liang-Chieh_Chen1
Submission Number: 7076