Abstract: 3D novelty detection plays a crucial role in many real-world applications, especially in safety-critical fields such as autonomous driving and intelligent surveillance. However, existing 3D novelty detection methods are constrained by the scarcity of 3D data, which can prevent the model from learning adequate representations and thereby hurt detection accuracy. To address this challenge, we propose a Unified Learning Framework (UniL) for 3D novelty detection. During pretraining, UniL guides the point cloud encoder to learn information from other modalities, aligning visual, textual, and 3D features within the same feature space. In addition, we introduce a novel Multimodal Supervised Contrastive Loss (MSC Loss), which leverages label information during pretraining to better cluster samples of the same category in feature space. Furthermore, we propose a simple yet powerful scoring method, Depth Map Error (DME), which scores a sample at detection time by the discrepancy between the depth maps projected from the point cloud before and after reconstruction. Extensive experiments on the 3DOS benchmark demonstrate the effectiveness of our approach, which significantly improves the performance of the unsupervised VAE method for 3D novelty detection. Code is available at https://github.com/EugeneWon9/UniL.
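The abstract describes DME only at a high level, so the following is a minimal sketch of how such a score might be computed, assuming an orthographic depth projection and mean squared error as the discrepancy measure; the function names `project_depth_map` and `dme_score`, the normalization to the unit cube, and the `vae.reconstruct` call are all hypothetical stand-ins, not the paper's actual implementation.

```python
import numpy as np

def project_depth_map(points, resolution=64, background=1.0):
    """Orthographic depth map of an (N, 3) point cloud viewed along +z.

    Assumes coordinates are normalized to [0, 1]^3; pixels hit by no
    point keep the `background` depth. This projection scheme is an
    assumption -- the abstract does not specify the exact rendering.
    """
    depth = np.full((resolution, resolution), background, dtype=np.float32)
    u = np.clip((points[:, 0] * (resolution - 1)).astype(int), 0, resolution - 1)
    v = np.clip((points[:, 1] * (resolution - 1)).astype(int), 0, resolution - 1)
    # Keep the nearest (smallest z) point per pixel.
    np.minimum.at(depth, (v, u), points[:, 2].astype(np.float32))
    return depth

def dme_score(original, reconstructed, resolution=64):
    """Depth Map Error: mean squared discrepancy between the depth maps
    of a point cloud and its reconstruction; larger scores suggest the
    sample is novel (poorly reconstructed by the known-class model)."""
    d_orig = project_depth_map(original, resolution)
    d_rec = project_depth_map(reconstructed, resolution)
    return float(np.mean((d_orig - d_rec) ** 2))

# Hypothetical usage: `vae.reconstruct` stands in for any trained
# reconstruction model, e.g. the unsupervised VAE mentioned above.
# cloud = np.random.rand(2048, 3)
# score = dme_score(cloud, vae.reconstruct(cloud))
```

The intuition is that a reconstruction model trained only on known categories reproduces in-distribution shapes faithfully, so their pre- and post-reconstruction depth maps nearly match, while novel shapes reconstruct poorly and yield a large depth-map error.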