Abstract: 3D object detection has advanced rapidly in recent years, but current methods rely heavily on large amounts of high-quality labeled data, and annotating 3D data is both time-consuming and costly. To address this challenge, we propose VS3D, a vote-based semi-supervised 3D object detection framework. First, we introduce a data augmentation technique, Random Grid Deleting (RGD), which makes the detection of occluded and small objects more robust. Second, we add an auxiliary branch with Voting Consistency Learning (VCL) to predict object centers more accurately. Third, we design a Teacher-Student Matching (TSM) module with stricter consistency constraints to accelerate network convergence and improve detection performance. Our method can be integrated seamlessly with any vote-based fully supervised network. Extensive experiments on the SUN RGB-D and ScanNet V2 datasets show that the proposed method outperforms the state-of-the-art fully supervised model while using only 70% of the labeled data.