Keywords: Semi-supervised Representation Learning, 3D Object Detection
Abstract: DETR-based 3D detectors have recently emerged as a popular alternative to voting- and voxel-based methods, offering end-to-end set prediction without handcrafted priors or voxelization. However, they remain unexplored in the semi-supervised setting, even though the scarcity of annotated 3D data impedes their widespread adoption. In this work, we present Semi-3DETR, the first framework to systematically adapt DETR to semi-supervised 3D object detection by addressing challenges unique to 3D. Compared to 2D semi-DETR, semi-supervised 3D DETR faces amplified issues of fragile volumetric pseudo-labels, unstable query alignment, and noisy bipartite matching. Semi-3DETR mitigates these issues with three core components: Robust Pseudo-Label Denoising (RPLD), which filters and refines volumetric pseudo-labels against orientation and depth errors; Query Alignment Consistency (QAC), which stabilizes teacher–student query correspondence under 3D transformations; and a Hybrid Matching Strategy (HMS), which balances one-to-one and one-to-many assignments under noisy supervision. We further adopt a softmax classifier to enforce class exclusivity and improve pseudo-label reliability for semantically ambiguous 3D categories. Extensive experiments on ScanNet and SUN RGB-D show that Semi-3DETR is feasible and yields promising results relative to fully supervised and semi-supervised baselines. The source code will be released upon paper acceptance.
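The class-exclusivity point in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example logits, and the 0.7 confidence threshold are all assumptions chosen for illustration. It contrasts independent per-class sigmoid scores with softmax scores when a teacher prediction is split between two similar categories.

```python
import math

def sigmoid_scores(logits):
    # Independent per-class scores: several classes can be high at once.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def softmax_scores(logits):
    # Mutually exclusive scores that sum to 1 (shifted by max for stability).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def keep_pseudo_label(scores, threshold=0.7):
    # Hypothetical filter: keep the top class only if it clears the threshold.
    best = max(range(len(scores)), key=scores.__getitem__)
    return (best, scores[best]) if scores[best] >= threshold else None

# Hypothetical teacher logits for one query over three classes.
ambiguous = [3.0, 2.8, -1.0]   # two near-tied categories
confident = [5.0, 0.5, -1.0]   # one clearly dominant category

sig_amb = keep_pseudo_label(sigmoid_scores(ambiguous))
soft_amb = keep_pseudo_label(softmax_scores(ambiguous))
soft_conf = keep_pseudo_label(softmax_scores(confident))
```

Under these assumed values, the sigmoid scores of the two near-tied classes both exceed the threshold, so the ambiguous prediction would be accepted as a pseudo-label. The softmax splits the probability mass between them, the top score falls below the threshold, and the prediction is discarded, while the clearly dominant prediction is still kept.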
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 7161