Dynamic Learnable Label Assignment for Indoor 3D Object Detection

Published: 2025, Last Modified: 07 Jan 2026
Venue: IEEE Trans. Circuits Syst. Video Technol. 2025
License: CC BY-SA 4.0
Abstract: In this paper, we present a dynamic learnable label assignment (DLLA) method for indoor anchor-free one-stage 3D object detection. Existing methods depend principally on hand-crafted strategies with fixed thresholds, which fail to adapt to the inherent variability in object characteristics such as size, shape, and occlusion level. This lack of adaptability results in suboptimal sample assignments and unstable detection performance. To address this challenge, we map the features of proposals and ground truths separately into the same embedding space, enabling a dynamic strategy that assigns appropriate positive samples to each instance. Specifically, we first let the features of all proposals interact, which integrates information from every proposal in the scene and captures long-range dependencies between different locations. Additionally, to extract more discriminative and generalizable features for positive and negative samples, we employ a contrastive learning process to optimize the elemental relationships and distances between proposals and ground truths. Finally, we introduce a denoising task to alleviate the difficulty of the unsupervised learning process in DLLA. Experimental results show that our DLLA outperforms other methods on three popular indoor datasets (ScanNet V2, SUN RGB-D, and ScanNet200).
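The general idea of assigning labels in a shared embedding space, rather than by a fixed geometric threshold, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the assignment rule or the loss, so the background embedding, the argmax rule, the InfoNCE-style loss, and every function and parameter name below (`assign_positives`, `contrastive_loss`, `background_emb`, `temperature`) are assumptions made purely for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def assign_positives(proposal_emb, gt_emb, background_emb):
    """Hypothetical assignment rule: a proposal is a positive for the ground
    truth it is most similar to, unless the background embedding wins.
    The number of positives per ground truth is therefore not fixed.

    proposal_emb: (num_proposals, dim), gt_emb: (num_gt, dim),
    background_emb: (dim,). Returns a (num_gt, num_proposals) boolean mask
    and the (num_gt, num_proposals) cosine similarity matrix.
    """
    num_gt = gt_emb.shape[0]
    targets = l2_normalize(np.vstack([gt_emb, background_emb[None, :]]))
    sim = l2_normalize(proposal_emb) @ targets.T  # (num_proposals, num_gt + 1)
    best = sim.argmax(axis=1)
    is_foreground = best < num_gt
    mask = np.zeros((num_gt, proposal_emb.shape[0]), dtype=bool)
    mask[best[is_foreground], np.nonzero(is_foreground)[0]] = True
    return mask, sim[:, :num_gt].T


def contrastive_loss(sim, mask, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in for the paper's contrastive
    objective): for each ground truth, pull its positives closer relative
    to all other proposals in the scene."""
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos_count = mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    per_gt = -(log_prob * mask).sum(axis=1)[valid] / pos_count[valid]
    return float(per_gt.mean())


# Toy scene: four proposals, two ground truths, 2-D embeddings.
proposals = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, -1.0]])
ground_truths = np.array([[1.0, 0.0], [0.0, 1.0]])
background = np.array([-1.0, -1.0])

mask, sim = assign_positives(proposals, ground_truths, background)
loss = contrastive_loss(sim, mask)
```

In this toy scene the first ground truth receives two positives, the second receives one, and the last proposal falls to background, so the positive count varies per instance without any hand-set threshold. In a trained detector the embeddings would come from learned projection heads rather than fixed arrays.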