The Devil is in the Quality: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection

Published: 2025, Last Modified: 12 Nov 2025, ICRA 2025, CC BY-SA 4.0
Abstract: This paper tackles the challenging problem of semi-supervised monocular 3D object detection with a general framework. Specifically, having observed that the bottleneck of this task is the lack of reliable and informative samples from unlabeled data for detector learning, we introduce a novel, simple yet effective ‘Augment and Criticize’ pipeline that mines abundant informative samples for robust detection. In the ‘Augment’ stage, we present Augmentation-based Prediction aGgregation (APG), which applies automatically learned transformations to unlabeled images and aggregates the detections from the various augmented views into pseudo labels. Since not all pseudo labels from APG are informative enough to benefit training, a subsequent ‘Criticize’ stage is introduced. In particular, we present the Critical Retraining Strategy (CRS), which, rather than simply filtering pseudo labels with a fixed threshold, employs a learnable network to evaluate the contribution of each unlabeled image at different training timestamps. In this way, noisy samples that hinder model evolution can be effectively suppressed. To validate ‘Augment and Criticize’, we apply it to MonoDLE [1] and MonoFlex [2]; the two resulting detectors, dubbed 3DSeMoDLE and 3DSeMoFLEX, achieve state-of-the-art results with consistent improvements, evidencing the pipeline's effectiveness and generality.
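The ‘Augment’ stage fuses detections from several augmented views into pseudo labels, but the abstract does not say how the views are fused. The following is a minimal Python sketch under stated assumptions, not the authors' method: every box has already been mapped back to the original image frame (each augmentation inverted), detections are matched across views by 3D center distance, and matched boxes are merged by confidence-weighted averaging. `Box3D`, `aggregate_views`, `radius`, and `min_views` are hypothetical names; the learned augmentation policy and the CRS critic are not modeled.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box3D:
    # Hypothetical box format (assumption): center (x, y, z) in metres,
    # size (w, h, l), yaw in radians, confidence score in [0, 1].
    x: float
    y: float
    z: float
    w: float
    h: float
    l: float
    yaw: float
    score: float


def aggregate_views(views: List[List[Box3D]], radius: float = 1.0,
                    min_views: int = 2) -> List[Box3D]:
    """Fuse per-view detections into pseudo labels (illustrative sketch only).

    Assumes each augmentation has already been inverted, so all boxes are
    expressed in the original image's camera frame.
    """
    # Pool detections as (view index, box), most confident first, so each
    # cluster is seeded by its best remaining box.
    pool: List[Tuple[int, Box3D]] = sorted(
        ((v, b) for v, boxes in enumerate(views) for b in boxes),
        key=lambda vb: vb[1].score,
        reverse=True,
    )
    used = [False] * len(pool)
    pseudo_labels: List[Box3D] = []
    for i, (_, seed) in enumerate(pool):
        if used[i]:
            continue
        members: List[Box3D] = []
        views_in_cluster = set()
        for j in range(i, len(pool)):
            v, box = pool[j]
            near = math.dist((seed.x, seed.y, seed.z), (box.x, box.y, box.z)) <= radius
            # Each view contributes at most one (its most confident) box per cluster.
            if not used[j] and near and v not in views_in_cluster:
                used[j] = True
                views_in_cluster.add(v)
                members.append(box)
        # Objects confirmed by too few views are treated as unreliable and dropped.
        total = sum(b.score for b in members)
        if len(members) < min_views or total <= 0:
            continue

        def wavg(attr: str) -> float:
            # Confidence-weighted average of one box attribute over the cluster.
            return sum(getattr(b, attr) * b.score for b in members) / total

        # Weighted circular mean for yaw, so angles near +/-pi do not cancel out.
        yaw = math.atan2(
            sum(math.sin(b.yaw) * b.score for b in members),
            sum(math.cos(b.yaw) * b.score for b in members),
        )
        pseudo_labels.append(Box3D(
            wavg("x"), wavg("y"), wavg("z"), wavg("w"), wavg("h"), wavg("l"), yaw,
            # Views that missed the object count as zero confidence.
            score=total / len(views),
        ))
    return pseudo_labels
```

For example, with two views that both detect a car near (0, 1, 10) and one view that also reports an isolated box far away, `aggregate_views` returns a single fused pseudo label; the isolated box is dropped because only one view supports it. Requiring cross-view agreement is still a hand-set rule, much like the fixed threshold the abstract contrasts CRS against; CRS instead learns how much each unlabeled image should contribute over training.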