DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection

Published: 20 Jul 2024, Last Modified: 02 Aug 2024 · MM 2024 Poster · CC BY 4.0
Abstract: LiDAR-based 3D object detection has seen impressive advances in recent years. However, deploying trained 3D detectors in the real world often yields unsatisfactory performance when the distribution of the test data deviates significantly from that of the training data due to different weather conditions, object sizes, etc. A key factor in this performance degradation is the limited generalizability of pre-trained models, whose training produces a sharp loss landscape. When such sharpness is encountered at test time, even minor data variations can precipitate significant performance declines. To address these challenges, we propose dual-perturbation optimization (DPO) for Test-time Adaptation in 3D Object Detection (TTA-3OD). We minimize sharpness to cultivate a flat loss landscape, ensuring model resiliency to minor data variations and thereby enhancing the generalization of the adaptation process. To fully capture the inherent variability of the test point clouds, we further introduce adversarial perturbation to the input BEV features to better simulate the noisy test environment. As the dual-perturbation strategy relies on trustworthy supervision signals, we utilize a reliable Hungarian matcher to filter out pseudo-labels that are sensitive to perturbations. Additionally, we introduce an early Hungarian cutoff that halts the adaptation process to avoid error accumulation from incorrect pseudo-labels. Extensive experiments across three types of transfer tasks demonstrate that the proposed DPO significantly surpasses previous state-of-the-art approaches, notably on Waymo $\rightarrow$ KITTI, where it outperforms the most competitive baseline by 57.72\% in $\text{AP}_\text{3D}$ and reaches 91\% of the fully supervised upper bound. Our code is available in the supplementary materials.
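Since the abstract describes the two perturbations only at a high level, the following PyTorch sketch illustrates one possible form of a single adaptation step that combines an adversarial perturbation of the BEV features with a SAM-style sharpness-aware weight perturbation under pseudo-label supervision. The detector interface (`detector(bev)`, `pseudo_label_loss`) and the hyper-parameters `rho` and `epsilon` are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of a dual-perturbation adaptation step (illustrative only).
# Assumes a hypothetical interface: `detector(bev)` returns predictions and
# `pseudo_label_loss(preds, pseudo_labels)` scores them against pseudo-labels.
import torch


def dual_perturbation_step(detector, bev, pseudo_labels,
                           pseudo_label_loss, optimizer,
                           rho=0.05, epsilon=0.01):
    """One test-time adaptation step with dual perturbation (sketch)."""
    # --- 1. Adversarial perturbation of the input BEV features ----------
    bev_adv = bev.clone().detach().requires_grad_(True)
    loss_in = pseudo_label_loss(detector(bev_adv), pseudo_labels)
    grad_in = torch.autograd.grad(loss_in, bev_adv)[0]
    # Ascend along the input gradient to simulate a noisy test environment.
    bev_adv = (bev + epsilon * grad_in.sign()).detach()

    # --- 2. Sharpness-aware (SAM-style) weight perturbation -------------
    loss_w = pseudo_label_loss(detector(bev_adv), pseudo_labels)
    loss_w.backward()
    with torch.no_grad():
        grad_norm = torch.norm(torch.stack(
            [p.grad.norm() for p in detector.parameters() if p.grad is not None]))
        eps_w = []  # remember each weight perturbation so it can be undone
        for p in detector.parameters():
            if p.grad is None:
                eps_w.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps_w.append(e)
    optimizer.zero_grad()

    # --- 3. Loss at the perturbed weights, then restore and update ------
    loss_flat = pseudo_label_loss(detector(bev_adv), pseudo_labels)
    loss_flat.backward()
    with torch.no_grad():
        for p, e in zip(detector.parameters(), eps_w):
            if e is not None:
                p.sub_(e)  # undo the weight perturbation before the step
    optimizer.step()
    optimizer.zero_grad()
    return loss_flat.item()
```

In this sketch, step 1 corresponds to the adversarial perturbation of the BEV features, while steps 2 and 3 follow the standard sharpness-aware minimization recipe of perturbing the weights toward the worst case before taking the actual update, which is what yields the flat loss landscape the abstract refers to.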
Primary Subject Area: [Content] Media Interpretation
Relevance To Conference: LiDAR-based 3D point clouds mark a significant evolution in multimedia data, crucial for applications like autonomous driving and robotic vision. Recent advances commonly represent point clouds in various modalities, such as point sets (e.g., FARP-Net and SP-Det in TMM), voxels (e.g., SECOND in Sensors, SMF-SSD and SPNet in ACM MM), and hybrid modeling of both (e.g., PVRCNN in CVPR and FromVoxelToPoint in ACM MM). When deployed in the wild, such pretrained 3D detectors (e.g., SECOND and PVRCNN) can fail due to factors like adverse weather or sensor malfunctions. This work designs a universal algorithm to dynamically adapt detection models of various modalities to novel environments during inference. Specifically, a dual-perturbation strategy is proposed to perturb inputs of various modalities to improve model robustness and generalizability. In addition, the proposed early-stopping mechanism uses a temporal constraint for efficient adaptation by taking into account an additional modality (i.e., time series).
Supplementary Material: zip
Submission Number: 234