Object-level Data Augmentation for Visual 3D Object Detection in Autonomous Driving

Haoran Cheng; Junkai Xu; Liang Peng; Qi Zhou; Linxuan Xia; Boxi Wu; Xiaofei He

Object-level Data Augmentation for Visual 3D Object Detection in Autonomous Driving

Haoran Cheng, Junkai Xu, Liang Peng, Qi Zhou, Linxuan Xia, Boxi Wu, Xiaofei He

23 Sept 2023 (modified: 11 Feb 2024)Submitted to ICLR 2024EveryoneRevisionsBibTeX

Supplementary Material: zip

Primary Area: representation learning for computer vision, audio, language, and other modalities

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Data augmentation, 3D object detection, autonomous driving

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Abstract: Data augmentation plays an important role in visual-based 3D object detection. Existing detectors typically employ image/BEV-level data augmentation techniques, failing to utilize flexible object-level augmentations because of 2D-3D inconsistencies. This limitation hinders us from increasing the diversity of training data. To alleviate this issue, we propose an object-level data augmentation approach that incorporates scene reconstruction and neural scene rendering. Specifically, we reconstruct the scene and objects by extracting image features from sequences and aligning them with associated LiDAR point clouds. This approach is intended to conduct the editing process within a 3D space, allowing for flexible object manipulation. Additionally, we introduce a neural scene renderer to project the edited 3D scene onto a specified camera plane and render it onto a 2D image. Combining with the scene reconstruction, it overcomes the challenges stemming from 2D/3D inconsistencies, enabling the generation of object-level augmented images with corresponding labels for model training. To validate the proposed method, we apply our method to two popular multi-camera detectors: PETRv2 and BEVFormer, consistently boosting the performance. Codes will be public.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7495

Loading