Keywords: knowledge distillation, object detection
Abstract: Multi-camera 3D object detection for autonomous driving is challenging and has drawn great attention from both academia and industry. The core difficulty for vision-only methods is mining accurate geometry-aware features from images. To improve the performance of vision-only approaches, one promising ingredient lies in using visual features to simulate the geometry information of LiDAR, since point cloud data inherently carries 3D spatial information. In this paper, we resort to knowledge distillation, leveraging useful representations from a LiDAR-based expert to enhance feature learning in the camera-based pipeline. We observe that jointly optimizing the expert-apprentice distillation and the target task is difficult in the conventional distillation paradigm. Inspired by the impressive results of pretrained foundation models in general vision, we propose a pretrained distillation paradigm, termed PreDistill, that decouples training into two stages: the apprentice network first focuses on knowledge transfer from the expert and is then finetuned on the downstream target task. This strategy lets each stage pursue a dedicated objective and eases the joint feature learning required by the conventional single-stage counterpart. PreDistill is a convenient plug-and-play module that readily extends to multiple state-of-the-art detectors. Without bells and whistles, building on top of recent approaches, e.g., BEVFusion-C, BEVFormer, and BEVDepth, we obtain consistent gains of 7.6%, 1.0%, and 0.6% NDS on the nuScenes benchmark. Code and model checkpoints will be made available.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
TL;DR: We propose PreDistill, a pretrained distillation paradigm for knowledge transfer, and demonstrate that it serves as a plug-and-play module for various state-of-the-art detectors.
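For concreteness, a minimal sketch of the two-stage procedure described in the abstract is given below. It is an illustration only, not the paper's implementation: the module names (apprentice, expert, det_head), the MSE feature-imitation loss, and all hyperparameters are assumptions, since the actual architectures and losses are not specified on this page.

```python
# Hypothetical two-stage sketch of pretrained distillation (PreDistill-style).
# Stage 1: camera apprentice imitates a frozen LiDAR expert's BEV features.
# Stage 2: the distilled apprentice is finetuned on the 3D detection task.
import torch
import torch.nn.functional as F

def stage1_pretrain_distill(apprentice, expert, loader, epochs=12, lr=2e-4):
    """Stage 1: align camera (apprentice) BEV features with a frozen LiDAR expert."""
    expert.eval()  # the LiDAR expert is frozen during distillation
    opt = torch.optim.AdamW(apprentice.parameters(), lr=lr)
    for _ in range(epochs):
        for images, points in loader:
            with torch.no_grad():
                teacher_feat = expert(points)          # LiDAR BEV features (targets)
            student_feat = apprentice(images)          # camera BEV features
            loss = F.mse_loss(student_feat, teacher_feat)  # assumed imitation loss
            opt.zero_grad()
            loss.backward()
            opt.step()

def stage2_finetune_detection(apprentice, det_head, loader, epochs=24, lr=2e-4):
    """Stage 2: finetune the distilled apprentice plus a detection head end to end."""
    opt = torch.optim.AdamW(
        list(apprentice.parameters()) + list(det_head.parameters()), lr=lr
    )
    for _ in range(epochs):
        for images, targets in loader:
            preds = det_head(apprentice(images))
            loss = det_head.loss(preds, targets)       # hypothetical task loss of the detector
            opt.zero_grad()
            loss.backward()
            opt.step()
```

Because the two objectives are optimized in separate stages, any BEV-producing detector could in principle be dropped in as the apprentice, which is consistent with the plug-and-play claim above.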