Keywords: 3D Object Detection, Model Pruning, Model Compression
Abstract: Recent advances in 3D deep learning have garnered significant attention owing to their superior performance in fields such as AR/VR, autonomous driving, and robotics.
However, as models and point cloud data continue to scale up, managing computational and memory demands becomes a critical challenge, particularly for real-world applications with strict latency and energy requirements.
Previous methods have primarily focused on reducing computational costs and memory usage by addressing spatial redundancy, \textit{i.e.}, filtering out irrelevant points or voxels. In contrast, this work presents a novel post-training weight pruning technique tailored specifically for 3D object detection.
Our approach stands out in two key ways: (1) it operates independently of existing point cloud sparsification methods, targeting redundant parameters in pre-trained models that minimally affect both spatial accuracy and detection confidence (collectively referred to as "detection distortion"), and (2) it provides a flexible, plug-and-play framework compatible with other sparsity schemes, including spatial sparsity, and with any 3D detection model.
Our method reduces detection distortion by employing a second-order Taylor approximation to identify layer-wise sparsity, allowing for a substantial reduction in model complexity without sacrificing detection accuracy (an illustrative form of this expansion is given after the abstract).
To efficiently manage the necessary second-order information, we devise a lightweight algorithm to gather Hessian information, followed by dynamic programming to optimize the layer-wise sparsity allocation (a sketch of this allocation step also follows the abstract).
Extensive experiments on the KITTI, nuScenes, and ONCE datasets validate the effectiveness of our approach: we not only preserve detection performance but in some cases improve it, while significantly reducing computational overhead.
Notably, we achieve FLOPs reductions for the CenterPoint model of up to $\mathbf{3.89}\times$ and $\mathbf{3.01}\times$ on the ONCE and nuScenes datasets, respectively, without noticeable loss in mean Average Precision (mAP), and up to a $\mathbf{1.65}\times$ reduction \textbf{losslessly} for the PVRCNN model on the ONCE dataset, pushing the boundaries of state-of-the-art performance.
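For illustration (our own notation, not taken from the paper): the second-order Taylor view of pruning-induced detection distortion referenced in the abstract takes the standard form
$$\mathcal{D}(w+\Delta w)\;\approx\;\mathcal{D}(w)+g^{\top}\Delta w+\tfrac{1}{2}\,\Delta w^{\top}H\,\Delta w,\qquad g=\nabla_{w}\mathcal{D},\;\;H=\nabla_{w}^{2}\mathcal{D},$$
where $\Delta w$ is the weight perturbation induced by pruning a given sparsity pattern. Around a well-trained model the gradient term is near zero, so the distortion is dominated by the quadratic Hessian term; this is the classical motivation (as in Optimal Brain Damage/Surgeon analyses) for collecting second-order information when scoring candidate layer-wise sparsity levels.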
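Below is a minimal sketch of the dynamic-programming allocation step mentioned in the abstract, under our own assumptions (the function `allocate_sparsity` and its inputs are illustrative, not the authors' released code): given an estimated distortion cost and a kept-parameter count for each (layer, sparsity-level) pair, one level is chosen per layer to minimize total distortion under a global parameter budget.

```python
# Illustrative sketch only: DP over layers x budget to pick one sparsity
# level per layer at minimum total (estimated) detection distortion.

def allocate_sparsity(costs, kept_params, budget):
    """costs[l][s]       : estimated distortion if layer l uses sparsity level s
       kept_params[l][s] : parameters kept by layer l at sparsity level s
       budget            : total parameters the pruned model may keep
       Returns one sparsity-level index per layer, or None if infeasible."""
    n_layers = len(costs)
    INF = float("inf")
    # dp[b] = min total distortion over the layers processed so far,
    # using exactly b kept parameters.
    dp = [INF] * (budget + 1)
    dp[0] = 0.0
    choice = [[-1] * (budget + 1) for _ in range(n_layers)]

    for l in range(n_layers):
        new_dp = [INF] * (budget + 1)
        for b in range(budget + 1):
            if dp[b] == INF:
                continue
            for s, (c, p) in enumerate(zip(costs[l], kept_params[l])):
                nb = b + p
                if nb <= budget and dp[b] + c < new_dp[nb]:
                    new_dp[nb] = dp[b] + c
                    choice[l][nb] = s  # remember the level that achieved this state
        dp = new_dp

    # Pick the lowest-distortion feasible end state and backtrack.
    best_b = min(range(budget + 1), key=lambda b: dp[b])
    if dp[best_b] == INF:
        return None
    levels, b = [], best_b
    for l in range(n_layers - 1, -1, -1):
        s = choice[l][b]
        levels.append(s)
        b -= kept_params[l][s]
    return levels[::-1]


if __name__ == "__main__":
    # Toy example: 2 layers, 2 candidate sparsity levels each.
    print(allocate_sparsity(
        costs=[[0.0, 0.5], [0.0, 0.2]],      # distortion per (layer, level)
        kept_params=[[100, 40], [80, 30]],   # params kept per (layer, level)
        budget=120,
    ))  # -> [1, 0]: prune layer 0 harder, keep layer 1 dense
```

The DP table is pseudo-polynomial in size (layers x budget), so in practice the budget axis would be discretized, e.g., in blocks of parameters or FLOPs.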
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4132