Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation
Abstract: A LiDAR-based 3D object detector that balances accuracy and speed is crucial for real-time perception in autonomous driving and robotic navigation systems. Integrating global context for visual understanding improves detection accuracy by helping the model capture the overall spatial structure of the point cloud. However, many existing LiDAR detection models depend on intricate feature transformation and extraction processes, leading to poor real-time performance and high resource consumption, which limits their practical effectiveness. In this work, we propose a Faster LiDAR 3D object detection framework, called FASD, which performs heterogeneous model distillation by adaptively unifying cross-model voxel features. We aim to distill the Transformer's capacity for high-performance sequence modeling into Mamba models with low FLOPs, achieving a significant improvement in accuracy through knowledge transfer. Specifically, Dynamic Voxel Group and Adaptive Attention strategies are integrated into the sparse backbone, creating a robust teacher model with scale-adaptive attention for effective global visual context modeling. After aligning features with an Adapter, we transfer knowledge from the Transformer to Mamba through latent-space feature supervision and span-head distillation, yielding an efficient student model with improved performance. We evaluate the framework on the Waymo and nuScenes datasets, achieving a 4x reduction in resource consumption and a 1-2% performance improvement over current SoTA methods.
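To make the distillation objective concrete, the following is a minimal PyTorch sketch of the cross-model transfer described above: a hypothetical FeatureAdapter projects the Mamba student's voxel features into the Transformer teacher's latent space for feature supervision, and the detection head's outputs are distilled with a standard temperature-scaled KL term as a stand-in for the paper's span-head distillation. All module names, dimensions, and loss weights here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAdapter(nn.Module):
    """Hypothetical adapter: projects student (Mamba) voxel features
    into the teacher's (Transformer) feature space before supervision."""
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(student_dim, teacher_dim),
            nn.LayerNorm(teacher_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

def distillation_loss(student_feats, teacher_feats,
                      student_logits, teacher_logits,
                      adapter, alpha=1.0, beta=0.5, tau=2.0):
    # Latent-space feature supervision: match adapted student voxel
    # features to the frozen teacher's features.
    feat_loss = F.mse_loss(adapter(student_feats), teacher_feats.detach())
    # Head-level distillation (stand-in for span-head distillation):
    # soften both logit distributions and minimize their KL divergence.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits.detach() / tau, dim=-1),
        reduction="batchmean",
    ) * tau * tau
    return alpha * feat_loss + beta * kd_loss

# Toy usage: N voxels, student dim 128, teacher dim 256, C classes.
N, C = 1024, 3
adapter = FeatureAdapter(student_dim=128, teacher_dim=256)
s_feat, t_feat = torch.randn(N, 128), torch.randn(N, 256)
s_logit = torch.randn(N, C, requires_grad=True)
t_logit = torch.randn(N, C)
loss = distillation_loss(s_feat, t_feat, s_logit, t_logit, adapter)
loss.backward()
```

Detaching the teacher tensors ensures gradients flow only into the student and the adapter, so the teacher remains a fixed supervision target during transfer.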