Abstract: Panoptic driving perception is a crucial component of online evolutive learning systems. Existing research is mostly based on convolutional neural networks and Transformer networks. These networks are constrained by the limited receptive field of convolution and the quadratic computational complexity of the Transformer, which hinder further performance improvement. The recently proposed Mamba offers long-range modeling while maintaining linear computational complexity. This paper proposes PDPMamba, a multi-task visual Mamba network for Panoptic Driving Perception (PDP). This is the first work to employ visual Mamba in panoptic driving perception. PDPMamba adopts a framework in which one encoder is shared by three decoders. A multi-scale visual Mamba is adopted as the core of the encoder to extract multi-scale feature information. The first decoder handles traffic object detection and fuses multi-scale features from the encoder. The second decoder handles drivable area segmentation and receives the largest-scale feature map from the encoder. The third decoder handles lane detection and receives the medium-scale feature map from the encoder. In this paper, we harness the capabilities of Mamba to extract multi-scale feature maps for panoptic driving perception tasks and achieve commendable performance. As a preliminary exploratory work, PDPMamba outperforms other networks on the pixel accuracy metric for lane detection and ranks in the top two on the metrics for traffic object detection and drivable area segmentation in experiments on both the BDD100K dataset and real road scenes. The results demonstrate PDPMamba's strong robustness and generalization under diverse road conditions. This successful attempt will encourage further applications of visual Mamba in autonomous driving.
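To make the shared-encoder, three-decoder layout described above concrete, the following is a minimal PyTorch-style sketch. The module names (`MultiScaleMambaEncoder`, `PDPMambaSketch`), channel counts, and head designs are illustrative assumptions rather than the paper's actual implementation; plain convolutions stand in for visual Mamba blocks, and only the routing of feature scales to the three task decoders follows the abstract.

```python
# Schematic sketch of the shared-encoder / three-decoder layout.
# All module names and hyperparameters are hypothetical placeholders.
import torch
import torch.nn as nn


class MultiScaleMambaEncoder(nn.Module):
    """Placeholder for a multi-scale visual Mamba backbone.

    Returns feature maps at three scales (large, medium, small).
    Plain strided convolutions stand in for Mamba blocks here.
    """

    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        c1, c2, c3 = channels
        self.stage1 = nn.Conv2d(3, c1, 3, stride=4, padding=1)   # large-scale map (1/4)
        self.stage2 = nn.Conv2d(c1, c2, 3, stride=2, padding=1)  # medium-scale map (1/8)
        self.stage3 = nn.Conv2d(c2, c3, 3, stride=2, padding=1)  # small-scale map (1/16)

    def forward(self, x):
        f_large = self.stage1(x)
        f_medium = self.stage2(f_large)
        f_small = self.stage3(f_medium)
        return f_large, f_medium, f_small


class PDPMambaSketch(nn.Module):
    """One shared encoder feeding three task-specific decoders."""

    def __init__(self, num_det_classes=1, channels=(64, 128, 256)):
        super().__init__()
        c1, c2, c3 = channels
        self.encoder = MultiScaleMambaEncoder(channels)
        # Detection decoder fuses all three scales (fusion simplified
        # to upsampling + concatenation in this sketch).
        self.det_head = nn.Conv2d(c1 + c2 + c3, num_det_classes + 4, 1)
        # Drivable-area decoder receives only the largest-scale map.
        self.da_head = nn.Conv2d(c1, 2, 1)
        # Lane decoder receives only the medium-scale map.
        self.lane_head = nn.Conv2d(c2, 2, 1)

    def forward(self, x):
        f_large, f_medium, f_small = self.encoder(x)

        def up(f):
            return nn.functional.interpolate(
                f, size=f_large.shape[-2:], mode="bilinear", align_corners=False)

        det = self.det_head(torch.cat([f_large, up(f_medium), up(f_small)], dim=1))
        drivable = self.da_head(f_large)
        lane = self.lane_head(f_medium)
        return det, drivable, lane


if __name__ == "__main__":
    model = PDPMambaSketch()
    outputs = model(torch.randn(1, 3, 384, 640))
    print([o.shape for o in outputs])
```

This sketch only illustrates the routing of encoder scales to the three decoders; the actual detection, segmentation, and lane heads in PDPMamba are presumably more elaborate.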
External IDs: dblp:conf/icdsp/LiBSZLZ25