Abstract: Detecting 3D lane lines from monocular images is attracting increasing attention in the Autonomous Driving (AD) area owing to its cost-effectiveness. However, current monocular image models capture road scenes without 3D spatial awareness and are therefore error-prone under adverse conditions. In this work, we design a novel cross-modal knowledge transfer scheme, LaneCMKT, that addresses this issue by transferring 3D geometric cues learned by a pre-trained LiDAR model to the image model. Operating on a unified Bird's-Eye-View (BEV) grid, our monocular image model acts as a student network and benefits from the spatial guidance of the 3D LiDAR teacher model over the intermediate feature space. Since LiDAR points and image pixels are intrinsically different modalities, we propose a dual-path knowledge transfer mechanism to facilitate this heterogeneous feature transfer at matching levels: we divide the feature space into shallow and deep paths, in which the image student is prompted to focus on lane-favored geometric cues from the LiDAR teacher. We conduct extensive experiments and thorough analysis on the large-scale public benchmark OpenLane. Our model achieves notable F1-score improvements of 5.3% over the image baseline and 2.7% over the current BEV-driven SoTA method, without introducing any extra computational overhead. We also observe that the 3D capabilities acquired from the teacher model are critical for handling complex spatial lane properties from a 2D perspective.
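Conceptually, the dual-path transfer described above amounts to a feature-distillation loss applied at two depths of the shared BEV feature space. The following is a minimal PyTorch sketch under assumed tensor shapes; the function name, weights, and MSE objective are illustrative assumptions, not the paper's actual implementation.

    # Minimal sketch (not the authors' code): distill a frozen LiDAR teacher's
    # shallow and deep BEV features into the image student's matching features.
    import torch
    import torch.nn.functional as F

    def dual_path_distill_loss(stu_shallow, stu_deep, tea_shallow, tea_deep,
                               w_shallow=1.0, w_deep=1.0):
        # All inputs are (B, C, H, W) tensors on the same BEV grid; the
        # teacher features are detached so only the student receives gradients.
        # w_shallow / w_deep are hypothetical balancing hyperparameters.
        loss_shallow = F.mse_loss(stu_shallow, tea_shallow.detach())
        loss_deep = F.mse_loss(stu_deep, tea_deep.detach())
        return w_shallow * loss_shallow + w_deep * loss_deep

    # Example with hypothetical shapes: batch 2, 64 channels, 200x48 BEV grid.
    s1, s2 = torch.randn(2, 64, 200, 48), torch.randn(2, 64, 200, 48)
    t1, t2 = torch.randn(2, 64, 200, 48), torch.randn(2, 64, 200, 48)
    loss = dual_path_distill_loss(s1, s2, t1, t2)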
Primary Subject Area: [Content] Multimodal Fusion
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: We propose a novel Cross-Modal Knowledge Transfer framework, LaneCMKT, for monocular 3D lane detection in the autonomous driving domain. This research pioneers a knowledge transfer scheme that improves image lane feature learning with spatial awareness derived from LiDAR point data. We address the challenge of bridging the heterogeneous data sources of a monocular image and a LiDAR point cloud through a dual-path knowledge transfer mechanism applied in multi-layer transfer learning. Additionally, we propose an adaptive scaling strategy that enables the image model to selectively learn shallow LiDAR geometric features for lane instances. Demonstrating improvements in F1 score and distance errors over existing state-of-the-art methods, our work advances the accuracy of monocular 3D lane detection without introducing additional computational overhead. The knowledge transfer technique is important to the multimedia processing area, as it complements the advantages of different modality data sources.
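One plausible reading of the adaptive scaling strategy is a learned per-location gate that weights how strongly the student imitates the teacher's shallow features. The sketch below is our own assumption of such a design (the class name AdaptiveScaleGate and the 1x1-conv gating are hypothetical, not the paper's stated method).

    # Minimal sketch (an assumption, not the paper's exact design): a gate
    # predicts a per-location scale in [0, 1] from the frozen teacher's
    # shallow features and uses it to weight the per-pixel imitation loss,
    # so the student concentrates on lane-relevant regions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AdaptiveScaleGate(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.gate = nn.Conv2d(channels, 1, kernel_size=1)

        def forward(self, stu_feat, tea_feat):
            scale = torch.sigmoid(self.gate(tea_feat.detach()))       # (B, 1, H, W)
            per_pixel = F.mse_loss(stu_feat, tea_feat.detach(), reduction="none")
            return (scale * per_pixel).mean()                          # broadcasts over C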
Submission Number: 2295