Abstract: Accurate 3D lane detection from monocular images is crucial for autonomous driving. Recent advances leverage either front-view (FV) or bird’s-eye-view (BEV) features for prediction, which inevitably limits their ability to perceive the driving environment precisely and results in suboptimal performance. To overcome the limitations of relying on a single view, we design a novel dual-view cross-attention mechanism that leverages FV and BEV features simultaneously. Based on this mechanism, we propose 3DLaneFormer, a powerful framework for 3D lane detection. Extensive experiments on challenging benchmarks show that 3DLaneFormer outperforms the latest FV-based and BEV-based approaches, verifying the necessity and benefit of exploiting features from both views.
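The abstract does not specify how the dual-view cross-attention is realized; the following is a minimal sketch of one plausible instantiation, assuming lane queries attend to flattened FV and BEV feature maps through two standard multi-head cross-attention layers whose outputs are fused by a linear projection. All module names, tensor shapes, and the fusion scheme here are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn


class DualViewCrossAttention(nn.Module):
    """Hypothetical dual-view cross-attention block: lane queries
    gather evidence from both FV and BEV feature maps."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Cross-attention from lane queries to front-view (FV) features.
        self.attn_fv = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Cross-attention from lane queries to bird's-eye-view (BEV) features.
        self.attn_bev = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Fuse the two attended results back to the query dimension
        # (an assumed fusion choice; summation or gating would also work).
        self.fuse = nn.Linear(2 * dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, queries, fv_feats, bev_feats):
        # queries:   (B, num_queries, dim) learnable lane queries
        # fv_feats:  (B, H_fv * W_fv, dim) flattened FV feature map
        # bev_feats: (B, H_bev * W_bev, dim) flattened BEV feature map
        out_fv, _ = self.attn_fv(queries, fv_feats, fv_feats)
        out_bev, _ = self.attn_bev(queries, bev_feats, bev_feats)
        fused = self.fuse(torch.cat([out_fv, out_bev], dim=-1))
        return self.norm(queries + fused)  # residual connection


if __name__ == "__main__":
    block = DualViewCrossAttention()
    q = torch.randn(2, 40, 256)         # e.g. 40 lane queries per image
    fv = torch.randn(2, 32 * 48, 256)   # flattened FV features
    bev = torch.randn(2, 25 * 50, 256)  # flattened BEV features
    print(block(q, fv, bev).shape)      # torch.Size([2, 40, 256])
```

The key point the sketch illustrates is that each query is updated from both views in a single block, rather than committing to FV-only or BEV-only features as in prior work.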