Semantic Map Guided Bird’s-Eye View Learning for Online HD Map Construction

Published: 23 Mar 2026, Last Modified: 08 Feb 2026 · WACV 2026 · arXiv.org perpetual, non-exclusive license
Abstract: Vectorized High-Definition (HD) maps offer rich and precise environmental information about driving scenes, playing a crucial role in improving driver safety by supporting autonomous driving and advanced driver-assistance systems (ADAS). Processing individual camera images creates a fragmented view of the world that requires complex and error-prone merging. Existing multi-view camera methods train deep neural networks to directly generate unified bird's-eye view (BEV) features from which the HD map is constructed. Nevertheless, a significant limitation is the lack of direct supervision of the learned BEV features by the ground-truth map elements. To overcome this limitation, we propose a novel method, referred to as Semantic Map Guidance (SMG), which explicitly aligns the learned BEV features with the corresponding semantic representations by utilizing ground-truth labels during training. We demonstrate the effectiveness of the proposed SMG method by incorporating it into multiple state-of-the-art BEV-based methods for the online HD map construction task. Extensive experiments on two widely used HD map datasets, nuScenes and Argoverse 2, demonstrate that SMG, without bells and whistles, consistently improves the accuracy of all tested networks while using the same base network implementation and hyperparameters, and adds no inference-time overhead.
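The abstract gives no implementation details, but one common way to realize this kind of training-only semantic supervision is an auxiliary head that predicts per-class semantic masks from the BEV features, penalized against a rasterized ground-truth map and added to the main loss. The sketch below is a hypothetical illustration of that general idea, not the authors' SMG implementation; all names (`head_w`, `gt_masks`, the grid sizes, and the random toy data) are assumptions for demonstration.

```python
# Hypothetical sketch of semantic-map-guided auxiliary supervision:
# a lightweight 1x1-conv-style head maps BEV features to per-class
# semantic logits, supervised by rasterized GT map labels. NOT the
# paper's actual implementation -- an illustrative assumption only.
import numpy as np

rng = np.random.default_rng(0)

H, W, C, K = 50, 100, 64, 3                     # BEV grid, feature channels, map classes

bev_feats = rng.normal(size=(H, W, C))          # learned BEV features (toy data)
head_w = rng.normal(size=(C, K)) * 0.01         # auxiliary semantic head weights
gt_masks = rng.random(size=(H, W, K)) > 0.7     # rasterized GT map (toy data)

logits = bev_feats @ head_w                     # per-cell, per-class logits
probs = 1.0 / (1.0 + np.exp(-logits))           # sigmoid activation

# Binary cross-entropy between predicted masks and GT. This auxiliary term
# would be added to the main map-construction loss during training only,
# so it adds no cost at inference time.
eps = 1e-7
bce = -(gt_masks * np.log(probs + eps) + (~gt_masks) * np.log(1.0 - probs + eps))
aux_loss = bce.mean()
print(float(aux_loss))
```

Because the head and loss exist only in the training graph, dropping them at deployment leaves the base network and its inference latency unchanged, which matches the abstract's claim of no additional inference time.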