Orientation-Aware Multi-Modal Learning for Road Intersection Identification and Mapping

Published: 01 Jan 2024, Last Modified: 15 May 2025 · ICRA 2024 · CC BY-SA 4.0
Abstract: Accurate identification of road intersections is a pivotal task for the automatic construction of high-definition maps, particularly in unstructured scenes. Existing methods predominantly rely on single-modal data and thus suffer from a clear unimodal limitation: a lack of contextual information. Moreover, these approaches overlook multi-modal data fusion and representation learning, which are crucial for generalizability. To address these limitations, we propose a novel orientation-aware multi-modal learning paradigm that formulates intersection identification as an oriented object detection task. Specifically, we introduce heterogeneous fusion to harmonize disparate data modalities, i.e., vector maps, point clouds, and vehicle trajectories, into a unified feature space. In addition, we present trigonometry-induced adaptive regression to improve orientation estimation, while mitigating scale imbalance and boundary confusion through dual-objective matching with spatial adaptation. To evaluate our methodology, we assemble the first multi-modal benchmark tailored for complex low-speed environments, complete with fine-grained semantic annotations for intersections. Comprehensive empirical analyses, including ablation studies, confirm both the superior performance of the proposed framework and the efficacy of its constituent modules.
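The abstract mentions trigonometry-induced adaptive regression as a remedy for boundary confusion in orientation estimation. The sketch below is a minimal, hypothetical illustration of the general idea behind trigonometric orientation encoding (regressing sin/cos of the angle and decoding with atan2); it is not the authors' implementation, whose exact formulation may differ.

```python
import math

# Illustrative sketch only: encode an oriented box's angle as continuous
# (sin, cos) regression targets so the loss does not see a large jump at the
# angular wrap-around boundary (e.g. between +pi and -pi).

def encode_orientation(theta: float) -> tuple[float, float]:
    """Map an angle in radians to a continuous (sin, cos) target pair."""
    return math.sin(theta), math.cos(theta)

def decode_orientation(sin_pred: float, cos_pred: float) -> float:
    """Recover an angle from (possibly unnormalized) sin/cos predictions."""
    return math.atan2(sin_pred, cos_pred)

# Angles just on either side of the +/- pi boundary yield nearby targets,
# so a regression loss on (sin, cos) avoids the boundary discontinuity.
for theta in (math.pi - 0.01, -math.pi + 0.01):
    s, c = encode_orientation(theta)
    print(f"theta={theta:+.3f}  target=({s:+.3f}, {c:+.3f})  "
          f"decoded={decode_orientation(s, c):+.3f}")
```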