IC-Mapper: Instance-Centric Spatio-Temporal Modeling for Online Vectorized Map Construction

Jiangtong Zhu; YangZhao; Yinan Shi; Jianwu Fang; Jianru Xue

Published: 20 Jul 2024, Last Modified: 21 Jul 2024. Venue: MM2024 Poster. License: CC BY 4.0

Abstract: Online vector map construction based on visual data can bypass the data collection, post-processing, and manual annotation required by traditional map construction, which significantly improves map-building efficiency. However, existing work treats online mapping as a local-range perception task and overlooks the spatial scalability that map construction requires. We propose IC-Mapper, an instance-centric online mapping framework with two primary components: 1) Instance-centric temporal association module: for the detection queries of adjacent frames, we compare them in both feature and geometric dimensions to obtain the correspondence between instances across frames. 2) Instance-centric spatial fusion module: we sample points from the historical global map along the spatial dimension and integrate them with the detection results of the corresponding instances in the current frame, expanding and updating the map in real time. On the nuScenes dataset, we evaluate our approach on detection, tracking, and global mapping metrics. Experimental results demonstrate the superiority of IC-Mapper over other state-of-the-art methods.
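
The temporal association idea in the abstract (comparing instances from adjacent frames in both feature and geometric dimensions) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine feature distance, the Chamfer-style geometric distance, the greedy one-to-one matching, and the names `associate`, `w_feat`, `w_geom`, and `max_cost` are all assumptions chosen for illustration, and the toy instances are made up.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def chamfer_distance(p, q):
    """Symmetric mean nearest-neighbour distance between two point lists."""
    def one_way(src, dst):
        return sum(min(math.dist(s, d) for d in dst) for s in src) / len(src)
    return 0.5 * (one_way(p, q) + one_way(q, p))


def associate(prev, curr, w_feat=1.0, w_geom=1.0, max_cost=2.0):
    """Greedily match current instances to previous ones (hypothetical helper).

    Each instance is a dict with a query feature "feat" and polyline points "pts".
    Returns (matches, unmatched), where matches maps a current index to a
    previous index and unmatched lists current indices treated as new instances.
    """
    pairs = []
    for i, a in enumerate(prev):
        for j, b in enumerate(curr):
            cost = (w_feat * cosine_distance(a["feat"], b["feat"])
                    + w_geom * chamfer_distance(a["pts"], b["pts"]))
            pairs.append((cost, i, j))
    pairs.sort()  # cheapest candidate pairs first

    used_prev, used_curr, matches = set(), set(), {}
    for cost, i, j in pairs:
        if cost > max_cost:  # assumed gating threshold
            break
        if i in used_prev or j in used_curr:
            continue
        matches[j] = i
        used_prev.add(i)
        used_curr.add(j)
    unmatched = [j for j in range(len(curr)) if j not in used_curr]
    return matches, unmatched


# Toy data: two tracked instances, three detections in the next frame.
prev = [
    {"feat": [1.0, 0.0], "pts": [(0.0, 0.0), (1.0, 0.0)]},
    {"feat": [0.0, 1.0], "pts": [(0.0, 5.0), (1.0, 5.0)]},
]
curr = [
    {"feat": [0.1, 0.9], "pts": [(0.2, 5.1), (1.2, 5.1)]},
    {"feat": [0.9, 0.1], "pts": [(0.1, 0.0), (1.1, 0.0)]},
    {"feat": [0.5, 0.5], "pts": [(10.0, 10.0), (11.0, 10.0)]},
]
matches, unmatched = associate(prev, curr)
print(matches, unmatched)
```

In this toy case the first two detections are matched to the nearby tracked instances with similar features, while the distant third detection exceeds the assumed cost threshold and is left unmatched, so it would start a new instance. An optimal assignment (for example the Hungarian algorithm) could replace the greedy loop; the abstract does not say which matching scheme the paper uses.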

Primary Subject Area: [Experience] Multimedia Applications

Secondary Subject Area: [Content] Multimodal Fusion

Relevance To Conference: This work extends local map detection to spatially continuous multi-frame mapping, and contributes to the multimedia field by advancing the processing and representation of temporal and spatial visual data. It improves the efficiency and accuracy of map construction for applications such as autonomous driving, and it advances how dynamic environments are captured and interpreted in real time. By building maps from visual data, it reduces reliance on traditional, labor-intensive mapping processes such as manual data annotation and extensive post-processing. The approach offers a more scalable, flexible, and efficient way to generate vectorized maps, supporting richer understanding of and interaction with complex spatial environments. Overall, it contributes to real-time multimedia applications that require precise environmental awareness and interaction.

Supplementary Material: zip

Submission Number: 3235
