Driving Scene Understanding with Traffic Scene-Assisted Topology Graph Transformer

Published: 20 Jul 2024 · Last Modified: 05 Aug 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Driving scene topology reasoning aims to understand the objects present in the current road scene and model their topology relationships to provide guidance information for downstream tasks. Previous approaches fail to adequately facilitate interactions among traffic objects and neglect to incorporate scene information into topology reasoning, which limits the exploration of potential correlations among objects and diminishes the practical significance of the reasoning results. In addition, the lack of constraints on lane direction may introduce erroneous guidance information and reduce topology prediction accuracy. In this paper, we propose a novel topology reasoning framework, dubbed TSTGT, to address these issues. Specifically, we design a divide-and-conquer topology graph Transformer to infer the lane-lane and lane-traffic topology relationships, respectively, which effectively aggregates local and global object information in the driving scene and facilitates topology relationship learning. Additionally, a traffic scene-assisted reasoning module is devised and combined with the topology graph Transformer to enhance the practical significance of lane-traffic topology. For lane detection, we develop a point-wise matching strategy to infer lane centerlines with correct directions, thereby improving topology reasoning accuracy. Extensive experimental results on the OpenLane-V2 benchmark validate the superiority of our TSTGT over state-of-the-art methods and the effectiveness of our proposed modules.
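To make the topology-reasoning setup concrete, the sketch below illustrates one common way such a head can be organized: lane queries interact with each other and with traffic-element queries via attention, and pairwise classifiers then score lane-lane and lane-traffic adjacency. This is a minimal, hypothetical sketch for illustration only; the module names, dimensions, and structure are assumptions and do not reproduce the actual TSTGT architecture described in the paper.

```python
import torch
import torch.nn as nn


class TopologyHeadSketch(nn.Module):
    """Illustrative topology-reasoning head (NOT the official TSTGT code).

    Lane queries and traffic-element queries interact via self-/cross-attention,
    then pairwise MLP classifiers score lane-lane and lane-traffic adjacency.
    """

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Lane-lane interaction: local aggregation among lane queries.
        self.lane_self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Lane-traffic interaction: global scene context from traffic elements.
        self.lane_traffic_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Pairwise relation classifiers over concatenated query pairs.
        self.ll_classifier = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.lt_classifier = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, lane_q: torch.Tensor, traffic_q: torch.Tensor):
        # lane_q: (B, N_lane, C) lane-centerline queries from the detector.
        # traffic_q: (B, N_te, C) traffic-element queries (lights, signs, ...).
        lane_q, _ = self.lane_self_attn(lane_q, lane_q, lane_q)
        lane_ctx, _ = self.lane_traffic_attn(lane_q, traffic_q, traffic_q)

        B, Nl, C = lane_q.shape
        Nt = traffic_q.shape[1]
        # Build all lane-lane and lane-traffic query pairs and score adjacency logits.
        ll_pairs = torch.cat(
            [lane_q.unsqueeze(2).expand(B, Nl, Nl, C),
             lane_q.unsqueeze(1).expand(B, Nl, Nl, C)], dim=-1)
        lt_pairs = torch.cat(
            [lane_ctx.unsqueeze(2).expand(B, Nl, Nt, C),
             traffic_q.unsqueeze(1).expand(B, Nl, Nt, C)], dim=-1)

        ll_logits = self.ll_classifier(ll_pairs).squeeze(-1)  # (B, N_lane, N_lane)
        lt_logits = self.lt_classifier(lt_pairs).squeeze(-1)  # (B, N_lane, N_te)
        return ll_logits, lt_logits


# Usage example with dummy queries.
head = TopologyHeadSketch()
ll, lt = head(torch.randn(2, 20, 256), torch.randn(2, 15, 256))
print(ll.shape, lt.shape)  # torch.Size([2, 20, 20]) torch.Size([2, 20, 15])
```

The pairwise-logit formulation shown here mirrors how topology relationships are typically supervised as binary adjacency labels; how TSTGT's divide-and-conquer graph Transformer and scene-assisted reasoning module refine this interaction is detailed in the paper itself.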
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This study investigates the topology relationships among objects in driving scenes, a common context in multimedia applications. By modeling object relationships in these scenes, our approach leverages multi-view image data to extract structured guidance information, offering new perspectives and solutions for multimedia applications. This holistic use of image data not only broadens the practicality of multimedia data processing but also supports deeper comprehension and analysis of complex scenes. The work thus illustrates how integrating multi-view image data with topology modeling in driving scenes can strengthen the understanding and processing of multimedia data and drive innovation in multimedia applications.
Supplementary Material: zip
Submission Number: 4230