Driving Scene Understanding with Traffic Scene-Assisted Topology Graph Transformer

Fu Rong, Wenjin Peng, Meng Lan, Qian Zhang, Lefei Zhang

Published: 28 Oct 2024, Last Modified: 06 Nov 2025 · Crossref · CC BY-SA 4.0
Abstract: Driving scene topology reasoning aims to identify the objects in the current road scene and model their topological relationships, providing guidance for downstream tasks. Previous approaches fail to adequately facilitate interaction among traffic objects and neglect to incorporate scene information into topology reasoning, which limits the exploration of potential correlations among objects and diminishes the practical significance of the reasoning results. Moreover, the lack of constraints on lane direction may introduce erroneous guidance and reduce topology prediction accuracy. In this paper, we propose a novel topology reasoning framework, dubbed TSTGT, to address these issues. Specifically, we design a divide-and-conquer topology graph Transformer that separately infers lane-lane and lane-traffic topology relationships, effectively aggregating local and global object information in the driving scene and facilitating topology relationship learning. Additionally, a traffic scene-assisted reasoning module is devised and combined with the topology graph Transformer to enhance the practical significance of the lane-traffic topology. For lane detection, we develop a point-wise matching strategy that infers lane centerlines with correct directions, thereby improving topology reasoning accuracy. Extensive experiments on the OpenLane-V2 benchmark validate the superiority of TSTGT over state-of-the-art methods and the effectiveness of the proposed modules. The code is available at https://github.com/rongfu-dsb/TSTGT.
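As a generic illustration only (not the authors' implementation, which is in the linked repository), lane-lane topology reasoning of the kind described above is commonly framed as pairwise relationship scoring between learned lane query embeddings: each pair of queries is concatenated and scored by a small network, and a sigmoid over the scores yields an estimated adjacency matrix. The sketch below uses hypothetical shapes and randomly initialized weights purely to show the framing:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron with ReLU, scoring each pairwise feature vector.
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

# Hypothetical setup: N lane queries with D-dim embeddings, as would be
# produced by a detection decoder (shapes are illustrative, not from the paper).
N, D = 4, 8
lane_queries = rng.normal(size=(N, D))

# Pairwise features: concatenate embedding i with embedding j for all (i, j).
pairs = np.concatenate(
    [np.repeat(lane_queries, N, axis=0),   # (N*N, D): query i repeated
     np.tile(lane_queries, (N, 1))],       # (N*N, D): query j tiled
    axis=1,
)

# Randomly initialized scorer weights (illustrative only).
w1, b1 = rng.normal(size=(2 * D, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

# Sigmoid over the pairwise scores gives an N x N adjacency estimate:
# entry (i, j) is the predicted probability that lane i connects to lane j.
scores = mlp(pairs, w1, b1, w2, b2).reshape(N, N)
adjacency = 1.0 / (1.0 + np.exp(-scores))
print(adjacency.shape)
```

In practice such a scorer would be trained with a binary classification loss against ground-truth connectivity; the abstract's divide-and-conquer design additionally handles lane-traffic relationships with a separate reasoning branch.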