FRUIT: Fast Road User Interactions Modeling With Latent Intention and Multimodal Trajectory for Autonomous Driving

Haoran Wu, Hao Cheng, Sifa Zheng, Chuang Zhang

Published: 01 Jan 2025, Last Modified: 14 May 2025. Venue: IEEE Intell. Transp. Syst. Mag. 2025. License: CC BY-SA 4.0

Abstract: In mixed and complex traffic environments, intelligent vehicles must interact with densely distributed road users, resulting in a heavy computational workload for the autonomous driving system. To address this issue, current research primarily focuses on the crossing/not crossing problem, regarding crossing intentions as indicators for transitioning among predefined motion modes of road users. However, this methodology can capture only the short-term behavior of road users, offering limited assistance in the long-term decision-making process of intelligent vehicles. In this article, we propose a fast road user interaction (FRUIT) modeling method, capable of integrating various latent intention estimation (LIE) and multimodal trajectory prediction (MTP) modules and improving the computational efficiency of autonomous driving in complex traffic environments by excluding irrelevant agents with latent intentions. Then, we present a competitive implementation of the LIE and MTP modules. The LIE module is built upon the theory of planned behavior (TPB). Unlike previous research that relies on surveys and questionnaires for qualitative analysis, we quantitatively model the components of the TPB and employ a mixed classification strategy to simulate the interactions among these components. This enables the autonomous driving system to prioritize critical risk sources for downstream tasks. Next, the MTP module predicts the future trajectories of road users with latent intentions. A residual long short-term memory network is proposed to extract the motion features, while graph convolution is applied to extract the road topology. To avoid mode collapse, we improve the loss function and postprocessing procedure to enhance the multimodality. The effectiveness of our LIE module is demonstrated on the Pedestrian Intention Estimation dataset, displaying an accuracy improvement of 3% to 19% compared to related methods. Additionally, the MTP module demonstrates competitive performance compared to related methods on the Argoverse dataset. Finally, the FRUIT method enhances computational efficiency by 16.7%, thereby contributing to the real-time performance of autonomous driving.
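
The two-stage idea described in the abstract, scoring each road user's latent intention first and running trajectory prediction only for the agents that remain, can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper: the `Agent` type, the `threshold` value, and both stand-in functions. In particular, `toy_intention_score` is a simple distance heuristic rather than the TPB-based LIE module, and `constant_velocity_modes` is a single-mode extrapolation rather than the residual LSTM and graph-convolution MTP module.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


@dataclass
class Agent:
    """Hypothetical road-user record: an id plus observed (x, y) positions, oldest first."""
    agent_id: str
    history: List[Point]


def select_relevant(agents: Sequence[Agent],
                    intention_score: Callable[[Agent], float],
                    threshold: float = 0.5) -> List[Agent]:
    """Keep only agents whose intention score reaches the (assumed) threshold."""
    return [a for a in agents if intention_score(a) >= threshold]


def predict_for_relevant(agents: Sequence[Agent],
                         intention_score: Callable[[Agent], float],
                         predict_modes: Callable[[Agent], List[Trajectory]],
                         threshold: float = 0.5) -> Dict[str, List[Trajectory]]:
    """Run the (potentially expensive) predictor only on the filtered agents."""
    relevant = select_relevant(agents, intention_score, threshold)
    return {a.agent_id: predict_modes(a) for a in relevant}


def toy_intention_score(agent: Agent) -> float:
    """Stand-in for an LIE module: 1.0 if the agent moved closer to the ego
    vehicle (assumed to sit at the origin) during its last step, else 0.0."""
    (x0, y0), (x1, y1) = agent.history[-2], agent.history[-1]
    return 1.0 if math.hypot(x1, y1) < math.hypot(x0, y0) else 0.0


def constant_velocity_modes(agent: Agent, horizon: int = 3) -> List[Trajectory]:
    """Stand-in for an MTP module: a single mode that extrapolates the last step."""
    (x0, y0), (x1, y1) = agent.history[-2], agent.history[-1]
    vx, vy = x1 - x0, y1 - y0
    return [[(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]]


agents = [
    Agent("ped_near", [(5.0, 5.0), (4.0, 4.0)]),  # approaching the ego vehicle
    Agent("ped_far", [(5.0, 5.0), (6.0, 6.0)]),   # moving away from it
]
predictions = predict_for_relevant(agents, toy_intention_score, constant_velocity_modes)
```

In this sketch only `ped_near` reaches the predictor, so the prediction cost scales with the number of relevant agents rather than with all agents in the scene. How much of the reported 16.7% efficiency gain comes from this kind of filtering is not something the sketch can show.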