Abstract: Pedestrian trajectory prediction plays a critical role in ensuring the safe operation of autonomous vehicles. Predicting from an egocentric view can eliminate the cumulative computational errors associated with scene perspective transformations. However, compared to prediction from a bird’s-eye view, a key challenge in the egocentric setting is that the ego vehicle’s motion and the pedestrian’s motion simultaneously influence the target’s movement. To address this, we propose an agent-wise motion fusion network (AANet), which efficiently predicts multimodal trajectories by learning agent-wise motion step by step, together with historical trajectory features, in a two-stream structure. Specifically, we use the pedestrian’s trajectory together with the motion of the ego vehicle and the pedestrian to predict the pedestrian’s multimodal trajectory. One stream of AANet learns contextual information through step-wise attention over the agent-wise motion to enhance scenario understanding, while the other stream learns the temporal relationships within the trajectory. In addition, we design a query-based multistage decoder and use prediction of the pedestrian’s crossing intention as an auxiliary task, which helps capture the high-level motivation behind the pedestrian’s future motion. Finally, prediction results on the Joint Attention for Autonomous Driving (JAAD) and Pedestrian Intention Estimation (PIE) datasets improve by approximately 13% and 12%, respectively, demonstrating the effectiveness of our model, which achieves state-of-the-art performance.
External IDs: dblp:journals/iotj/NiuHYCD25