Towards Robust and Lightweight Vision-based Pedestrian Trajectory Prediction

Renhao Huang

Published: 2025, Last Modified: 08 Jan 2026, License: CC BY-SA 4.0

Abstract: Vision-based pedestrian trajectory prediction aims to predict pedestrians' future trajectories from their historical trajectories and the surrounding environment, using visual information. In recent years, great progress has been made on this topic, and excellent performance has been achieved on public benchmarks. However, little research has focused on robustness when models are applied in unseen environments. Concurrently, heatmap-oriented trajectory prediction has become popular due to its effective scene interaction and intuitive uncertainty modelling. However, its resource consumption increases significantly when multi-pedestrian and multi-future predictions are required. The first topic of this thesis addresses the robustness issues with two proposed methods: motion-prior-based trajectory prediction (Traj-MP) and region-aware prediction clustering (RPC). Traj-MP combines deep learning models with motion priors estimated using physical rules, enhancing generalisation to unseen domains while simplifying model training. RPC is a learning-free postprocessing method that integrates social interaction into pretrained trajectory prediction models through oversampling, filtering, clustering and waypoint refinement based on the occupancy of unwalkable regions. These two methods provide a substantial performance boost in cross-dataset evaluations, illustrating their improved robustness in unseen environments. The second topic addresses the consumption issues in heatmap-oriented trajectory prediction with two studies: HyperTraj and DecoupleTraj. HyperTraj tackles the high computational cost of repeatedly executing waypoint heatmap regression when conditioning on multiple sampled endpoints. We propose a lightweight, fully convolutional network with three heads that predict the endpoint heatmap, the convolutional kernels and the waypoint heatmaps. By performing dynamic convolution with the predicted convolutional kernels during waypoint heatmap decoding, HyperTraj produces multiple future predictions with constant latency. DecoupleTraj tackles the memory consumption caused by repetitive scene encoding when predicting for multiple pedestrians in parallel. We fully decouple the encoding of the scene image from that of the trajectory coordinates and fuse them using a Transformer-based decoder to regress endpoint heatmaps. These two methods successfully predict multiple future trajectories for varying numbers of pedestrians with low latency and memory consumption.
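The dynamic-convolution idea described for HyperTraj can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the 1x1 kernel size, the tensor shapes, the random stand-ins for the network heads, and the names `dynamic_conv_1x1` and `spatial_softmax` are all assumptions made for illustration. The point it shows is that a shared feature map is computed once, and each sampled endpoint contributes only a small predicted kernel, so decoding K waypoint heatmaps becomes one batched contraction rather than K separate forward passes.

```python
import numpy as np


def dynamic_conv_1x1(features, kernels, biases):
    """Apply K predicted 1x1 kernels to one shared feature map.

    features: (C, H, W) shared features, computed once per scene.
    kernels:  (K, C) one hypothetical predicted kernel per sampled endpoint.
    biases:   (K,)   one hypothetical predicted bias per sampled endpoint.
    Returns (K, H, W) heatmap logits, one map per endpoint.
    """
    # A 1x1 convolution is a per-pixel dot product over the channel axis.
    return np.einsum("kc,chw->khw", kernels, features) + biases[:, None, None]


def spatial_softmax(logits):
    """Normalise each (H, W) map so that it sums to 1."""
    k, h, w = logits.shape
    flat = logits.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.reshape(k, h, w)


# Random stand-ins for the outputs of the (assumed) network heads.
rng = np.random.default_rng(0)
C, H, W, K = 8, 16, 16, 20
features = rng.standard_normal((C, H, W))
kernels = rng.standard_normal((K, C))
biases = rng.standard_normal(K)

logits = dynamic_conv_1x1(features, kernels, biases)
heatmaps = spatial_softmax(logits)
```

In this sketch the number of sampled endpoints K only changes the size of the small `kernels` array; the shared `features` tensor is never recomputed, which is the property the abstract associates with constant latency.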

External IDs: dblp:phd/basesearch/Huang25