Abstract: Forecasting the future trajectories of pedestrians is an important
task in computer vision with a range of applications, from
security cameras to autonomous driving. It is very challenging
because pedestrians not only move individually over
time but also interact spatially, and spatial and temporal
information are deeply coupled in multi-agent
scenarios. Learning such complex spatio-temporal correlations
is a fundamental issue in pedestrian trajectory prediction.
Inspired by how the hippocampus processes
and integrates spatio-temporal information to form
memories, we propose a novel multi-stream representation
learning module to learn complex spatio-temporal features
of pedestrian trajectories. Specifically, we learn temporal, spatial
and cross spatio-temporal correlation features in three
separate pathways and then adaptively integrate these features
with learnable weights through a gated network. In addition, we
use a sparse attention gate to select the informative interactions
and correlations introduced by complex spatio-temporal
modeling and to reduce the complexity of our model. We evaluate
our proposed method on two commonly used datasets, i.e.,
ETH-UCY and SDD, and the experimental results demonstrate
that our method achieves state-of-the-art performance.
Code: https://github.com/YuxuanIAIR/MSRL-master
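
The adaptive integration step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`softmax`, `gated_fusion`), the single linear gate, and the softmax normalization over three streams are all assumptions chosen for illustration, and plain Python lists stand in for learned feature tensors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(temporal, spatial, cross, gate_weights, gate_bias):
    """Fuse three same-length feature vectors with input-dependent weights.

    Assumed (hypothetical) gate: one linear layer over the concatenated
    features, producing one score per stream, followed by a softmax.
    gate_weights has 3 rows of length 3 * d; gate_bias has 3 entries.
    """
    concat = temporal + spatial + cross
    scores = [
        sum(w * x for w, x in zip(row, concat)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]
    gates = softmax(scores)  # one weight per stream, summing to 1
    fused = [
        gates[0] * t + gates[1] * s + gates[2] * c
        for t, s, c in zip(temporal, spatial, cross)
    ]
    return fused, gates
```

With an all-zero gate the three streams are weighted equally, so the fused vector is their element-wise mean; in a trained model the gate parameters would be learned jointly with the three pathways.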