Abstract: This paper addresses accurate pedestrian trajectory prediction in scenarios where individuals walk alone or in social groups and sometimes alter their paths to avoid collisions. While previous work has improved backbone neural networks for modeling individual motion patterns, few studies have explicitly addressed the consistency of internal motion patterns or the properness of external interactions. To address this gap, we propose a unified framework consisting of a Contrastive History-Prediction (CHIP) module and a Differentiable Social Interaction Ranking (DSIR) module. The CHIP module uses an unsupervised contrastive loss to keep predicted motion patterns consistent with the observed history, while the supervised DSIR module ensures that predicted interactions are compatible with realistic positions. Our analysis and numerical studies demonstrate the effectiveness of the approach, which achieves a 5–10% improvement in positional accuracy and a 3–7% improvement in interactive properness. We also provide comprehensive visualizations of predicted trajectories with temporal interactive scores across various scenarios.