Keywords: Pedestrian Behavior Prediction; Computer Vision; Deep Learning; Autonomous Driving; Benchmark Dataset
Abstract: Predicting pedestrian behavior is a crucial component of autonomous driving technology, enabling safer navigation and accident prevention for autonomous vehicles. Research on pedestrian behavior modeling currently splits into two distinct approaches: the egocentric view and the bird's-eye view. Each perspective offers unique advantages and drawbacks, yet little work integrates the two. In this paper, we introduce a novel Multi-modal Cross-Attentive Fusion algorithm (MCAF) that concurrently models trajectories from both perspectives, utilizing visual and spatial modalities together with interaction data and maps. We incorporate six modalities spanning the two views (egocentric and bird's-eye view): the high-definition (HD) map, target trajectories, surrounding-agent trajectories, egocentric images, egocentric trajectories, and ego-vehicle actions. Based on the nuScenes dataset, we construct a pedestrian trajectory dataset (nuScenes-DuoView) that covers both views. Our findings indicate that this approach outperforms current methods, achieving 8% and 12% improvements in Final Displacement Error (FDE) in the egocentric and bird's-eye views, respectively. An ablation study further substantiates the benefits of fusing the two views.
Submission Number: 2
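As a rough illustration of the kind of cross-attentive fusion between the egocentric and bird's-eye-view streams that the abstract describes, the sketch below exchanges information between two per-view feature sequences via bidirectional multi-head cross-attention. The module name `CrossViewFusion`, the feature dimensions, and the use of PyTorch's `nn.MultiheadAttention` are illustrative assumptions, not the paper's actual MCAF implementation.

```python
import torch
import torch.nn as nn


class CrossViewFusion(nn.Module):
    """Illustrative (assumed) cross-attentive fusion of two view-specific streams.

    Each stream is a sequence of per-timestep embeddings, e.g. fused egocentric
    modalities (images, egocentric trajectory, ego-vehicle actions) and fused
    bird's-eye-view modalities (HD map, target and surrounding trajectories).
    """

    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        # Each view attends to the other view's features.
        self.ego_to_bev = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.bev_to_ego = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_ego = nn.LayerNorm(d_model)
        self.norm_bev = nn.LayerNorm(d_model)

    def forward(self, ego_feats: torch.Tensor, bev_feats: torch.Tensor):
        # ego_feats: (B, T_ego, d_model); bev_feats: (B, T_bev, d_model)
        ego_ctx, _ = self.ego_to_bev(query=ego_feats, key=bev_feats, value=bev_feats)
        bev_ctx, _ = self.bev_to_ego(query=bev_feats, key=ego_feats, value=ego_feats)
        # Residual connection + layer norm preserves each view's own information.
        return self.norm_ego(ego_feats + ego_ctx), self.norm_bev(bev_feats + bev_ctx)


if __name__ == "__main__":
    fusion = CrossViewFusion()
    ego = torch.randn(2, 8, 128)  # 8 observed egocentric timesteps
    bev = torch.randn(2, 8, 128)  # 8 observed bird's-eye-view timesteps
    ego_out, bev_out = fusion(ego, bev)
    print(ego_out.shape, bev_out.shape)  # torch.Size([2, 8, 128]) for both
```

The fused per-view features would then feed view-specific decoders that produce the egocentric and bird's-eye-view trajectory predictions evaluated with FDE.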