Sparse Prototype Network for Explainable Pedestrian Behavior Prediction

Yan Feng, Alexander Carballo, Kazuya Takeda

Published: 01 Jan 2025, Last Modified: 12 Nov 2025; IEEE Robotics Autom. Lett. 2025; CC BY-SA 4.0

Abstract: Predicting pedestrian behavior is challenging yet crucial for applications such as autonomous driving and smart cities. Recent deep learning models make accurate predictions, but they fail to provide explanations of their inner workings. One reason is that their inputs are multi-modal. To bridge this gap, we present Sparse Prototype Network (SPN), an explainable method designed to simultaneously predict a pedestrian's future action, trajectory, and pose. SPN leverages an intermediate prototype bottleneck layer to provide sample-based explanations for its predictions. The prototypes are modality-independent, meaning that they can correspond to any modality from the input. Therefore, SPN can extend to arbitrary combinations of modalities. Regularized by mono-semanticity and clustering constraints, the prototypes learn consistent and human-understandable features, and SPN achieves state-of-the-art performance on action, trajectory, and pose prediction on TITAN and PIE. Finally, we propose a metric named Top-K Mono-semanticity Scale to quantitatively evaluate explainability. Qualitative results show a positive correlation between sparsity and explainability.
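To make the idea of a prototype bottleneck concrete, the following is a minimal NumPy sketch of a generic layer of that kind. It is not the authors' implementation: the abstract does not specify the similarity function, the pooling, or how sparsity is enforced, so the cosine similarity, max-pooling over tokens, and top-k selection here are all assumptions, and every name (`prototype_bottleneck`, `tokens`, `head_weights`, `k`) is hypothetical.

```python
import numpy as np

def prototype_bottleneck(tokens, prototypes, head_weights, k=2):
    """Generic prototype bottleneck (illustrative sketch, not SPN itself).

    tokens:       (n_tokens, d) features from any mix of input modalities
    prototypes:   (n_protos, d) learnable prototype vectors
    head_weights: (n_protos, n_outputs) linear head on prototype activations
    k:            number of prototype activations kept (assumed sparsity rule)
    """
    # Cosine similarity between every token and every prototype.
    t = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = t @ p.T                        # (n_tokens, n_protos)

    # A prototype's activation is its best match over all tokens,
    # so it is not tied to any particular modality.
    act = sim.max(axis=0)                # (n_protos,)
    best_token = sim.argmax(axis=0)      # which token explains each prototype

    # Assumed sparsity rule: keep the k largest activations, zero the rest.
    keep = np.argsort(act)[-k:]
    sparse = np.zeros_like(act)
    sparse[keep] = act[keep]

    # The prediction depends on the input only through the sparse activations.
    logits = sparse @ head_weights
    return logits, sparse, best_token
```

Because the output is computed only from `sparse`, each prediction can be traced back to a few prototypes and, through `best_token`, to the input tokens that activated them. That is the sense in which such a layer supports sample-based explanations.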

External IDs: dblp:journals/ral/FengCT25