Spatial-Temporal Transformer for Crime Recognition in Surveillance Videos

Kayleigh Boekhoudt, Estefanía Talavera

Published: 01 Jan 2022, Last Modified: 12 May 2023, AVSS 2022, Readers: Everyone

Abstract: Recognizing human-related crime in surveillance videos becomes even more challenging when the actions involved are relatively similar. We propose a transformer-based model that relies on the spatial-temporal representation of extracted skeletal trajectories for fine-grained classification. We validate the effectiveness of our model on the complex HR-Crime dataset, which consists of videos representing 13 categories of human-related crimes. Quantitative and qualitative results suggest that building a transformer architecture with coupled spatial and temporal modules enables the model to achieve competitive performance while improving intrinsic interpretability.
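
To make the idea of spatial and temporal attention over skeletal trajectories concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the identity query/key/value projections, the sequential spatial-then-temporal ordering, and the toy clip are all assumptions chosen for illustration. The abstract does not specify how the two modules are coupled.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Assumption: a real model would use learned projections; identity is used
    here to keep the sketch dependency-free.
    tokens: list of n vectors, each a list of d floats.
    Returns (outputs, weights); weights[i][j] is how much token i attends to j.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)]
        )
    return outputs, weights


def spatial_temporal_block(clip):
    """Hypothetical block: clip[t][j] is the feature vector of joint j in frame t."""
    n_frames, n_joints = len(clip), len(clip[0])
    # Spatial module: attend across joints within each frame.
    spatial = [self_attention(frame)[0] for frame in clip]
    # Temporal module: attend across frames along each joint's trajectory.
    out = [[None] * n_joints for _ in range(n_frames)]
    temporal_weights = []
    for j in range(n_joints):
        trajectory = [spatial[t][j] for t in range(n_frames)]
        attended, w = self_attention(trajectory)
        temporal_weights.append(w)
        for t in range(n_frames):
            out[t][j] = attended[t]
    return out, temporal_weights


# Toy clip (made-up data): 3 frames, 2 joints, 2-D joint coordinates.
clip = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.1, 0.0], [1.0, 0.2]],
    [[0.2, 0.1], [1.1, 0.4]],
]
features, temporal_weights = spatial_temporal_block(clip)
```

The returned `temporal_weights[j][t]` row shows how strongly joint `j` in frame `t` attends to every other frame, which is the kind of quantity one might inspect when discussing interpretability of attention-based models.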
