Abstract: Effectively modeling the interactions among actors is critical yet challenging for Group Activity Recognition (GAR). Previous methods usually divide actors into subgroups based on the similarity of their appearance features in order to model multilevel interactions. However, this appearance-based grouping scheme does not fully consider the spatial relations among actors, which can provide a discriminative cue for GAR. In this paper, we propose a Spatial Formation-Guided Network (SFGN) that captures effective interactions under the guidance of spatial formations. We first design a spatial formation extractor that mines latent spatial relations among actors to extract spatial formation features. Then, a formation-guided interaction module uses these features to guide the interactions among actors. Finally, a cross-formation interaction module explores the complementarity among diverse spatial formations. Extensive experiments on the Volleyball dataset and the Collective Activity dataset demonstrate that SFGN outperforms state-of-the-art methods.