Spatial Dual Context Learning for Weakly-supervised Group Activity Recognition in Still-images

Zhao Wu, Dunbo Ning, Wenjing Chen, Hao Sun, Wei Xie, Ming Dong

Published: 01 Jan 2024, Last Modified: 15 May 2025. Venue: ICME 2024. License: CC BY-SA 4.0

Abstract: This paper investigates a new task, Weakly-supervised Group Activity Recognition in Still-images (WGARS), which aims to extend the applicability of Group Activity Recognition (GAR) to broader scenarios, such as low-latency domains. To tackle this challenge, we propose a Spatial Dual Context Transformer (SDCT), comprising a Dual Context Encoder (DCE) and a Dual Context Decoder (DCD). The DCE module separately encodes a holistic context, which captures the integral relations among all actors, and a partial context, which captures individual features in still images. Subsequently, the DCD module exploits the complementarity between the holistic and partial contexts and alternately updates the two encoded contexts to enhance the interaction among actors. Additionally, auxiliary supervised contrastive learning is incorporated to mitigate activity confusion. The proposed SDCT attains state-of-the-art performance on the Volleyball and NBA datasets in WGARS. Notably, SDCT even outperforms recent methods when extended to the weakly-supervised GAR in videos task on the Volleyball dataset.
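The abstract does not specify the form of the auxiliary supervised contrastive objective. As a rough illustration only, the following is a minimal NumPy sketch of a generic supervised contrastive loss over a batch of image-level embeddings, where samples sharing an activity label are treated as positives. The function name, the `temperature` value, and the use of one embedding per image are assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not the paper's exact loss).

    features: array of shape (N, D), one embedding per image (assumption).
    labels:   array of shape (N,), integer group-activity labels.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n = features.shape[0]
    if n < 2:
        return 0.0  # no pairs to contrast

    # L2-normalise so that dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = (z @ z.T) / temperature

    # Exclude each sample's similarity with itself from the softmax.
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)

    # Numerically stable log-softmax over the remaining samples in each row.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: other samples in the batch with the same activity label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors with at least one positive
    if not valid.any():
        return 0.0

    # Average log-probability of the positives, per anchor, then over anchors.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[valid] / pos_count[valid]
    return float(per_anchor.mean())


if __name__ == "__main__":
    feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    print(supervised_contrastive_loss(feats, np.array([0, 0, 1, 1])))
    print(supervised_contrastive_loss(feats, np.array([0, 1, 0, 1])))
```

In this toy example the first call, whose labels agree with the geometric clusters, yields a much smaller loss than the second, whose labels cut across them. This is the behaviour one would want from a term meant to pull same-activity embeddings together and push confusable activities apart.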