Abstract: Group Activity Recognition (GAR) is a challenging problem in computer vision due to the intricate dynamics and interactions among individuals. Existing methods that rely on RGB videos face challenges in panoramic environments with numerous individuals and social groups. In this paper, we propose the Multimodal Group Activity Recognition network (MGAR-net), which leverages the combined power of RGB and LiDAR modalities. Our approach effectively exploits information from both modalities, robustly and accurately capturing individual relationships and detecting social groups in the face of optical challenges. By harnessing the capability of LiDAR through our new fusion module, the Distance Aware Fusion Module (DAFM), MGAR-net acquires valuable 3D structural information. We conduct experiments on the JRDB-Act dataset, which contains challenging scenarios with numerous people. The results demonstrate that LiDAR data provide valuable information for social grouping and for recognizing individual actions and group activities, particularly in crowded group settings. For social grouping, MGAR-net improves performance by about 12% over existing state-of-the-art models in terms of the AP metric.