AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, Venue: MM2024 Poster, Readers: Everyone, License: CC BY 4.0
Abstract: Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes, including individual activities, group activities, and global activities. Previous methods either 1) rely heavily on manually annotated detection boxes in both training and inference, which hinders practical deployment; or 2) directly employ standard detectors to detect multiple persons of varying size and under spatial occlusion in panoramic scenes, which limits the performance of PAR. To this end, we consider learning a detector that adapts to occluded persons of varying size and is optimized jointly with the recognition module in an all-in-one framework. We therefore propose a novel Adapt-Focused bi-Propagating Prototype learning (AdaFPP) framework that jointly recognizes individual, group, and global activities in panoramic scenes by learning an adapt-focused detector and multi-granularity prototypes as pretext tasks in an end-to-end way. Specifically, to accommodate the varying sizes and spatial occlusion of multiple persons in crowded panoramic scenes, we introduce a panoramic adapt-focuser, which achieves size-adaptive detection of individuals by selecting object-dense sub-regions identified through the original detections and performing fine-grained detection on them. In addition, to mitigate the information loss caused by inaccurate individual localization, we introduce a bi-propagation prototyper that promotes closed-loop interaction and information consistency across granularities by propagating information bidirectionally among the individual, group, and global levels. Extensive experiments demonstrate the strong performance of AdaFPP and its applicability to PAR.
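The abstract describes the panoramic adapt-focuser only at a high level: coarse detections are used to find object-dense sub-regions, which are then re-examined at finer resolution. The sketch below illustrates just the region-selection step, and it is not the authors' method. It assumes a simple stand-in heuristic (counting box centers per cell of a fixed grid); the function name `dense_subregions` and the parameters `grid` and `min_count` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def dense_subregions(
    boxes: List[Box],
    width: float,
    height: float,
    grid: Tuple[int, int] = (4, 2),  # (columns, rows); an assumed setting
    min_count: int = 3,              # assumed density threshold
) -> List[Box]:
    """Return grid cells that contain at least `min_count` box centers.

    Illustrative heuristic only: a learned selector could replace the
    fixed grid and the hard threshold.
    """
    cols, rows = grid
    cell_w, cell_h = width / cols, height / rows
    counts = [[0] * cols for _ in range(rows)]

    # Count coarse detections per cell, using each box center.
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        col = max(0, min(int(cx // cell_w), cols - 1))
        row = max(0, min(int(cy // cell_h), rows - 1))
        counts[row][col] += 1

    # Keep the cells that are dense enough to merit a second, finer pass.
    regions: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if counts[r][c] >= min_count:
                regions.append(
                    (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
                )
    return regions
```

Under these assumptions, each returned rectangle would be cropped, upsampled, and passed to the detector again, so that small or occluded persons occupy more pixels in the second pass.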
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: Crowded panoramic scenes are becoming common in smart-robot applications of human-machine interaction. Such scenes typically encompass a spectrum of human behaviors, ranging from individual actions to interactive group activities and global activities. AdaFPP aims at a comprehensive understanding of multi-person scenes by jointly identifying these multi-granularity behaviors. This understanding of panoramic activity not only advances multimedia interpretation and analysis techniques, but also facilitates more meaningful applications of human-machine interaction.
Supplementary Material: zip
Submission Number: 1785