Prompt-Guided Prototype-Aware Commonality and Discrimination Learning for Zero-Shot Skeleton-Based Action Recognition
Abstract: Zero-Shot Skeleton-Based Action Recognition (ZSSAR) is an emerging research field focused on developing alignment models that connect skeleton movements with action definitions, thereby enabling generalization to unseen actions. Current methods often employ generative models to reconstruct cross-modal features, or enhance mutual information across modalities, to achieve alignment. However, when applied to unseen action categories, these models often neglect the inherent consistency among basic actions, which diminishes their generalization capability. Furthermore, imprecise annotations fail to capture the rich semantic details of actions, resulting in misalignment. Inspired by human cognitive processes and chain-of-thought reasoning, we argue that integrating prior information about human actions with intrinsic commonality knowledge of basic actions is essential for ZSSAR. To this end, we propose a novel method termed Prompt-guided Prototype-aware Commonality and Discrimination Learning (PP-CDL). This method utilizes the comprehensive world knowledge contained in large language models (LLMs), employing tailored prompts to partition seen action categories into distinct, non-overlapping prototype spaces that embody the commonality knowledge of basic actions. Subsequently, we introduce the Inter- and Intra-Prototype Discriminating (I2PD) module and the Intra-Prototype Commonality Mining (IPCM) module. The I2PD module amplifies the distinctiveness of knowledge within prototypes, providing a personalized search space for the recognition of unseen actions. In contrast, the IPCM module models the shared commonality concept within prototypes, strengthening the consistency between skeleton action representations and the corresponding text knowledge representations. Experiments on multiple skeleton action benchmarks demonstrate significant improvements of our method over existing alternatives.
External IDs: doi:10.1109/tmm.2025.3590904
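To make the idea of a prototype narrowing the search space for unseen actions more concrete, here is a minimal Python sketch of prototype-restricted zero-shot matching. It assumes that skeleton and text embeddings already live in a shared space and that each unseen class has been assigned to one prototype. It is not the PP-CDL implementation. The function names (`cosine`, `zero_shot_predict`), the two-stage prototype-then-class search, the prototype centroids, and the toy 2-D embeddings are all hypothetical and serve only as illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_predict(
    skeleton_emb: Vector,
    class_text_embs: Dict[str, Vector],
    prototype_of: Dict[str, str],
    prototype_centroids: Dict[str, Vector],
) -> str:
    """Hypothetical two-stage matcher (illustration only, not PP-CDL).

    1. Choose the prototype whose centroid is most similar to the skeleton.
    2. Among classes assigned to that prototype, return the class whose
       text embedding is most similar to the skeleton embedding.
    """
    best_proto = max(
        prototype_centroids,
        key=lambda p: cosine(skeleton_emb, prototype_centroids[p]),
    )
    candidates: List[str] = [
        c for c in class_text_embs if prototype_of.get(c) == best_proto
    ]
    if not candidates:  # no class mapped to that prototype: search every class
        candidates = list(class_text_embs)
    return max(candidates, key=lambda c: cosine(skeleton_emb, class_text_embs[c]))


# Toy 2-D embeddings, made up purely for illustration.
CLASS_TEXT_EMBS = {
    "drink water": [1.0, 0.1],
    "eat meal": [0.9, 0.3],
    "kick": [0.1, 1.0],
    "jump": [0.2, 0.9],
}
# Hypothetical class-to-prototype assignment (e.g. one produced by prompting an LLM).
PROTOTYPE_OF = {
    "drink water": "hand-to-mouth",
    "eat meal": "hand-to-mouth",
    "kick": "leg movement",
    "jump": "leg movement",
}
PROTOTYPE_CENTROIDS = {"hand-to-mouth": [1.0, 0.2], "leg movement": [0.15, 0.95]}

print(zero_shot_predict([0.95, 0.15], CLASS_TEXT_EMBS, PROTOTYPE_OF, PROTOTYPE_CENTROIDS))
# prints "drink water"
```

Restricting the final comparison to one prototype's classes is a simplified stand-in for a per-prototype search space. A real system would learn the skeleton and text embeddings with encoders and training objectives that this sketch omits entirely.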