Hierarchical Motion-Enhanced Matching Framework for Few-Shot Action Recognition

Hailiang Gao, Guo-Sen Xie, Rui Yan, Qiongjie Cui, Hongyu Qu, Xiangbo Shu

Published: 01 Jan 2025, Last Modified: 06 Nov 2025. Venue: IEEE Transactions on Multimedia. License: CC BY-SA 4.0
Abstract: Few-Shot Action Recognition (FSAR) aims to recognize actions from novel classes given only limited annotated training data for each class. Most FSAR methods implicitly follow few-shot image classification solutions by focusing solely on appearance-level matching between support and query videos, such as part-level, frame-level, and segment-level matching. However, these methods typically have two main limitations: 1) they generally ignore the relationship among the part-, frame-, and segment-level features, and 2) they may mismatch same-class actions that exhibit fast-term and slow-term dynamics. To this end, we present a novel Hierarchical Motion-enhanced Matching (HM$^{2}$) framework that hierarchically learns relation-aware multi-modal features and jointly promotes multi-modal matching, including appearance-level matching on segments, frames, and parts, as well as motion-level matching on dynamics. Specifically, we first propose a new Hierarchical Tokenizer (HT) to learn multi-modal features: a hierarchical Transformer learns appearance-level features, while a Slow-Fast Aware Motion (SFAM) strategy learns motion-level features covering fast- and slow-term dynamics. Next, we propose a new Relation-aware Matcher (RM) to match the multi-modal features, leveraging a Hierarchical Relational Graph Convolutional Network (H-RGCN) to capture the relationship among the appearance-level features. Further, a Dual Sample-to-Class Matching (DSCM) strategy is proposed to measure the bidirectional similarities among appearance- and motion-modal features via sample-to-class matching and class-to-sample matching. Extensive experiments on four standard FSAR datasets demonstrate significant performance improvements of HM$^{2}$ over state-of-the-art methods.
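The abstract does not spell out how the bidirectional similarity in DSCM is computed, so the following is only a rough illustration of the general idea of matching in both directions (sample-to-class and class-to-sample). It is not the authors' implementation. The function names (`directed_score`, `bidirectional_score`, `classify`), the use of cosine similarity, the best-match averaging, and the toy 2-D features are all assumptions made for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def directed_score(source, target):
    """Average, over source features, of the best cosine match in target.

    Assumption: best-match averaging stands in for whatever directed
    similarity the paper actually uses.
    """
    return sum(max(cosine(s, t) for t in target) for s in source) / len(source)


def bidirectional_score(query_feats, class_feats):
    """Mean of the sample-to-class and class-to-sample directed scores."""
    sample_to_class = directed_score(query_feats, class_feats)
    class_to_sample = directed_score(class_feats, query_feats)
    return 0.5 * (sample_to_class + class_to_sample)


def classify(query_feats, support):
    """Pick the support class with the highest bidirectional score.

    `support` maps a class label to the list of feature vectors
    pooled from that class's support videos (hypothetical layout).
    """
    scores = {
        label: bidirectional_score(query_feats, feats)
        for label, feats in support.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy usage with made-up 2-D features for two hypothetical classes.
support = {
    "wave": [[1.0, 0.0], [0.9, 0.1]],
    "jump": [[0.0, 1.0], [0.1, 0.9]],
}
query = [[1.0, 0.05], [0.8, 0.2]]
label, scores = classify(query, support)
print(label, scores)
```

Scoring in both directions penalizes a query that matches only a small part of a class (or a class that covers only a small part of the query), which a single directed score would not catch. In the framework described above, such a score would presumably be computed separately for appearance-level and motion-level features and then combined; how that combination is done is not stated in the abstract.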