KS-FuseNet: An Efficient Action Recognition Method Based on Keyframe Selection and Feature Fusion

Published: 01 Jan 2024 · Last Modified: 17 Apr 2025 · PRCV (7) 2024 · CC BY-SA 4.0
Abstract: To address the challenge of effectively capturing features in contemporary video tasks, we propose an action recognition approach based on keyframe selection and feature fusion. Our method comprises two core modules. The keyframe selection module employs an attention mechanism to separate the input deep feature-map sequence into two distinct tensors, reducing redundant spatial computation and enhancing the capture of key features. The spatio-temporal and action feature module contains two branches with different structures, which extract spatio-temporal and action features, respectively, from the differentiated features produced by the previous module. Through these closely linked modules, our approach discerns and extracts meaningful video features for the subsequent classification task. We construct an end-to-end deep learning model using established frameworks, train and validate it on a generic video dataset, and confirm its efficacy through comparison and ablation experiments. Experiments on this dataset demonstrate that our model surpasses the majority of prior works.
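The abstract describes an attention mechanism that scores frames and partitions a feature-map sequence into two tensors (key vs. non-key). The paper's learned module is not reproduced here; the following is a minimal NumPy sketch of that general idea, using a random stand-in for the learned attention projection and a simple top-k split. All names (`split_keyframes`, `num_key`) are illustrative assumptions, not the authors' API.

```python
import numpy as np

def split_keyframes(features, num_key, rng=None):
    """Illustrative keyframe split: score each frame with softmax
    attention over a stand-in projection, then partition the
    sequence into key / non-key tensors. KS-FuseNet's actual
    module learns this scoring end-to-end; this is only a sketch."""
    rng = np.random.default_rng(0) if rng is None else rng
    T, C = features.shape
    w = rng.standard_normal(C)           # stand-in for a learned query vector
    logits = features @ w                # one relevance score per frame
    scores = np.exp(logits - logits.max())
    scores /= scores.sum()               # softmax attention weights over frames
    order = np.argsort(scores)[::-1]     # frames sorted by attention weight
    key_idx = np.sort(order[:num_key])   # keep temporal order within each tensor
    rest_idx = np.sort(order[num_key:])
    return features[key_idx], features[rest_idx]

# Toy sequence: 16 frames of 64-dim deep features.
feats = np.random.default_rng(1).standard_normal((16, 64))
key, rest = split_keyframes(feats, num_key=4)
print(key.shape, rest.shape)  # (4, 64) (12, 64)
```

In the paper's pipeline, the two resulting tensors would then feed the two differently structured branches (spatio-temporal and action feature extraction) before fusion and classification.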
