EPAM-Net: An efficient pose-driven attention-guided multimodal network for video action recognition

Ahmed Abdelkawy, Asem M. Ali, Aly A. Farag

Published: 2025, Last Modified: 07 Nov 2025. Venue: Neurocomputing 2025. License: CC BY-SA 4.0

Abstract: Highlights
• A novel, efficient pose-guided multimodal network is proposed for action recognition.
• The eXpand temporal Shift model is introduced to rival 3D CNNs (X3D) with fewer GFLOPs.
• A pose attention block is proposed to guide the RGB stream to keyframes and key body regions.
• Our multimodal net rivals SoTA on 4 datasets, reducing FLOPs/parameters by 72.8x/48.6x.

External IDs: dblp:journals/ijon/AbdelkawyAF25