Fragrant: frequency-auxiliary guided relational attention network for low-light action recognition

Published: 01 Jan 2025 · Last Modified: 15 May 2025 · Vis. Comput. 2025 · License: CC BY-SA 4.0
Abstract: Video action recognition aims to classify actions within sequences of video frames and has important applications across computer vision. Existing methods perform well in well-lit environments but suffer a marked performance drop under low-light conditions, because relevant information is difficult to extract from dark, noisy images. Furthermore, simply introducing enhancement networks as a preprocessing step increases both the parameter count and the computational burden of video processing. To address this dilemma, this paper presents a novel frequency-based method, the FRequency-Auxiliary Guided Relational Attention NeTwork (FRAGRANT), designed specifically for low-light action recognition. Its distinctive features can be summarized as follows: (1) a novel Frequency-Auxiliary Module that focuses on informative object regions, characterizing action and motion while effectively suppressing noise; (2) a sophisticated Relational Attention Module that enhances motion representation by modeling local relations between neighboring positions, thereby more effectively resolving issues such as fuzzy boundaries. Comprehensive experiments demonstrate that FRAGRANT outperforms existing methods, achieving state-of-the-art results on standard low-light action recognition benchmarks.
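The abstract gives no implementation details, so the PyTorch sketch below is only one plausible reading of the two modules, not the authors' architecture; all class names, layer choices, and parameters (FrequencyAuxiliaryModule, LocalRelationalAttention, the FFT-magnitude gate, the k x k neighborhood) are illustrative assumptions. It shows how a frequency cue could gate spatial features to suppress noise, and how local attention could model relations between neighboring positions.

    import torch
    import torch.nn as nn

    class FrequencyAuxiliaryModule(nn.Module):
        """Hypothetical sketch: gate features with their frequency magnitude.

        High-frequency content often marks object/motion boundaries, while
        flat, noisy dark regions contribute little, so a gate computed from
        the FFT magnitude can emphasize informative regions.
        """
        def __init__(self, channels):
            super().__init__()
            self.gate = nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.Sigmoid(),
            )

        def forward(self, x):  # x: (B, C, H, W) per-frame features
            mag = torch.abs(torch.fft.fft2(x, norm="ortho"))
            return x * self.gate(mag)

    class LocalRelationalAttention(nn.Module):
        """Hypothetical sketch: attention over a k x k neighborhood of each
        position, one way to model local relations between position neighbors."""
        def __init__(self, channels, k=3):
            super().__init__()
            self.k = k
            self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
            self.unfold = nn.Unfold(kernel_size=k, padding=k // 2)

        def forward(self, x):  # x: (B, C, H, W)
            B, C, H, W = x.shape
            q, k_, v = self.qkv(x).chunk(3, dim=1)
            # Gather the k*k neighbors of every position: (B, C, k*k, H*W).
            k_n = self.unfold(k_).view(B, C, self.k ** 2, H * W)
            v_n = self.unfold(v).view(B, C, self.k ** 2, H * W)
            q = q.view(B, C, 1, H * W)
            # Dot-product similarity between each position and its neighbors.
            attn = torch.softmax((q * k_n).sum(1, keepdim=True) / C ** 0.5, dim=2)
            out = (attn * v_n).sum(2).view(B, C, H, W)
            return out + x  # residual connection

    # Usage: chain the two modules on per-frame feature maps.
    x = torch.randn(2, 64, 56, 56)
    y = LocalRelationalAttention(64)(FrequencyAuxiliaryModule(64)(x))

The design intuition under these assumptions: the frequency gate is cheap compared with a full enhancement network used as preprocessing, and restricting attention to a local neighborhood keeps the relational modeling linear in the number of positions.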