Temporal aggregation for first-person action recognition using Hilbert-Huang transform

Didik Purwanto, Yie-Tarng Chen, Wen-Hsien Fang

Published: 2017, Last Modified: 27 Feb 2026, Venue: ICME 2017, License: CC BY-SA 4.0

Abstract: This paper presents a new approach to action recognition in first-person videos that aggregates both short- and long-term trends based on the coefficients of the Hilbert-Huang transform (HHT), a well-known time-frequency analysis tool. In contrast to previous works such as Pooled Time Series (PoT), the new scheme extracts the salient features of activities using the non-stationary HHT analysis, which consists of empirical mode decomposition and Hilbert spectral analysis, and can be combined with convolutional neural network (CNN) features, such as trajectory-pooled CNN features, to achieve superior detection accuracy. Simulations show that the proposed method outperforms the main state-of-the-art works on two widely used public first-person datasets.
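
As a rough illustration of the Hilbert spectral analysis half of the HHT mentioned in the abstract, the sketch below computes the instantaneous amplitude and frequency of a one-dimensional signal from its analytic signal. It is not the authors' implementation: the function names, the naive DFT, and the single-tone test signal standing in for one intrinsic mode function are all assumptions made for illustration, and the empirical mode decomposition step is omitted entirely.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence, built with a naive O(n^2) DFT.

    Hypothetical helper for illustration only.
    """
    n = len(x)
    # Forward DFT of the real input.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and zero out the negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        last_positive = n // 2
    else:
        last_positive = (n + 1) // 2
    for k in range(1, last_positive):
        weights[k] = 2.0
    # Inverse DFT of the one-sided spectrum.
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_amplitude_and_frequency(x, fs):
    """Return (amplitude per sample, frequency in Hz between samples)."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    # Phase increment between consecutive samples; multiplying by the
    # conjugate avoids explicit phase unwrapping.
    frequency = [
        cmath.phase(z[t + 1] * z[t].conjugate()) * fs / (2 * math.pi)
        for t in range(len(z) - 1)
    ]
    return amplitude, frequency


# Usage: a 5 Hz cosine sampled at 64 Hz for one second, standing in
# for a single intrinsic mode function.
fs = 64.0
signal = [math.cos(2 * math.pi * 5 * t / fs) for t in range(64)]
amp, freq = instantaneous_amplitude_and_frequency(signal, fs)
print(round(amp[10], 6), round(freq[10], 6))
```

In a full HHT pipeline this analysis would be applied to each intrinsic mode function produced by empirical mode decomposition. How the resulting coefficients are aggregated into short- and long-term features is specific to the paper and is not reproduced here.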

External IDs: dblp:conf/icmcs/PurwantoCF17