Abstract: This paper presents a new approach to action recognition in first-person videos that aggregates both short- and long-term trends using the coefficients of the Hilbert-Huang transform (HHT), a well-known time-frequency analysis tool. In contrast to previous works such as Pooled Time Series (PoT), the new scheme extracts the salient features of activities through non-stationary HHT analysis, which consists of empirical mode decomposition and Hilbert spectral analysis, and can be combined with convolutional neural network (CNN) features, such as trajectory-pooled CNN features, to achieve higher detection accuracy. Simulations show that the proposed method outperforms the main state-of-the-art works on two widely used public first-person datasets.
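To make the second stage of the HHT more concrete, the following is a minimal sketch of Hilbert spectral analysis applied to a single oscillatory component. It is not the authors' implementation: empirical mode decomposition is omitted, and a pure 8 Hz tone stands in for one intrinsic mode function. The function names (`dft`, `analytic_signal`, `instantaneous_amplitude_and_frequency`), the sampling rate, and the test signal are all assumptions chosen for illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(x):
    """Return x + j*H[x] by suppressing the negative-frequency bins."""
    n = len(x)
    spectrum = dft(x)
    weights = [0.0] * n
    weights[0] = 1.0  # keep the DC bin unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0  # keep the Nyquist bin unchanged
        for k in range(1, n // 2):
            weights[k] = 2.0  # double the positive frequencies
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    return dft([s * w for s, w in zip(spectrum, weights)], inverse=True)


def instantaneous_amplitude_and_frequency(x, fs):
    """Instantaneous amplitude per sample and frequency (Hz) per sample pair."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    freq = []
    for a, b in zip(z[:-1], z[1:]):
        # Phase advance between consecutive samples, wrapped to (-pi, pi].
        dphi = cmath.phase(b * a.conjugate())
        freq.append(dphi * fs / (2 * math.pi))
    return amplitude, freq


# Hypothetical stand-in for one intrinsic mode function: an 8 Hz cosine
# sampled at 64 Hz for one second.
fs = 64.0
n = 64
tone = [math.cos(2 * math.pi * 8.0 * t / fs) for t in range(n)]
amplitude, freq = instantaneous_amplitude_and_frequency(tone, fs)
```

For this stationary tone the instantaneous amplitude stays at 1 and the instantaneous frequency at 8 Hz. In a full HHT, the same step would be applied to each intrinsic mode function produced by empirical mode decomposition, where both quantities generally vary over time.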
External IDs: dblp:conf/icmcs/PurwantoCF17