- Abstract: We introduce channel aggregation for ConvNet architectures, a novel compact representation of CNN features that explicitly models the nonlinear encoding across channels, particularly when the new unit is embedded inside deep architectures for action recognition. Channel aggregation operates on the multi-channel features of a ConvNet and aims to find the optimal convergence path quickly. We name the proposed convolutional architecture “nonlinear channels aggregation networks (NCAN)” and its new layer the “nonlinear channels aggregation layer (NCAL)”. We theoretically motivate channel aggregation functions and empirically study their effect on convergence speed and classification accuracy. Another contribution of this work is an efficient and effective implementation of the NCAL that speeds it up by orders of magnitude. We evaluate its performance on the standard benchmarks UCF101 and HMDB51, and experimental results demonstrate that this formulation not only converges quickly but also achieves stronger generalization capability without sacrificing performance.
- Keywords: action recognition, convolutional neural network, network training
- TL;DR: An architecture that enables CNNs trained on video sequences to converge rapidly