Optimization Planning for 3D ConvNets

28 Sept 2020 (modified: 22 Oct 2023) | ICLR 2021 Conference Blind Submission | Readers: Everyone
Keywords: 3D ConvNets, Network Training, Video Recognition
Abstract: 3D Convolutional Neural Networks (3D ConvNets) have been regarded as a powerful class of models for video recognition. Nevertheless, it is not trivial to train 3D ConvNets optimally, owing to their high complexity and the many options in the training scheme. The most common hand-tuned process first trains 3D ConvNets on short video clips and then learns long-term temporal dependencies on longer clips, while gradually decaying the learning rate from high to low as training progresses. Because this process relies on several heuristic settings, we seek an optimal "path" that automates the entire training. In this paper, we decompose the path into a series of training "states" and specify the hyper-parameters, e.g., learning rate and the length of input clips, in each state. The estimation of the knee point on the performance-epoch curve triggers the transition from one state to another. We perform dynamic programming over all the candidate states to plan the optimal permutation of states, i.e., the optimization path. Furthermore, we devise a new 3D ConvNet with a unique dual-head classifier design to improve spatial and temporal discrimination. Extensive experiments conducted on seven public video recognition benchmarks demonstrate the advantages of our proposal. With the optimization planning, our 3D ConvNet achieves superior results compared to state-of-the-art video recognition approaches. More remarkably, we obtain top-1 accuracies of 82.5% and 84.3% on the large-scale Kinetics-400 and Kinetics-600 datasets, respectively.
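The two mechanisms named in the abstract, knee-point estimation and dynamic programming over training states, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The knee heuristic (largest gap above the chord joining the first and last points), the transition rule (learning rate never increases, clip length never decreases), the per-transition `gain` table, and all numeric values are assumptions made for illustration only.

```python
from functools import lru_cache


def knee_point(accuracies):
    """Return the index where the curve rises farthest above the chord
    joining its first and last points (an assumed knee heuristic)."""
    n = len(accuracies)
    if n < 3:
        return max(n - 1, 0)
    y0, y1 = accuracies[0], accuracies[-1]
    best_i, best_gap = 0, float("-inf")
    for i, y in enumerate(accuracies):
        chord = y0 + (y1 - y0) * i / (n - 1)
        if y - chord > best_gap:
            best_i, best_gap = i, y - chord
    return best_i


def allowed(prev, nxt):
    """Assumed transition rule: learning rate may only stay or decay,
    clip length may only stay or grow. States are (lr, clip_len)."""
    return prev != nxt and nxt[0] <= prev[0] and nxt[1] >= prev[1]


def plan_path(states, start, final, gain):
    """Dynamic programming over the state graph: return (total gain, path)
    for the best path from start to final, or None if final is unreachable."""

    @lru_cache(maxsize=None)
    def best(state):
        if state == start:
            return 0.0, (start,)
        candidates = []
        for prev in states:
            if allowed(prev, state):
                sub = best(prev)
                if sub is not None:
                    score = sub[0] + gain.get((prev, state), 0.0)
                    candidates.append((score, sub[1] + (state,)))
        return max(candidates) if candidates else None

    return best(final)


# Toy values, not taken from the paper.
curve = [0.10, 0.40, 0.55, 0.60, 0.62, 0.63]
states = ((0.1, 16), (0.01, 16), (0.1, 32), (0.01, 32))
gain = {
    ((0.1, 16), (0.01, 16)): 5.0,
    ((0.1, 16), (0.1, 32)): 3.0,
    ((0.01, 16), (0.01, 32)): 2.0,
    ((0.1, 32), (0.01, 32)): 6.0,
    ((0.1, 16), (0.01, 32)): 7.5,
}

knee = knee_point(curve)
total, path = plan_path(states, (0.1, 16), (0.01, 32), gain)
print(knee, total, path)
```

In this toy setting, each transition is scored independently of how its source state was reached. In actual training the result of a state would depend on the checkpoint it starts from, so in practice each edge score would come from a training run rather than a fixed table.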
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We propose an optimization planning mechanism to automate the design of training strategies for 3D ConvNets.
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/arxiv:2201.04021/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=hJRkKyYiHH