An Empirical Study of Weights in Deep Convolutional Neural Networks and Its Application to Training Convergence
Abstract: This paper presents an empirical study of weights in deep neural networks and proposes a quantitative metric, the Logarithmic Geometric Mean of absolute weight parameters (LoGM), to evaluate the impact of weights on training convergence. We develop an automatic tool to measure LoGM and conduct extensive experiments on ImageNet with three well-known deep convolutional neural networks (CNNs). From experiments on the same model, we make two empirical observations: 1) the variance of LoGM across per-iteration weight snapshots is small; and 2) each CNN model has a reasonable divergence region. Preliminary results show that our methodology is effective, reducing the time to expose convergence problems from weeks to minutes. Three known convergence issues are confirmed and one new problem is detected at an early stage of feature development. To the best of our knowledge, our work is the first attempt to understand the impact of weights on convergence. We believe that our methodology is general and applicable to all deep learning frameworks. The code and training snapshots will be made publicly available.
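The abstract does not give the exact formula for LoGM, but a plausible minimal sketch is the mean of log10(|w|) over all weight parameters, i.e., the log10 of the geometric mean of |w|. The function name `logm`, the `eps` guard, and the snapshot comparison below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def logm(weight_tensors, eps=1e-12):
    """Assumed LoGM: mean of log10(|w|) over every weight parameter,
    which equals log10 of the geometric mean of |w|."""
    logs = [np.log10(np.abs(w).ravel() + eps) for w in weight_tensors]
    return float(np.mean(np.concatenate(logs)))

# Hypothetical usage: compare LoGM between two consecutive weight snapshots;
# a large jump could indicate the model drifting toward its divergence region.
snapshot_t  = [np.random.randn(64, 3, 7, 7), np.random.randn(1000, 2048)]
snapshot_t1 = [w + 0.01 * np.random.randn(*w.shape) for w in snapshot_t]
print(logm(snapshot_t), logm(snapshot_t1))
```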
TL;DR: A quantitative approach to detecting convergence problems within a minimal number of iterations for CNN training
Keywords: Deep Neural Networks, Quantitative Metric, Convergence, Divergence Region