Big Neural Networks Waste Capacity
Yann Dauphin, Yoshua Bengio
Jan 17, 2013 (modified: Jan 17, 2013), ICLR 2013 conference submission
Abstract: This article exposes the failure of some big neural networks to leverage added capacity to reduce underfitting. Past research suggests diminishing returns when increasing the size of neural networks. Our experiments on ImageNet LSVRC-2010 show that this may be because bigger networks underfit the training objective, sometimes performing worse on the training set than smaller networks. This suggests that the optimization method, first-order gradient descent, fails in this regime. Directly attacking this problem, either through the optimization method or the choice of parametrization, may make it possible to improve the generalization error on large datasets, for which a large capacity is required.