Empirical Investigation on Model Capacity and Generalization of Neural Networks for Text


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission
  • Abstract: Recently, deep neural network models have shown promising results on many natural language processing (NLP) tasks. In practice, the number of parameters in deep neural models is often significantly larger than the size of the training set, and their generalization behavior cannot be explained by classic generalization theory. In this paper, we empirically investigate the model capacity and generalization of neural models for text through extensive experiments. The experiments show that deep neural models find patterns rather than relying on brute-force memorization. Therefore, a large-capacity model trained with early-stopping stochastic gradient descent (SGD) as an implicit regularizer appears to be the best choice, as it has better generalization ability and faster convergence.
  • Keywords: Text, Empirical Investigation, Model Capacity, Generalization Ability, Neural Networks, Deep Learning
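The abstract's recommendation of early-stopping SGD as an implicit regularizer can be illustrated with a minimal sketch. The paper does not publish code, so everything below (the synthetic data, the linear model, and the `patience` parameter) is an assumption for illustration: an overparameterized model is trained with per-example SGD, and training halts once validation loss stops improving.

```python
import numpy as np

# Minimal sketch of early-stopping SGD as an implicit regularizer.
# This is NOT the paper's code: the data, model, and hyperparameters
# are illustrative assumptions. The setup mirrors the abstract's
# regime where parameters (50) exceed training examples (30).
rng = np.random.default_rng(0)

n_train, n_val, dim = 30, 100, 50
w_true = rng.normal(size=dim)
X_train = rng.normal(size=(n_train, dim))
y_train = X_train @ w_true + 0.1 * rng.normal(size=n_train)
X_val = rng.normal(size=(n_val, dim))
y_val = X_val @ w_true + 0.1 * rng.normal(size=n_val)

def mse(w, X, y):
    """Mean squared error of linear predictor w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

w = np.zeros(dim)
lr, patience = 0.01, 10
best_val, best_w, bad_epochs = np.inf, w.copy(), 0

for epoch in range(1000):
    # One SGD pass over the training set, one example at a time.
    for i in rng.permutation(n_train):
        grad = 2 * (X_train[i] @ w - y_train[i]) * X_train[i]
        w -= lr * grad
    val_loss = mse(w, X_val, y_val)
    if val_loss < best_val:
        best_val, best_w, bad_epochs = val_loss, w.copy(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            # Early stop: halt before the model overfits the
            # (smaller-than-parameter-count) training set.
            break

print(f"best validation MSE: {best_val:.4f}")
```

The early-stopped weights `best_w` are kept from the epoch with the lowest validation loss, so the returned model generalizes better than one trained to convergence on the training set alone — the implicit-regularization effect the abstract describes.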