Abstract: As a widely-accepted evaluation criterion, complexity has attracted more and more attention in the design of language models. The parameter count is a proxy for complexity, which is often reported and compared in research papers. In general, more parameters means better model performance, but higher complexity. Therefore, reconciling the contradiction between the complexity and the model performance is necessary. In this paper, we propose a simple method to make use of model parameters more effectively, so that the LSTM-based language models can reach better results without the cost of increasing parameters. The method constructs another small-scale LSTM with a part of parameters originally belonging to the vanilla LSTM in each layer, whose output can assist the next layer in processing the output of the vanilla LSTM. We name these two LSTMs Major Minor LSTMs. In experiments, we demonstrate the language model with Major Minor LSTMs surpasses the existing state-of-the-art model on Penn Treebank and WikiText-2 with fewer parameters.
Keywords: Language model, LSTM, Deep Learning, NLP
10 Replies
Loading