Abstract: Neural language models have achieved great success in many NLP tasks, largely due to their ability to capture contextual dependencies among terms in a text. While much effort has been devoted to empirically explaining the connection between network hyperparameters and the ability to represent contextual dependencies, theoretical analysis remains relatively scarce. Inspired by recent research that uses tensor spaces to explain neural network architectures, we explore an interpretable mechanism for neural language models. Specifically, we define the concept of separation rank for the language modeling process as a theoretical measure of the degree of contextual dependency in a sentence. We then show that a lower bound on this separation rank reveals the quantitative relation between the network structure (e.g., depth and width) and the ability to model contextual dependency. In particular, increasing the depth of the network is more effective for improving this ability, which makes it important to design an adaptive network that determines a suitable depth for each task. Inspired by Adaptive Computation Time (ACT), we design an adaptive recurrent network based on the separation rank to model contextual dependency. Experiments on various NLP tasks verify the proposed theoretical analysis. We also evaluate our adaptive recurrent neural network on sentence classification, where it achieves better results than the standard bidirectional LSTM.
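The abstract does not spell out the formal definition of separation rank. As a point of reference, the standard notion from the tensor-analysis literature on network expressivity, which the paper presumably adapts to the language-modeling setting, can be sketched as follows, with f the function computed by the model over input positions 1..N and (A, B) a partition of those positions (this is an assumed form; the paper's exact definition may differ):

% Standard separation rank with respect to a partition (A, B) of the inputs:
% the smallest number R of separable (product) terms needed to express f.
\[
  \operatorname{sep}(f; A, B) \;=\;
  \min \Bigl\{ R \in \mathbb{N} \;:\;
    f(x_1, \dots, x_N) \;=\; \sum_{r=1}^{R} g_r\bigl(\mathbf{x}_A\bigr)\, h_r\bigl(\mathbf{x}_B\bigr)
  \Bigr\},
\]
where $\mathbf{x}_A$ and $\mathbf{x}_B$ collect the inputs indexed by $A$ and $B$. A separation rank of 1 means the two parts are modeled independently, and a higher separation rank indicates a stronger contextual dependency captured between the two parts of the sentence.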
Keywords: Language Model, Recurrent Neural Network, Separation Rank
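The abstract only names ACT as the inspiration for the adaptive recurrent network. The sketch below illustrates the generic ACT-style halting mechanism (Graves, 2016) that such a network could build on: the cell ponders for a variable number of internal steps per input and emits a halting-weighted state. All names (ACTCell, max_ponder, epsilon) and the choice of a GRU cell are illustrative assumptions, not the paper's architecture, which additionally ties the computation budget to the separation rank.

import torch
import torch.nn as nn

class ACTCell(nn.Module):
    """Generic ACT-style wrapper around a recurrent cell (a sketch, not the paper's model)."""

    def __init__(self, input_size, hidden_size, max_ponder=10, epsilon=0.01):
        super().__init__()
        # +1 input flag marks the first ponder step, as in Graves (2016)
        self.cell = nn.GRUCell(input_size + 1, hidden_size)
        self.halt = nn.Linear(hidden_size, 1)
        self.max_ponder = max_ponder
        self.epsilon = epsilon

    def forward(self, x, h):
        # x: (batch, input_size), h: (batch, hidden_size)
        batch = x.size(0)
        budget = torch.ones(batch, device=x.device)   # remaining halting probability per example
        weighted_h = torch.zeros_like(h)
        for step in range(self.max_ponder):
            flag = torch.full((batch, 1), 1.0 if step == 0 else 0.0, device=x.device)
            h = self.cell(torch.cat([x, flag], dim=-1), h)
            p = torch.sigmoid(self.halt(h)).squeeze(-1)          # halting probability of this ponder step
            is_last = (budget - p < self.epsilon) | (step == self.max_ponder - 1)
            weight = torch.where(is_last, budget, p)             # spend the remainder on the final step
            weighted_h = weighted_h + weight.unsqueeze(-1) * h   # halting-weighted mean of states
            budget = budget - weight
            if bool(is_last.all()):
                break
        return weighted_h

In use, one such cell would be unrolled over the tokens of a sentence in place of a plain GRU/LSTM cell, so that harder inputs receive more internal computation steps.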