Alternative structures for character-level RNNs

Piotr Bojanowski, Armand Joulin, Tomas Mikolov

Feb 18, 2016 (modified: Feb 18, 2016) ICLR 2016 workshop submission readers: everyone
  • Abstract: Recurrent neural networks are convenient and efficient models for learning patterns in sequential data. However, when applied to signals with very low cardinality, such as character-level language modeling, they suffer from several problems. In order to successfully model longer-term dependencies, the hidden layer needs to be large, which in turn implies high computational cost. Moreover, the accuracy of these models is significantly lower than that of baseline word-level models. We propose two structural modifications of the classic RNN LM architecture. The first one consists of conditioning the RNN on both character-level and word-level information. The other one uses the recent history to condition the computation of the output probability. We evaluate the performance of the two proposed modifications on multi-lingual data. The experiments show that both modifications can improve upon the basic RNN architecture, an effect that is even more visible when the input and output signals are represented by single bits. These findings suggest that more research needs to be done to develop a general RNN architecture that would perform optimally across a wide range of tasks.
  • Conflicts: inria.fr, ens.fr, fb.com
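
The abstract does not give the equations of the first modification, so the following is only a rough sketch of what "conditioning the RNN on both character-level and word-level information" could look like. It assumes a plain tanh recurrence in which the hidden state receives the current character and the index of the previous complete word as one-hot inputs. The class name WordConditionedCharRNN, the matrix names A, Q, R, U, and all sizes are hypothetical and are not taken from the paper.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class WordConditionedCharRNN:
    """Hypothetical character RNN whose hidden state also sees a word input."""

    def __init__(self, n_chars, n_words, hidden, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.A = rand_matrix(hidden, n_chars, rng)  # character input -> hidden
        self.Q = rand_matrix(hidden, n_words, rng)  # word context -> hidden (assumed)
        self.R = rand_matrix(hidden, hidden, rng)   # hidden -> hidden recurrence
        self.U = rand_matrix(n_chars, hidden, rng)  # hidden -> next-character scores

    def step(self, h_prev, char_id, word_id):
        # One-hot inputs simply select one column of A and one column of Q.
        rec = matvec(self.R, h_prev)
        h = [
            math.tanh(self.A[i][char_id] + self.Q[i][word_id] + rec[i])
            for i in range(self.hidden)
        ]
        return h, softmax(matvec(self.U, h))


# Toy run: 4 characters, 3 words, 5 hidden units, untrained weights.
model = WordConditionedCharRNN(n_chars=4, n_words=3, hidden=5)
h = [0.0] * 5
for char_id, word_id in [(0, 0), (2, 0), (1, 1)]:
    h, probs = model.step(h, char_id, word_id)
```

After the loop, probs holds a distribution over the next character given the characters read so far and the word context supplied at each step. The weights are untrained, so the values only illustrate the data flow, not the reported results.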
