Abstract: Recurrent neural networks are convenient and efficient models for learning patterns in
sequential data. However, when applied to signals with very low cardinality, as in character-level language modeling, they suffer from several problems. In order to successfully model longer-term dependencies, the hidden layer needs to be large, which in turn
implies high computational cost. Moreover, the accuracy of these models is significantly
lower than that of baseline word-level models. We propose two structural modifications
of the classic RNN LM architecture. The first one consists in conditioning the RNN on both character-level and word-level information. The other one uses the recent history
to condition the computation of the output probability. We evaluate the performance of
the two proposed modifications on multi-lingual data. The experiments show that both
modifications can improve upon the basic RNN architecture, and the improvement is even more visible when the input and output signals are represented by single bits. These findings suggest that more research is needed to develop a general RNN architecture that would perform well across a wide range of tasks.
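To make the first modification more concrete, below is a minimal illustrative sketch (not the authors' code) of what conditioning an RNN on both character-level and word-level information could look like. All names, sizes, and the combination scheme (concatenating a character embedding with the embedding of the last completed word) are assumptions for illustration only.

```python
# Minimal sketch of a character-level RNN conditioned on word-level context.
# Assumed: concatenation of embeddings, Elman-style recurrence, softmax output.
import numpy as np

rng = np.random.default_rng(0)

n_chars, n_words = 50, 1000          # assumed vocabulary sizes
d_char, d_word, d_hidden = 16, 32, 64

# Parameters: embeddings, input/recurrent weights, output projection over characters.
E_char = rng.normal(0, 0.1, (n_chars, d_char))
E_word = rng.normal(0, 0.1, (n_words, d_word))
W_in   = rng.normal(0, 0.1, (d_char + d_word, d_hidden))
W_rec  = rng.normal(0, 0.1, (d_hidden, d_hidden))
W_out  = rng.normal(0, 0.1, (d_hidden, n_chars))

def step(h, char_id, last_word_id):
    """One RNN step: the input is the current character embedding
    concatenated with the embedding of the most recently completed word."""
    x = np.concatenate([E_char[char_id], E_word[last_word_id]])
    h = np.tanh(x @ W_in + h @ W_rec)
    logits = h @ W_out
    p = np.exp(logits - logits.max())
    return h, p / p.sum()            # next hidden state, distribution over next character

# Usage: run a few steps over dummy character/word ids.
h = np.zeros(d_hidden)
for char_id, word_id in [(3, 10), (7, 10), (1, 42)]:
    h, p = step(h, char_id, word_id)
print(p.shape)                       # (50,) probabilities over characters
```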
Conflicts: inria.fr, ens.fr, fb.com