Abstract: Long-term dependencies are difficult to learn using Recurrent Neural Networks due to the vanishing and exploding gradient problems, since their hidden transform operation is applied linearly in sequence length. We introduce a new layer type (the Tree Memory Unit), whose weight application scales logarithmically in the sequence length. We evaluate this on two pathologically hard memory benchmarks and two datasets. On those three tasks which require long-term dependencies, it strongly outperforms Long Short-Term Memory baselines. However, it does show weaker performance on sequences with few long-term dependencies. We believe that our approach can lead to more efficient sequence learning if used on sequences with long-term dependencies.
0 Replies
Loading