Keywords: Recurrent Neural Network, Gate mechanism
Abstract: Linear Recurrent Neural Networks (RNNs) have attracted attention for their memory and computational efficiency.
In particular, gated linear RNNs achieve nonlinear transformations through gating mechanisms while maintaining linear time complexity by removing the dependence of the gates on hidden states.
However, the impact of these gating mechanisms, and of removing hidden states from them, remains unexplored.
Here we empirically investigate the impact of these gating mechanisms and find that gate values near zero or one depend strongly on hidden states, so that removing hidden states in gated linear RNNs induces unintended shifts in the distribution of gate values.
Based on our findings, we propose an algorithm to mitigate these distribution shifts, which empirically improves performance on long-sequence modeling tasks.
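A minimal sketch (not the paper's code) of the distinction the abstract draws: a standard gated recurrence whose gate is conditioned on the previous hidden state, versus a gated linear recurrence whose gate is computed from the input alone. All parameter names (W_g, U_g, W_h) are illustrative placeholders.

```python
import torch

def gated_step(x_t, h_prev, W_g, U_g, W_h):
    # Gate conditioned on both the current input and the previous hidden state
    # (nonlinear recurrence; cannot be parallelized over time).
    g_t = torch.sigmoid(x_t @ W_g + h_prev @ U_g)
    h_tilde = torch.tanh(x_t @ W_h)
    return g_t * h_prev + (1.0 - g_t) * h_tilde

def gated_linear_step(x_t, h_prev, W_g, W_h):
    # Gate conditioned on the input only; the recurrence stays linear in h_prev,
    # which is what permits linear-time / parallel computation over the sequence.
    g_t = torch.sigmoid(x_t @ W_g)
    h_tilde = torch.tanh(x_t @ W_h)
    return g_t * h_prev + (1.0 - g_t) * h_tilde
```

Under this reading, the paper's observation is that g_t in the first form concentrates near zero or one partly because of h_prev, so dropping that dependence (second form) shifts the gate-value distribution.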
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 16349