Abstract: This study reports a counterintuitive finding that positional encoding enhances learning of recurrent neural networks (RNNs). Positional encoding is a high-dimensional representation of the time indices of input data. Most famously, positional encoding complements the capabilities of Transformer neural networks, which lack an inherent mechanism for representing the order of data. By contrast, RNNs can encode the temporal information of data points on their own, rendering their use of positional encoding seemingly redundant. Nonetheless, investigations on synthetic benchmarks reveal an advantage of coupling positional encoding with RNNs, especially when handling a large vocabulary that yields low-frequency tokens. Further scrutiny reveals that these low-frequency tokens destabilize the gradients of vanilla RNNs, and that positional encoding resolves this instability. These results shed new light on the utility of positional encoding beyond its canonical role as a timekeeper for Transformers.
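The sketch below is a minimal, illustrative example (not the authors' implementation; see the linked code repository below for the actual code) of how positional encoding can be coupled with an RNN: a sinusoidal encoding of the time indices, as popularized by Transformers, is added to the token embeddings before they enter a vanilla RNN. The class name `PositionEncodedRNN` and the hyperparameters `vocab_size`, `dim`, and `max_len` are hypothetical placeholders.

```python
# Hypothetical sketch: sinusoidal positional encoding added to RNN inputs.
import math
import torch
import torch.nn as nn


def sinusoidal_encoding(max_len: int, dim: int) -> torch.Tensor:
    """Return a (max_len, dim) table of sinusoidal positional encodings."""
    position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
    div_term = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
    pe = torch.zeros(max_len, dim)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe


class PositionEncodedRNN(nn.Module):
    """Vanilla RNN whose inputs are token embeddings plus positional encodings."""

    def __init__(self, vocab_size: int, dim: int, max_len: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.register_buffer("pe", sinusoidal_encoding(max_len, dim))
        self.rnn = nn.RNN(dim, dim, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer indices
        x = self.embed(tokens) + self.pe[: tokens.size(1)]  # add time-index code
        output, _ = self.rnn(x)  # (batch, seq_len, dim)
        return output


if __name__ == "__main__":
    model = PositionEncodedRNN(vocab_size=1000, dim=64)
    dummy = torch.randint(0, 1000, (2, 16))  # batch of 2 sequences, length 16
    print(model(dummy).shape)  # torch.Size([2, 16, 64])
```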
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Let me first express my gratitude to the four reviewers, whose insightful feedback has significantly enhanced the quality of this work.
I also gratefully acknowledge the generous support provided by the action editor.
**Revisions**
- Deanonymized.
- Removed the `latexdiff` outputs (which were responsible for the text overflow).
- Fixed the typo pointed out by Reviewer 93n9.
- Included the link to the GitHub repo for the code (and removed the Supplementary Material that had provided the same contents).
- Included the Acknowledgments section.
Code: https://github.com/tkc-morita/position-encoded_rnn
Assigned Action Editor: ~Alessandro_Sperduti1
Submission Number: 3188