On the Confounding Effects of Length Generalization With Randomized Positional Encodings

Anonymous

16 Oct 2023 · ACL ARR 2023 October Blind Submission · Readers: Everyone
Abstract: Transformers generalize exceptionally well on tasks with a fixed context length. However, this capability rapidly diminishes when test sequences are far longer than any sequence seen during training. Unfortunately, simply training on longer sequences is computationally infeasible due to the quadratic cost of attention. Randomized positional encodings were shown to alleviate this issue on algorithmic reasoning tasks, where position is of high importance, but it is unclear if their benefits also transfer to "real-world" tasks such as image classification or natural language processing, which may have different inductive biases. Therefore, in this work, we analyze these randomized encodings on such tasks. Moreover, we show that fine-tuning pretrained models with randomized positional encodings improves length generalization. Finally, we demonstrate that evaluating length generalization on natural language can be misleading due to its short-range dependencies, whereas algorithmic reasoning and vision reveal the limits of prior work and the effectiveness of randomized positional encodings.
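The abstract names randomized positional encodings but does not spell out the mechanism. As a rough, non-authoritative sketch of the published recipe (with hypothetical parameter names such as max_pos): each training sequence is assigned an ordered random subset of positions drawn from a range much larger than any training length, and the usual sinusoidal encodings are evaluated at those sampled positions, so the model already observes large position values during training.

import numpy as np

def sample_randomized_positions(seq_len: int, max_pos: int, rng: np.random.Generator) -> np.ndarray:
    # Sample seq_len distinct positions from {0, ..., max_pos - 1}, where max_pos
    # is assumed to be much larger than the training length, and sort them so the
    # relative order of tokens is preserved. Resampled at every training step.
    assert seq_len <= max_pos
    positions = rng.choice(max_pos, size=seq_len, replace=False)
    return np.sort(positions)

def sinusoidal_encoding(positions: np.ndarray, d_model: int) -> np.ndarray:
    # Standard sinusoidal positional encodings, evaluated at arbitrary (possibly
    # non-contiguous) position indices instead of 0..seq_len-1.
    inv_freq = 1.0 / (10000 ** (np.arange(0, d_model, 2) / d_model))
    angles = positions[:, None] * inv_freq[None, :]
    enc = np.zeros((len(positions), d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

rng = np.random.default_rng(0)
pos = sample_randomized_positions(seq_len=40, max_pos=2048, rng=rng)
pe = sinusoidal_encoding(pos, d_model=64)  # (40, 64), added to the token embeddings

At test time, longer sequences still draw their positions from the same [0, max_pos) range, so no positional encoding is ever out of the distribution seen during training; this is the property the abstract credits with improved length generalization.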
Paper Type: short
Research Area: Machine Learning for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: natural language processing, image recognition, neural algorithmic reasoning
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.