Improving Length Generalization via Position Index Warping


16 Dec 2023, ACL ARR 2023 December Blind Submission, Readers: Everyone
Abstract: Length generalization concerns the mismatch between training and testing conditions that arises when models are trained on short sequences but evaluated on longer ones. Among the many factors that may affect length generalization in Transformer-based models, positional encoding has been identified as a critical one, but in-depth analysis of its impact remains limited. In this work, we advance this understanding by analyzing the positional biases introduced by different positional encoding approaches. Our analysis suggests a novel approach to improving length generalization: warping positional indices during training, which can be viewed as a data augmentation technique. Empirical studies on various tasks (e.g., algorithmic reasoning and language modeling) demonstrate the effectiveness of the proposed method.
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
