PSC: Extending Context Window of Large Language Models via Phase Shift Calibration

ACL ARR 2024 June Submission1197 Authors

14 Jun 2024 (modified: 07 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Rotary Position Embedding (RoPE) is an efficient position encoding approach that is widely used in numerous large language models (LLMs). Recently, many methods have been proposed to further extend the context window on top of RoPE. The core idea of these methods is to predefine or search for a set of factors that rescale the base frequencies of RoPE. However, predefining an optimal set of factors is challenging for existing methods because of the exponential search space. In view of this, we introduce PSC (Phase Shift Calibration), a small module that calibrates the frequencies predefined by existing methods. We demonstrate that PSC further improves many existing methods, such as PI, YaRN, and LongRoPE. Extensive experiments across a variety of models and tasks verify the effectiveness of our approach.
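The abstract refers to rescaling RoPE's base frequencies with a set of factors. As a rough, hedged illustration of that general idea (not the paper's PSC module), the sketch below shows standard RoPE inverse frequencies and a per-dimension rescaling, with a PI-style uniform factor as a usage example; the function names and values are illustrative assumptions.

```python
import torch

def rope_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies: theta_i = base^(-2i / head_dim)."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

def rescaled_frequencies(head_dim: int, scale: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rescale each base frequency by a per-dimension factor.

    Context-extension methods differ mainly in how `scale` is chosen:
    a single uniform factor (PI-style) or searched per-dimension factors
    (LongRoPE-style). `scale` has one entry per frequency (head_dim // 2).
    """
    return rope_frequencies(head_dim, base) / scale

# Example: a PI-style uniform factor to stretch a 4k training window to 16k,
# i.e. scale = L_target / L_train applied to every frequency.
head_dim = 128
uniform_scale = torch.full((head_dim // 2,), 16384 / 4096)
inv_freq = rescaled_frequencies(head_dim, uniform_scale)
```

Under this framing, PSC would act as a small learned correction on top of whatever `scale` an existing method predefines, rather than replacing the rescaling itself.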
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: fine-tuning, parameter-efficient-training
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 1197