Abstract: Large language models degrade when operating on contexts longer than their training context length, due to the standard positional encoding of tokens in the attention layers. Tokens that are far apart rarely have an effect on each other, and long prompts yield unexpected results. To address this problem, we propose SELF (Self-Extend the Context Length With Logistic Growth Function): a method that groups consecutive tokens into groups of varying size determined by a logistic capacity equation, combined with a constant group size at smaller relative distances. Our model improved performance over the base models by an average of 3.2\% on LEval and 9.1\% on the LongBench benchmark. On summarization-related tasks in LongBench, our model performed 3.46\% better than the base model; on reading comprehension tasks from LEval, it performed 11.76\% better. Our code is available at anonymous.4open.science/r/SELF-LLM-7705.
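For illustration only, the sketch below shows one way a logistic group-size schedule of the kind described in the abstract could map relative token distances to compressed position indices: exact positions within a local neighborhood, and logistically growing group sizes beyond it. This is not the authors' implementation; all names and parameter values (`neighbor_window`, `g_min`, `g_max`, `k`, `midpoint`) are assumptions.

```python
import numpy as np

def self_position(rel_dist, neighbor_window=512, g_min=2, g_max=64, k=0.01, midpoint=2048):
    """Map relative token distances to compressed position indices (illustrative sketch).

    Hypothetical parameters: within `neighbor_window`, positions are kept exact
    (group size 1); beyond it, consecutive tokens are grouped with a group size
    that grows from `g_min` toward `g_max` following a logistic curve, so distant
    tokens share coarser position ids.
    """
    rel_dist = np.asarray(rel_dist, dtype=np.float64)
    # Logistic capacity equation for the group size at each relative distance.
    group_size = g_min + (g_max - g_min) / (1.0 + np.exp(-k * (rel_dist - midpoint)))
    # Compress distances beyond the neighbor window by their group size.
    grouped = neighbor_window + (rel_dist - neighbor_window) / group_size
    return np.where(rel_dist < neighbor_window, rel_dist, np.floor(grouped)).astype(int)

# Example: nearby tokens keep exact offsets; distant tokens are compressed.
print(self_position([10, 600, 5000, 20000]))
```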
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Context window length extension, Positional embedding, Long context
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 3999