Keywords: Large Language Models, Perplexity, Long Text Processing
Abstract: Recent studies have shown that Large Language Models (LLMs) have the potential to process extremely long text, with evidence that LLMs can perform well on the language modeling task with inputs of even 1 million tokens. As the input context length increases, the model's perplexity (PPL) is observed to remain low or even decrease. However, in our study, we find that PPL may only reflect the model's ability to model local information rather than its ability to capture long-range dependencies, and thus using PPL alone to demonstrate that a model can process very long contexts is not appropriate. The local focus of perplexity also explains some existing phenomena, such as the strong extrapolation ability of the ALiBi position method. When evaluating a model's ability on long text, we should pay more attention to the limitations of PPL and avoid over-reliance on it.
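For reference, the standard token-level definition of perplexity makes the local character of the metric visible: each term scores the prediction of one next token given its preceding context, so a model that conditions mostly on nearby tokens can still achieve a low average. A sketch of the usual formula (the notation is ours, not taken from the submission):

\mathrm{PPL}(x_{1:N}) = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\left(x_i \mid x_{<i}\right) \right)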
Submission Number: 185