Abstract: Large Language Models (LLMs) exhibit limited extrapolation ability: performance degrades noticeably when the input text exceeds the context window seen during training. This degradation stems from two principal factors. First, the shift in positional encodings induced by longer inputs perturbs the attention computation and introduces substantial deviations. Second, inherent limitations of the attention mechanism cause attention to disperse as the input length grows.
In this paper, we investigate the phenomenon of attention dispersion and propose a simple yet effective approach, Dynamic Drop Attention (DDA). During attention computation, DDA filters out noisy entries while retaining important information, thereby mitigating attention dispersion and substantially improving the text generation ability of LLMs without any fine-tuning. To evaluate DDA, we implement it on the open-source Llama2 model and conduct experiments on the LongQA and QMSum datasets. Compared with vanilla Llama2, the DDA-based model achieves lower perplexity in language modeling, and manual evaluation confirms improvements in the conciseness, relevance, and accuracy of the generated text.
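The abstract only sketches the idea of dropping noisy attention entries; the paper itself does not specify the mechanism here. Below is a minimal, hypothetical sketch of one way such filtering could look: standard scaled dot-product attention where, per query, only the largest attention weights are retained and renormalized. The function name `filtered_attention` and the `keep_ratio` parameter are illustrative assumptions, not the authors' actual DDA algorithm.

```python
# Hypothetical sketch of attention-weight filtering (NOT the authors' exact DDA).
import torch
import torch.nn.functional as F


def filtered_attention(q, k, v, keep_ratio=0.25):
    """Scaled dot-product attention that drops low-weight entries.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    keep_ratio: assumed parameter; fraction of key positions kept per query.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5      # (B, H, Lq, Lk)
    weights = F.softmax(scores, dim=-1)

    # Keep only the top-k attention weights per query; zero out the rest ("noise").
    k_keep = max(1, int(keep_ratio * weights.size(-1)))
    topk = torch.topk(weights, k_keep, dim=-1)
    mask = torch.zeros_like(weights).scatter_(-1, topk.indices, 1.0)
    weights = weights * mask

    # Renormalize so the retained weights still sum to 1 over the key dimension.
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    return weights @ v
```

Because the softmax is renormalized over the surviving entries, each query concentrates its probability mass on a small set of keys even for very long inputs, which is the intuition behind mitigating attention dispersion; the actual DDA criterion for what to drop may differ.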
Paper Type: long
Research Area: Generation
Languages Studied: English