Enhancing Capabilities of Llama in Long Context with Dynamic Drop Attention

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
Abstract: Large Language Models (LLMs) have limited extrapolation ability, and their performance degrades noticeably when the input text exceeds the training window. Two main factors drive this degradation. First, positional encodings shift with text length, perturbing attention computation and introducing substantial deviations. Second, inherent limitations of the attention mechanism cause attention to disperse as the input grows longer. In this paper, we investigate attention dispersion and propose a simple yet effective approach, Dynamic Drop Attention (DDA), which filters noise and retains important information during attention computation. DDA substantially improves the text generation capability of LLMs without fine-tuning. To evaluate its effectiveness, we implement DDA on the open-source Llama2 model and conduct experiments on the LongQA and QMSum datasets. Compared to vanilla Llama2, the DDA-based model achieves lower language-modeling perplexity, and manual evaluation confirms gains in the conciseness, relevance, and accuracy of the generated text.
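The abstract describes DDA only at a high level, as a filter that drops noisy attention weights while retaining important ones. The sketch below is a minimal, hypothetical illustration of that idea, assuming a per-query dynamic threshold (a fraction of each row's largest weight) followed by renormalization; the function name `dynamic_drop_attention` and the `drop_ratio` hyperparameter are assumptions for illustration, not the paper's actual mechanism.

```python
# Hypothetical sketch of a "dynamic drop" step inside scaled dot-product
# attention. The abstract does not specify the drop criterion; here we
# assume, for illustration only, that weights below a fraction of each
# query row's maximum are treated as noise and dropped, and the retained
# weights are renormalized before being applied to the values.
import torch
import torch.nn.functional as F


def dynamic_drop_attention(q, k, v, drop_ratio=0.05):
    """q, k, v: (batch, heads, seq_len, head_dim).

    drop_ratio is a hypothetical hyperparameter controlling how
    aggressively small attention weights are discarded.
    """
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5
    weights = F.softmax(scores, dim=-1)  # (batch, heads, seq_len, seq_len)

    # Per-query dynamic threshold: a fraction of that row's largest weight.
    threshold = drop_ratio * weights.max(dim=-1, keepdim=True).values

    # Zero out dispersed weights below the threshold, then renormalize
    # so the retained weights still sum to one for each query.
    kept = torch.where(weights >= threshold, weights, torch.zeros_like(weights))
    kept = kept / kept.sum(dim=-1, keepdim=True).clamp_min(1e-9)

    return torch.matmul(kept, v)
```

Because the operation only reweights the softmax output, such a filter could in principle be dropped into a pretrained model's attention layers without fine-tuning, which is consistent with the training-free setting the abstract describes.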
Paper Type: long
Research Area: Generation
Languages Studied: English