Length Generalization of Causal Transformers without Position Encoding

Anonymous

16 Feb 2024, ACL ARR 2024 February Blind Submission
Readers: Everyone
Abstract: Generalizing to longer sentences is important for recent Transformer-based language models. Besides algorithms that manipulate explicit position features, the success of Transformers without position encodings (NoPE) provides a new way to overcome this challenge. In this paper, we study the length generalization property of NoPE. We find that NoPE can extend to longer sequences than the commonly used explicit position encodings. Moreover, we propose a parameter-efficient tuning method that searches for each attention head's best temperature hyper-parameter, which further expands NoPE's context size. Experiments on long-sequence language modeling and the synthetic passkey retrieval task show that NoPE achieves performance competitive with state-of-the-art length generalization algorithms.
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP
Contribution Types: Model analysis & interpretability
Languages Studied: English
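
The abstract describes causal attention without any position encoding (NoPE) together with a per-head temperature hyper-parameter that is tuned to extend the usable context. The sketch below is a minimal illustration of that idea, not the authors' implementation: it assumes a standard PyTorch multi-head causal self-attention layer, adds no positional encoding, and scales each head's attention logits by a learnable temperature (the `log_temperature` parameter and all module names and shapes are assumptions made for this example).

```python
# Illustrative sketch only: causal multi-head self-attention with NO position
# encoding (NoPE) and a learnable per-head temperature on the attention logits.
# Names, shapes, and the parameterization are assumptions, not the paper's code.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoPECausalSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)
        # One temperature per head, stored in log-space so it stays positive.
        # In a parameter-efficient setting, only this parameter would be tuned.
        self.log_temperature = nn.Parameter(torch.zeros(n_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); note no positional encoding is added.
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq_len, d_head).
        q = q.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        # Scaled dot-product logits, then divide by a per-head temperature.
        logits = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        temperature = self.log_temperature.exp().view(1, self.n_heads, 1, 1)
        logits = logits / temperature

        # Causal mask: position i may only attend to positions <= i.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        logits = logits.masked_fill(mask, float("-inf"))

        attn = F.softmax(logits, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(out)


# Usage: freeze everything except the per-head temperatures
# (a parameter-efficient tuning setup, as an assumption for illustration).
layer = NoPECausalSelfAttention(d_model=64, n_heads=4)
for name, p in layer.named_parameters():
    p.requires_grad = (name == "log_temperature")
y = layer(torch.randn(2, 16, 64))  # (batch=2, seq_len=16, d_model=64)
```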
