Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

29 Sept 2021, 00:30 (edited 12 Mar 2022) | ICLR 2022 Poster | Readers: Everyone
Abstract:
One-sentence Summary:
Supplementary Material: zip
14 Replies
