2020 (modified: 17 Nov 2022) | ICML 2020 | Readers: Everyone
Abstract: Transformers achieve remarkable performance in several tasks, but due to their quadratic complexity with respect to the input's length, they are prohibitively slow for very long sequences. To addre...
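To make the quadratic cost concrete, here is a minimal pure-Python sketch. It assumes standard scaled dot-product softmax attention as used in the original Transformer; it is not the method this paper proposes. Every one of the N queries is scored against every one of the N keys, so the number of scores grows as N². The function name `softmax_attention` and the toy inputs are illustrative, not taken from the paper.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def softmax_attention(Q: Matrix, K: Matrix, V: Matrix) -> Tuple[Matrix, int]:
    """Standard scaled dot-product softmax attention (illustrative sketch).

    Returns the output rows and the number of query-key scores computed,
    which equals N * N when Q and K both have N rows.
    """
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    num_scores = 0
    for q in Q:
        # One score per key: this loop over all N keys, repeated for all
        # N queries, is the source of the quadratic cost in sequence length.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        num_scores += len(scores)
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Output row = softmax-weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out, num_scores


def random_matrix(n: int, d: int, seed: int) -> Matrix:
    """Hypothetical helper: an n x d matrix of uniform random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]


if __name__ == "__main__":
    # Doubling the sequence length quadruples the number of scores.
    for n in (64, 128, 256):
        X = random_matrix(n, 8, seed=0)
        _, count = softmax_attention(X, X, X)
        print(n, count)  # 64 4096, then 128 16384, then 256 65536
```

Because the score matrix is never stored here, memory stays linear, but the work per layer is still proportional to N² · d; practical implementations that materialize the full N × N matrix pay the quadratic cost in memory as well.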