Improving Transformers with Probabilistic Attention Keys

Published: 01 Jan 2022, Last Modified: 17 May 2023. ICML 2022.
Abstract: Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkable performance across a variety of natural language processing (NLP) and computer vision tasks. I...