Improving Transformers with Probabilistic Attention Keys

Published: 01 Jan 2022, Last Modified: 17 May 2023. ICML 2022.
Abstract: Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkable performance across a variety of natural language processing (NLP) and computer vision tasks. I...