Transformers need glasses! Information over-squashing in language tasks

Federico Barbero, Andrea Banino, Steven Kapturowski, Dharshan Kumaran, João Guilherme Madeira Araújo, Oleksandr Vitvitskyi, Razvan Pascanu, Petar Velickovic

Published: 2024, Last Modified: 16 Apr 2026, NeurIPS 2024, CC BY-SA 4.0