Setting the Record Straight on Transformer Oversmoothing

Published: 05 Mar 2024, Last Modified: 08 May 2024
Venue: ICLR 2024 R2-FM Workshop Poster
License: CC BY 4.0
Keywords: transformers, filtering
TL;DR: We prove that Transformers are not always low-pass filters, and show that there are situations where they would actually benefit from low-pass filtering.
Abstract: Recent work has argued that Transformers are inherently low-pass filters that gradually oversmooth their inputs, limiting generalization, especially as model depth increases. How, then, do Transformers achieve their empirical successes despite this apparent shortcoming? In this work we show that Transformers are in fact not inherently low-pass filters. Instead, whether a Transformer oversmooths depends on the eigenspectrum of its update equations. Further, depending on the task, smoothing does not harm generalization as model depth increases.
Submission Number: 72
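
To make the abstract's central claim concrete, here is a minimal numpy sketch (not the paper's construction; the attention matrix A, the value matrix W_V, and the high_freq_ratio measure are illustrative choices). The same attention matrix acts as a low-pass filter when applied alone, but as a high-pass filter inside a residual update with a sign-flipped value matrix, because the eigenspectrum of the update changes:

```python
import numpy as np

# Toy two-token attention matrix with a known eigenspectrum:
# eigenvector (1, 1) has eigenvalue 1 (the "smooth"/DC direction) and
# eigenvector (1, -1) has eigenvalue 1 - 2a (the "high-frequency" direction).
a = 0.3
A = np.array([[1 - a, a],
              [a, 1 - a]])

def high_freq_ratio(X):
    """Share of feature energy away from the token mean (near 0 => oversmoothed)."""
    diff = X - X.mean(axis=0, keepdims=True)
    return np.linalg.norm(diff) / np.linalg.norm(X)

rng = np.random.default_rng(0)
X0 = rng.normal(size=(2, 4))  # 2 tokens, 4 features

# Low-pass case: attention alone. The update's eigenvalues are {1, 0.4},
# so the smooth direction dominates and the tokens collapse to their mean.
X = X0.copy()
for _ in range(20):
    X = A @ X
print(f"attention only: {high_freq_ratio(X0):.3f} -> {high_freq_ratio(X):.2e}")

# High-pass case: residual update X + (A @ X) @ W_V with W_V = -I, i.e.
# (I - A) @ X, whose eigenvalues are {0, 0.6}. The high-frequency direction
# now dominates, so depth sharpens token differences instead of smoothing.
W_V = -np.eye(4)
X = X0.copy()
for _ in range(20):
    X = X + (A @ X) @ W_V
print(f"residual + W_V: {high_freq_ratio(X0):.3f} -> {high_freq_ratio(X):.3f}")
```

The two-token matrix is chosen so both updates have closed-form spectra. Changing W_V (or the residual weighting) moves the update's eigenvalues, and with them the filtering behavior, which is the sense in which oversmoothing is a property of the eigenspectrum of the update equations rather than of attention per se.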