Keywords: mechanistic interpretability, transformer, large language model, efficient inference
Abstract: Understanding the inner workings of Transformers is crucial for achieving more
accurate and efficient predictions. In this work, we analyze the computation performed
by Transformers in the layers after the top-1 prediction has become fixed, which has been
previously referred to as the “saturation event”. We extend the concept of saturation events
to top-k tokens, demonstrating that similar saturation events occur across language, vision,
and speech models. We find that these saturation events happen in order of the
corresponding tokens’ ranking, i.e., the model first decides on the top-ranked token, then
on the second-ranked token, and so on. This phenomenon seems intrinsic to the
Transformer architecture, occurring across different architectural variants (decoder-only,
encoder-only, and to a lesser extent full-Transformer), and even in untrained Transformers.
We propose an underlying mechanism of task transition for this sequential saturation, where
task k corresponds to predicting the k-th most probable token, and the saturation events are
in fact discrete transitions between these tasks. In support of this, we show that the current
task can be predicted from the hidden-layer embeddings. Furthermore, using an intervention
method, we demonstrate that we can cause the model to switch from one task to the next.
Finally, leveraging our findings, we introduce a novel token-level early-exit strategy, which
surpasses existing methods in balancing performance and efficiency.
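To make the notion of rank-wise saturation concrete, the following is a minimal sketch of how such saturation layers could be located with a logit-lens-style readout: for each rank k, it finds the earliest layer from which the k-th ranked next-token prediction no longer changes. The choice of GPT-2 and the use of the final LayerNorm plus unembedding (lm_head) as the readout are illustrative assumptions, not necessarily the paper's exact procedure.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model.eval()


def saturation_layers(prompt: str, top_k: int = 3) -> dict:
    """Return, for each rank k, the earliest layer from which the k-th ranked
    next-token prediction (under a logit-lens readout) no longer changes."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    ranked_per_layer = []
    for h in out.hidden_states[1:]:  # skip the input-embedding "layer 0"
        logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
        ranked_per_layer.append(torch.argsort(logits, descending=True)[:top_k])
    final_ranking = ranked_per_layer[-1]
    saturation = {}
    for k in range(top_k):
        layer = len(ranked_per_layer) - 1
        # Walk backwards while earlier layers already agree with the final rank-k token.
        while layer > 0 and ranked_per_layer[layer - 1][k] == final_ranking[k]:
            layer -= 1
        saturation[k + 1] = layer + 1  # 1-indexed layer of the rank-k saturation event
    return saturation


print(saturation_layers("The capital of France is", top_k=3))
```
If the abstract's claim holds, the returned saturation layers should typically be ordered, with rank 1 saturating no later than rank 2, and so on.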
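The token-level early-exit strategy itself is not specified in the abstract; below is a hedged sketch of one saturation-inspired exit rule, reusing the model, tokenizer, and logit-lens readout from the sketch above. The `patience` stability criterion is a hypothetical stand-in for the paper's actual method, and for simplicity the code runs the full forward pass rather than truly halting computation at the exit layer.
```python
def early_exit_next_token(prompt: str, patience: int = 2):
    """Emit the next token from the earliest layer whose logit-lens top-1
    prediction has been stable for `patience` consecutive layers."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    stable, prev_top1, layer = 0, None, 0
    for layer, h in enumerate(out.hidden_states[1:], start=1):
        logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
        top1 = int(torch.argmax(logits))
        stable = stable + 1 if top1 == prev_top1 else 1
        prev_top1 = top1
        if stable >= patience:
            break  # the top-1 prediction has saturated; exit at this layer
    return tokenizer.decode([prev_top1]), layer


token, exit_layer = early_exit_next_token("The capital of France is")
print(f"predicted {token!r} at layer {exit_layer}")
```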
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8119