Abstract: Uncovering the inner mechanisms of Transformer models offers insights into how they process and represent information. In this work, we analyze the computation performed by Transformers in the layers after the top-1 prediction has become fixed, a point known as the “saturation event”. We extend this concept to top-k tokens, demonstrating that similar saturation events occur across language, vision, and speech models. We find that these events occur in order of the corresponding tokens’ ranking: the model first decides on the top-ranking token, then the second-highest-ranking token, and so on. This phenomenon appears intrinsic to the Transformer architecture, occurring across different variants and even in untrained Transformers. We propose that these events reflect task transitions, where determining each token corresponds to a discrete task. We show that the current task can be predicted from hidden-layer embeddings, and demonstrate that we can cause the model to switch to the next task via intervention. Leveraging our findings, we introduce a token-level early-exit strategy that surpasses existing methods in balancing performance and efficiency, and we show how saturation events can be exploited for better language modeling.
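For intuition, here is a minimal, illustrative sketch (not the authors' code) of how a top-1 saturation event could be located for a single prediction. It assumes a GPT-2 model from the Hugging Face `transformers` library and uses a logit-lens readout of each layer's hidden state; the function name and prompt are hypothetical choices for the example.

```python
# Illustrative sketch: estimate the top-1 "saturation layer" of a single
# next-token prediction by reading out every intermediate hidden state
# through the model's final LayerNorm and unembedding (logit lens).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def top1_saturation_layer(prompt: str) -> int:
    """Return the earliest layer from which the top-1 next-token
    prediction no longer changes (the top-1 saturation event)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states is a tuple: (embeddings, layer_1, ..., layer_L).
    last_pos = [h[0, -1] for h in out.hidden_states]
    # Logit-lens readout of each layer's hidden state at the last position.
    top1 = [model.lm_head(model.transformer.ln_f(h)).argmax().item()
            for h in last_pos]
    final_pred = top1[-1]
    # Scan backwards: the saturation layer is the first layer from which
    # every later readout already agrees with the final prediction.
    sat = len(top1) - 1
    for layer in range(len(top1) - 1, -1, -1):
        if top1[layer] == final_pred:
            sat = layer
        else:
            break
    return sat

print(top1_saturation_layer("The capital of France is"))
```

The same readout applied to the second-, third-, and lower-ranked tokens is what the paper's top-k extension of saturation events refers to.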
Lay Summary: Transformer models are the backbone of powerful AI systems that understand language, images, and speech. But how are their outputs generated internally? In our research, we zoomed in on what happens inside a Transformer after it has settled on its most likely prediction. We discovered a surprising pattern: even when a model seems “done,” it actually continues processing, locking in its second-best guess, then third-best, and so on, *in order* of how likely each option is.
This pattern appears consistently across different types of models — including those for vision and speech — and even in Transformers that haven’t been trained yet. We believe these moments signal task shifts, where the model transitions from working on one guess to focusing on the next. Building on this idea, we designed a new technique that lets models stop early when they've confidently made a decision — saving time and computation without sacrificing accuracy.
Primary Area: Deep Learning->Large Language Models
Keywords: mechanistic interpretability, transformer, large language model, efficient inference
Flagged For Ethics Review: true
Submission Number: 12238