Counting in small transformers: The delicate interplay between attention and feed-forward layers

24 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: attention, mechanistic interpretability, architecture, toy model, counting, activation function
TL;DR: We analyse how a variety of small transformers solve or fail at a counting task, showcasing the different roles architectural components play.
Abstract: How do different architectural design choices influence the space of solutions that a transformer can implement and learn? How do different components interact with each other to shape the model's hypothesis space? We investigate these questions by characterizing the solutions simple transformer blocks can implement when challenged to solve the histogram task -- counting the occurrences of each item in an input sequence from a fixed vocabulary. Despite its apparent simplicity, this task exhibits a rich phenomenology: our analysis reveals a strong interdependence between the model's predictive performance and the vocabulary and embedding sizes, the token-mixing mechanism, and the capacity of the feed-forward block. We theoretically characterize two counting strategies that small transformers can implement: relation-based and inventory-based counting, the latter being less efficient in computation and memory. The emergence of either strategy is heavily influenced by subtle synergies among hyperparameters and components, and depends on seemingly minor architectural tweaks like the inclusion of softmax in the attention mechanism. By introspecting models \textit{trained} on the histogram task, we verify that both mechanisms form in practice. Our findings highlight that even in simple settings, slight variations in model design can cause significant changes to the solutions a transformer learns.
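For concreteness, here is a minimal sketch of how the histogram task described above could be posed as a supervised sequence-labelling problem: each position is labelled with the number of occurrences of its token in the sequence. The sequence length, vocabulary size, and helper names below are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal sketch of the histogram (counting) task; hyperparameters are assumptions.
import torch

def make_histogram_batch(batch_size=32, seq_len=10, vocab_size=15, seed=0):
    """Sample token sequences and their per-position occurrence counts.

    For each position t, the target is the number of times token x[t]
    appears in the whole sequence x (including itself).
    """
    g = torch.Generator().manual_seed(seed)
    x = torch.randint(0, vocab_size, (batch_size, seq_len), generator=g)
    # counts[b, t] = #{ s : x[b, s] == x[b, t] }
    counts = (x.unsqueeze(-1) == x.unsqueeze(-2)).sum(dim=-1)
    return x, counts

tokens, targets = make_histogram_batch()
print(tokens[0])   # e.g. tensor([ 3,  7,  3, ...])
print(targets[0])  # e.g. tensor([ 2,  1,  2, ...])
```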
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3458