Keywords: Training dynamics, Transformer, Mechanistic Interpretability, Visualization
TL;DR: We study training dynamics in neural networks using a visualization sandbox that lets us inspect every layer of the model.
Abstract: This paper introduces a visual sandbox designed to explore the training dynamics of a small-scale transformer model, with the embedding dimension constrained to $d=2$.
This restriction allows for a comprehensive two-dimensional visualization of each layer's dynamics.
Through this approach, we gain insights into training dynamics, circuit transferability, and the causes of loss spikes, including those induced by the high curvature of normalization layers.
We propose strategies to mitigate these spikes, demonstrating how effective visualization can facilitate the design of ideas of practical interest.
Additionally, we believe our sandbox could help theoreticians identify essential mechanisms of training dynamics and integrate them into future theories.
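To illustrate the core idea of the abstract, here is a minimal sketch of a one-head attention layer with the residual stream constrained to $d=2$, so that every intermediate activation is a point in the plane and can be scattered directly. All weights and dimensions below are hypothetical toy values, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq_len, vocab = 2, 8, 16  # embedding dimension fixed to 2 for plotting

# Hypothetical toy weights; the paper's model details are not given here.
W_E = rng.normal(size=(vocab, d))
W_Q, W_K, W_V, W_O = (rng.normal(size=(d, d)) for _ in range(4))

def attention_layer(x):
    """One causal attention head; with d=2, inputs and outputs are 2-D points."""
    q, k, v = x @ W_Q, x @ W_K, x @ W_V
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: each position attends only to itself and earlier positions.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + (weights @ v) @ W_O  # residual stream stays 2-dimensional

tokens = rng.integers(0, vocab, size=seq_len)
h = W_E[tokens]              # (seq_len, 2): each token is a point in the plane
h_after = attention_layer(h) # still (seq_len, 2): directly plottable per layer
```

Because the hidden state never leaves $\mathbb{R}^2$, snapshots of `h` and `h_after` across training steps can be plotted as scatter trajectories, which is the kind of per-layer visualization the sandbox provides.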
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2631