# Gallery

Research done involving TransformerLens:

- [Progress Measures for Grokking via Mechanistic
  Interpretability](https://arxiv.org/abs/2301.05217) (ICLR Spotlight, 2023) by Neel Nanda, Lawrence
  Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
- [Finding Neurons in a Haystack: Case Studies with Sparse
  Probing](https://arxiv.org/abs/2305.01610) by Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine
  Harvey, Dmitrii Troitskii, Dimitris Bertsimas
- [Towards Automated Circuit Discovery for Mechanistic
  Interpretability](https://arxiv.org/abs/2304.14997) by Arthur Conmy, Augustine N. Mavor-Parker,
  Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
- [Actually, Othello-GPT Has A Linear Emergent World Representation](https://neelnanda.io/othello)
  by Neel Nanda
- [A circuit for Python docstrings in a 4-layer attention-only
  transformer](https://www.alignmentforum.org/posts/u6KXXmKFbXfWzoAXn/a-circuit-for-python-docstrings-in-a-4-layer-attention-only)
  by Stefan Heimersheim and Jett Janiak
- [A Toy Model of Universality](https://arxiv.org/abs/2302.03025) (ICML, 2023) by Bilal Chughtai,
  Lawrence Chan, Neel Nanda
- [N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language
  Models](https://openreview.net/forum?id=ZB6bK6MTYq) (2023, ICLR Workshop RTML) by Alex Foote, Neel
  Nanda, Esben Kran, Ioannis Konstas, Fazl Barez
- [Eliciting Latent Predictions from Transformers with the Tuned
  Lens](https://arxiv.org/abs/2303.08112) by Nora Belrose, Zach Furman, Logan Smith, Danny Halawi,
  Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt

User contributed examples of the library being used in action:

- [Induction Heads Phase Change
  Replication](https://colab.research.google.com/github/ckkissane/induction-heads-transformer-lens/blob/main/Induction_Heads_Phase_Change.ipynb):
  A partial replication of [In-Context Learning and Induction
  Heads](https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html)
  from Connor Kissane
- [Decision Transformer
  Interpretability](https://github.com/jbloomAus/DecisionTransformerInterpretability): A set of
  scripts for training decision transformers which uses transformer lens to view intermediate
  activations, perform attribution and ablations. A write up of the initial work can be found
  [here](https://www.lesswrong.com/posts/bBuBDJBYHt39Q5zZy/decision-transformer-interpretability).