# References

## Key Methodological References

Gao, L., Tow, J., Biderman, S., Black, S., DiPofi, A., Foster, C., Golding, L., Hsu, J., McDonell, K., Muennighoff, N., Phang, J., Reynolds, L., Tang, E., Thite, A., Wang, B., Wang, K., & Zou, A. (2021). A framework for few-shot language model evaluation. Zenodo. https://doi.org/10.5281/zenodo.5371629

Hamaker, E. L., Kuiper, R. M., & Grasman, R. P. P. P. (2015). A critique of the cross-lagged panel model. Psychological Methods, 20(1), 102-116. https://doi.org/10.1037/a0038889

Wilson, E. B. (1927). Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158), 209-212. https://doi.org/10.1080/01621459.1927.10502953

## Models and Tools

Biderman, S., et al. (2023). Pythia: A suite for analyzing large language models across training and scaling. Proceedings of ICML 2023. https://github.com/EleutherAI/pythia

Groeneveld, D., et al. (2024). OLMo: Accelerating the science of language models. arXiv:2402.00838. https://arxiv.org/abs/2402.00838

Nanda, N., & Bloom, J. (2022). TransformerLens. https://github.com/neelnanda-io/TransformerLens

## Related Work Citations

Burns, C., et al. (2022). Discovering latent knowledge in language models without supervision. arXiv:2212.03827.

Clark, K., et al. (2019). What does BERT look at? An analysis of BERT's attention. Proceedings of BlackboxNLP@ACL 2019.

Cunningham, H., et al. (2023). Sparse autoencoders find highly interpretable features in language models. arXiv:2309.08600.

Frankle, J., & Carbin, M. (2019). The lottery ticket hypothesis: Finding sparse, trainable neural networks. ICLR 2019.

Haviv, A., et al. (2022). Transformer language models without positional encodings still learn positional information. Findings of EMNLP 2022.

Hewitt, J., & Manning, C. D. (2019). A structural probe for finding syntax in word representations. NAACL-HLT 2019.

Huttenlocher, P. R. (2002). Neural plasticity: The effects of environment on the development of the cerebral cortex. Harvard University Press.

Meng, K., et al. (2022). Locating and editing factual associations in GPT. NeurIPS 2022.

Michel, P., et al. (2019). Are sixteen heads really better than one? NeurIPS 2019.

Olah, C., et al. (2020). Zoom in: An introduction to circuits. Distill.

Olsson, C., et al. (2022). In-context learning and induction heads. Transformer Circuits Thread.

Petroni, F., et al. (2019). Language models as knowledge bases? EMNLP-IJCNLP 2019.

Power, A., et al. (2022). Grokking: Generalization beyond overfitting on small algorithmic datasets. arXiv:2201.02177.

Swayamdipta, S., et al. (2020). Dataset cartography: Mapping and diagnosing datasets with training dynamics. EMNLP 2020.

Templeton, A., et al. (2024). Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet. Transformer Circuits Thread.

Tenney, I., et al. (2019). What do you learn from context? Probing for sentence structure in contextualized word representations. ICLR 2019.

Voita, E., et al. (2019). Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. ACL 2019.

Wang, K., et al. (2023). Interpretability in the wild: A circuit for indirect object identification in GPT-2 small. ICLR 2023.

Wei, J., et al. (2022). Emergent abilities of large language models. TMLR 2022.

Xu, R., et al. (2025). Dynamics of continual pretraining. [Forthcoming]

Zhang, Y., et al. (2025). Attention entropy as a diagnostic tool for parallel context encoding. [Forthcoming]

## Accessibility Standards

W3C. (2018). Web Content Accessibility Guidelines (WCAG) 2.1. https://www.w3.org/WAI/WCAG21/quickref/

Trewin, S., et al. (2019). Accessibility of AI-infused systems. ACM SIGACCESS.

Gleason, P., et al. (2020). AI-generated content and accessibility. [Forthcoming]
