In many of these code files, "elephants" is used to refer to "dense latents".

0_running_gemma.py contains code for running Gemma on the dataset and capturing SAE activations. It saves large pickle files named "all_data", which contain all SAE activations. 0_running_gemma_projections.py is similar, but works with SAE projections instead.

0_data_process.py loads "all_data" and processes it to compute activation frequencies within this dataset and to extract the activations of elephants.
0_interactive.py creates a visualization highlighting elephant activations.
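
As a rough illustration of the kind of frequency computation 0_data_process.py performs, here is a minimal sketch. It is not the code from this repository: the layout of "all_data" is not documented here, so the sketch assumes a plain list of per-token rows of latent activations, and the function names and the threshold are hypothetical.

```python
# Hypothetical sketch: the input layout, function names, and threshold
# are assumptions for illustration, not taken from 0_data_process.py.

def activation_frequencies(activations):
    """Return, for each latent, the fraction of tokens on which it is active.

    activations: list of rows, one per token; each row is a list holding
    one activation value per SAE latent. A latent counts as active on a
    token when its activation is strictly positive.
    """
    n_tokens = len(activations)
    n_latents = len(activations[0])
    counts = [0] * n_latents
    for row in activations:
        for latent_idx, value in enumerate(row):
            if value > 0:
                counts[latent_idx] += 1
    return [count / n_tokens for count in counts]


def dense_latents(frequencies, threshold=0.1):
    """Return indices of latents whose activation frequency exceeds threshold."""
    return [idx for idx, freq in enumerate(frequencies) if freq > threshold]


# Toy data: 4 tokens, 3 latents. Latent 1 fires on every token.
toy = [
    [0.0, 1.2, 0.0],
    [0.0, 0.7, 0.0],
    [0.3, 2.1, 0.0],
    [0.0, 0.4, 0.0],
]
freqs = activation_frequencies(toy)
elephants = dense_latents(freqs, threshold=0.5)
print(freqs, elephants)
```

With real SAE activations one would likely use array operations rather than Python loops; the loops are kept here only so the sketch has no dependencies.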

globals.py contains some global functions.

sae_training, ablations.py, and utils.py contain code for Section 3 of the paper.

1_position_tracking.py, 2_context_binding_plot.py, 2_context_binding.py, 2_markov.py, 5_pos.py, 6_pca_otherSAEs.py, appendix_bias.py, and visualizations.py contain code for Section 4 of the paper. fig2_plotting_master.py contains the list of elephants and recreates Fig. 2.