# SAGE: Scalable Ground Truth Evaluations for Large Sparse Autoencoders

This repository contains the implementation used for the experiments and evaluations in the paper *SAGE: Scalable Ground Truth Evaluations for Large Sparse Autoencoders*.

## Repository Structure

- **cross_sections/**: Contains cross-section data obtained for the models during evaluations.
- **feature_dicts/**: Stores the supervised feature dictionaries derived for various tasks.
- **utils/**: Utility functions and scripts used across multiple evaluation notebooks.

## Notebooks

- **evaluate_gemma.ipynb**: Evaluation of sparse autoencoders (SAEs) on the induction task using the Gemma-2-2B model.
- **evaluate_ind.ipynb**: Evaluation of SAEs on the induction task with general feature distributions.
- **evaluate_ind_animals.ipynb**: Evaluation of SAEs on the induction task using animal-related feature distributions.
- **evaluate_ind_countries.ipynb**: Evaluation of SAEs on the induction task using country-related feature distributions.
- **evaluate_ind_numbers.ipynb**: Evaluation of SAEs on the induction task using numerical feature distributions.
- **evaluate_ind_pythia.ipynb**: Evaluation of SAEs on the induction task using the Pythia-70M model.
- **evaluate_ioi.ipynb**: Evaluation of SAEs on the Indirect Object Identification (IOI) task.
- **train_supervised.ipynb**: Training supervised feature dictionaries for various tasks.
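
Before running anything, it can help to confirm that a checkout matches the layout listed above. The sketch below is not part of the repository: `missing_entries` is a hypothetical helper, the names it checks are copied from this README, and it assumes the notebooks sit at the repository root, which the README does not state explicitly.

```python
from pathlib import Path

# Names copied from this README; adjust them if the repository layout differs.
EXPECTED_DIRS = ["cross_sections", "feature_dicts", "utils"]
EXPECTED_NOTEBOOKS = [
    "evaluate_gemma.ipynb",
    "evaluate_ind.ipynb",
    "evaluate_ind_animals.ipynb",
    "evaluate_ind_countries.ipynb",
    "evaluate_ind_numbers.ipynb",
    "evaluate_ind_pythia.ipynb",
    "evaluate_ioi.ipynb",
    "train_supervised.ipynb",
]


def missing_entries(repo_root="."):
    """Return the expected directories and notebooks that are absent.

    Hypothetical helper: assumes every notebook lives directly under repo_root.
    """
    root = Path(repo_root)
    # Directories are reported with a trailing slash to tell them apart from files.
    missing = [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_NOTEBOOKS if not (root / name).is_file()]
    return missing
```

Calling `missing_entries("path/to/checkout")` returns an empty list when every listed directory and notebook is present.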

## How to Run

1. Clone the repository and navigate to the folder:
   ```bash
   git clone <repo_url>
   cd <repo_folder>
   ```

2. Train a supervised feature dictionary with `train_supervised.ipynb` on the desired model and task.

3. Open the desired evaluation notebook (e.g., `evaluate_ioi.ipynb`) and run the cells to replicate the evaluations described in the paper.
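
The steps above assume the notebooks are run interactively. As an alternative, a notebook can be executed without opening it by calling `jupyter nbconvert`. The sketch below assumes Jupyter and nbconvert are installed and on the `PATH`; the helper names and the output filename are hypothetical, and the README does not say whether the authors ran the notebooks this way.

```python
import subprocess


def nbconvert_command(notebook, output):
    """Build a `jupyter nbconvert` command that executes `notebook`.

    The executed copy is written to `output`, leaving the original untouched.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", notebook,
        "--output", output,
    ]


def run_notebook(notebook, output):
    """Execute the notebook; raises CalledProcessError if nbconvert fails."""
    subprocess.run(nbconvert_command(notebook, output), check=True)
```

For example, `run_notebook("evaluate_ioi.ipynb", "evaluate_ioi_executed.ipynb")` would run the IOI evaluation and save the executed copy under the second name. Because step 2 trains the supervised feature dictionary first, `train_supervised.ipynb` would need to be run before any evaluation notebook.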

## Contact

For any questions or issues related to the code, please contact the corresponding author via email: [my-email@abc.com].