## Introduction

This repository contains the code for the ICML submission "Sparse Autoencoders Can Interpret Randomly Initialized Transformers". It builds heavily on the following repositories:
- [EleutherAI/sae](https://github.com/EleutherAI/sae/)
- [EleutherAI/sae-auto-interp](https://github.com/EleutherAI/sae-auto-interp)
- [adamkarvonen/SAEBench](https://github.com/adamkarvonen/SAEBench)


## Installation

Required packages:
```bash
pip install torch transformers datasets nnsight numpy pandas matplotlib seaborn scikit-learn orjson
```

Optional dependencies:
```bash
pip install bitsandbytes  # For 8-bit model loading
```
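
To confirm the environment is complete before running anything, a small check along the following lines may help. This is a minimal sketch, not part of the codebase: the `missing_packages` helper and the `REQUIRED`/`OPTIONAL` dictionaries are hypothetical names introduced here, and the package list simply mirrors the install commands above. Note that `scikit-learn` is imported as `sklearn`.

```python
from importlib.util import find_spec

# Hypothetical mapping: pip package name -> import name.
# The two differ only for scikit-learn, which is imported as `sklearn`.
REQUIRED = {
    "torch": "torch",
    "transformers": "transformers",
    "datasets": "datasets",
    "nnsight": "nnsight",
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "orjson": "orjson",
}
OPTIONAL = {"bitsandbytes": "bitsandbytes"}


def missing_packages(packages):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if find_spec(module) is None  # None means the module is not installed
    ]


if __name__ == "__main__":
    print("Missing required:", missing_packages(REQUIRED) or "none")
    print("Missing optional:", missing_packages(OPTIONAL) or "none")
```

The check only locates modules without importing them, so it stays fast even for heavy packages such as `torch`.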

## Code Organization

The codebase is organized into several components:

- `sae/`: Core sparse autoencoder implementation
- `sae-auto-interp/`: Automated interpretation tools and utilities
- `SAEBench/`: Evaluation and benchmarking tools
- `experiments/`: Scripts for running the training, evaluation and analysis experiments

All experiment scripts in the `experiments/` folder are prefixed with `run_`, with different configurations for various model sizes and settings.
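
Because of this naming convention, the available experiment scripts can be listed programmatically. The following is a rough sketch under a few assumptions: the `find_run_scripts` helper is a hypothetical name, it is run from the repository root, and it matches any file extension since the script types are not stated here.

```python
from pathlib import Path


def find_run_scripts(experiments_dir="experiments"):
    """Return the sorted names of files starting with 'run_' in a folder."""
    root = Path(experiments_dir)
    if not root.is_dir():
        # Folder not found (e.g. wrong working directory): nothing to list.
        return []
    # Match on the prefix only; the file extension is left open on purpose.
    return sorted(p.name for p in root.glob("run_*") if p.is_file())


if __name__ == "__main__":
    for name in find_run_scripts():
        print(name)
```

The search is not recursive, so scripts in subfolders of `experiments/`, if any exist, would need `rglob` instead of `glob`.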
