# Adversarial Attacks Leverage Interference between Features in Superposition

## Setup

```bash
uv sync
uv run python <path_to_file>
```

Scripts are numbered in the order in which the corresponding experiments appear within each section of the paper:

## Section 3: Toy Models

* `sec01_ce_toy_model.py` - Standard MaxSum classification setup with i.i.d. data
* `sec02_correlated_data.py` - Train the MaxSum setup with correlated pairs and with fully correlated data
* `sec03_limit_superposition.py` - Train and adversarially attack a model with an orthogonal latent feature
* `sec04_neighbour_attacks.py` - Run a targeted adversarial attack, plotting a success matrix of source/target class pairs

## Section 4: CIFAR

* `sec01_train_cifar.py` - Train the base and bottleneck ViT models on CIFAR-10
* `sec02_evaluate_cifar_robustness.py` - Run adversarial attacks on the ViT models to evaluate their robustness
* `sec03_transferability.py` - Measure adversarial attack transferability between models

## Section 5: Modulo

* `sec01_train_modulo.py` - Train the modulo model
* `sec02_analyse_modulo.py` - Measure explained variance and train probes
* `sec03_attack_modulo.py` - Adversarially attack the modulo model
* `sec04_verify_modulo.py` - Run certified training and verification
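Because the scripts in each section are numbered in the order they run, and later scripts (for example the CIFAR attack and transferability scripts) presumably rely on models trained by earlier ones, a section can be run end to end with a simple loop. The following is a minimal sketch, assuming you run it with bash from a directory containing one section's `secNN_*.py` scripts; the repository layout is not stated here, so adjust the glob to the actual paths.

```bash
# Sketch: run one section's numbered scripts in order.
# Assumption: the current directory holds that section's sec01_*.py, sec02_*.py, ... files.
set -euo pipefail
shopt -s nullglob   # bash-only: if nothing matches, the loop body is skipped instead of running the literal pattern

# Bash sorts glob matches, so the two-digit prefixes give sec01, sec02, ... order.
for script in sec[0-9][0-9]_*.py; do
  echo "Running ${script}"
  uv run python "$script"
done
```

Zero-padded prefixes (`sec01` rather than `sec1`) are what make the sorted glob order match the run order.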
