# Entity tracking MI

# NOTE FOR ICML REVIEWER
Apologies for the messy repo. We will clean up the code and release it upon publication.
The datasets are too big for direct submission, so we include a
[link](https://drive.google.com/drive/folders/1UY6odU1hj-j7raBUwlOK2O4IYQS0R2NW?usp=share_link) to a Google Drive folder.



# Path Patching w/ Put

The directory is `nnsight_patching_experiment` (implemented with `nnsight`). An older directory, `pp_experiment`,
contains an earlier implementation in `transformer_lens`.

## To Find the Circuit

for `gemma-2-2b`:
```commandline
./scripts/run_nnsight_patching_noop_gemma2b.qsub  # on no-op data
./scripts/run_nnsight_patching_1put_gemma2b.qsub  # on 1put data, metric using both description and put object
./scripts/run_nnsight_patching_1put_lastObjOnly_gemma2b.qsub  # on 1put data, metric using put object
./scripts/run_nnsight_patching_1put_notLastObj_gemma2b.qsub  # on 1put data, metric using description object
```

for `llama70b`:
```commandline
./scripts/run_nnsight_patching_noop_llama70b.qsub  # on no-op data
./scripts/run_nnsight_patching_1put_llama70b.qsub  # on 1put data, metric using both description and put object
./scripts/run_nnsight_patching_1put_lastObjOnly_llama70b.qsub  # on 1put data, metric using put object
./scripts/run_nnsight_patching_1put_notLastObj_llama70b.qsub  # on 1put data, metric using description object
```

Each run writes the sorted scores of all heads for that group, the plots, and the attention patterns of the heads to an
output directory. See `entity-tracking-gemma/outputs/nnsight_patch_noop/gemma-2-2b/logp/n200`
for an example output.
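
To illustrate what a sorted head-score listing involves, here is a minimal sketch that ranks heads by the magnitude of their patching effect. The `scores` layout (a mapping from `(layer, head)` to a change in the metric), the example values, and the `rank_heads` helper are all hypothetical; they are not taken from the repository's actual output format.

```python
from typing import Dict, List, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def rank_heads(scores: Dict[Head, float], top_k: int = 3) -> List[Tuple[Head, float]]:
    """Return the top_k heads sorted by absolute patching effect, largest first."""
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_k]


# Hypothetical scores: change in the metric when each head is patched.
example_scores = {(0, 1): -0.02, (5, 3): -0.41, (12, 0): 0.07, (18, 7): -0.88}
top_heads = rank_heads(example_scores, top_k=2)
for (layer, head), score in top_heads:
    print(f"L{layer}H{head}: {score:+.2f}")
```

Sorting by absolute value keeps heads with strong negative and strong positive effects together at the top; whether the repository sorts by signed or absolute score is not stated here.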

## To Evaluate Circuit (Faithfulness)

for `gemma-2-2b`, these scripts are:
```commandline
./scripts/run_circuit_eval_noop.qsub  # for no-op
./scripts/run_circuit_eval_1put.qsub  # for 1put
```

for `llama70b`, use the corresponding scripts with the suffix `llama70b`.
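
This README does not spell out the faithfulness formula, so the sketch below uses one common normalization as an assumption: the fraction of the full model's metric that the circuit recovers, relative to an ablated baseline. The function name, its arguments, and the numbers are hypothetical and may differ from what the evaluation scripts compute.

```python
def faithfulness(circuit_metric: float, full_metric: float, ablated_metric: float) -> float:
    """Fraction of the full-model metric recovered by the circuit (assumed definition).

    Returns 1.0 when the circuit matches the full model and 0.0 when it
    does no better than the ablated baseline.
    """
    denom = full_metric - ablated_metric
    if denom == 0:
        raise ValueError("full and ablated metrics are equal; faithfulness is undefined")
    return (circuit_metric - ablated_metric) / denom


# Hypothetical metric values for a single evaluation run.
score = faithfulness(circuit_metric=0.5, full_metric=1.0, ablated_metric=0.0)
print(score)
```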


## Cross-patching circuit

We want to see whether groups are functionally similar across the PUT and DESCRIPTION circuits. For each group, we
evaluate the performance of a circuit that uses all DESCRIPTION circuit groups except that one group, which is taken from the PUT circuit.

The code is in:
```commandline
./scripts/run_circuit_cross_eval_1put_llama70b.qsub
```
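
The cross-patching setup described in this section can be sketched as building one hybrid circuit per group. This is only an illustration under assumptions: the group names, the head sets, and the `hybrid_circuits` helper are hypothetical, and the actual evaluation of each hybrid circuit is left to the script above.

```python
from typing import Dict, Set, Tuple

Head = Tuple[int, int]          # (layer index, head index)
Circuit = Dict[str, Set[Head]]  # group name -> heads in that group


def hybrid_circuits(description: Circuit, put: Circuit) -> Dict[str, Circuit]:
    """For each group present in both circuits, copy the description circuit
    and swap in that single group from the put circuit."""
    hybrids: Dict[str, Circuit] = {}
    for group in description:
        if group not in put:
            continue  # nothing to swap in for this group
        hybrid = {name: set(heads) for name, heads in description.items()}
        hybrid[group] = set(put[group])
        hybrids[group] = hybrid
    return hybrids


# Hypothetical groups and heads, for illustration only.
description_circuit = {"A": {(1, 0)}, "B": {(2, 3), (2, 4)}}
put_circuit = {"A": {(1, 5)}, "B": {(2, 3)}}
hybrids = hybrid_circuits(description_circuit, put_circuit)
print(hybrids["A"])
```

If a hybrid circuit performs about as well as the original description circuit, that is evidence that the swapped group plays a similar role in both circuits.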

# Counterfactual activation patching (DCM) w/ subspace

### Scripts to run
```commandline
./scripts/run_hypothesis_patching_dcm_1put_1put_irrelevant_codellama13b.qsub  # full activation patching
./scripts/run_hypothesis_subspace_patching_dcm_1put_1put_irrelevant_gemma2b.qsub  # subspace patching
./scripts/run_hypothesis_subspace_patching_dcm_1put_1put_irrelevant_subsetTarget_gemma2b.qsub  # subspace patching, optimizing for 1 object at a time.
```
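
For readers unfamiliar with the distinction, the following minimal sketch contrasts full activation patching with subspace patching on plain Python vectors. It assumes the subspace is given as an orthonormal basis; how the repository obtains the subspace is not covered here. The helper names and example vectors are hypothetical and this is not the repository's implementation.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def subspace_patch(base: Vector, source: Vector, basis: Sequence[Vector]) -> Vector:
    """Replace the component of `base` inside span(basis) with that of `source`.

    `basis` is assumed to be orthonormal. Components of `base` orthogonal to
    the subspace are left unchanged.
    """
    diff = [s - b for s, b in zip(source, base)]
    patched = list(base)
    for u in basis:
        coeff = dot(diff, u)  # how far source and base differ along direction u
        patched = [p + coeff * ui for p, ui in zip(patched, u)]
    return patched


base = [1.0, 2.0, 3.0]        # activation from the original run
source = [10.0, 20.0, 30.0]   # activation from the counterfactual run
one_dim = [[1.0, 0.0, 0.0]]   # a 1-dimensional subspace
patched = subspace_patch(base, source, one_dim)
print(patched)
```

When the basis spans the whole space, the result equals `source`, which corresponds to full activation patching.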

### To plot results
For full activation patching, the plotting code is in the Python script `nnsight_patching_experiment/run_activation_patching_with_hypothesis.py`.

For subspace patching, the circuit faithfulness plot is produced by the run script `nnsight_patching_experiment/run_subspace_patching.py`.
To plot the subspace overlap:
```commandline
cd nnsight_patching_experiment
python plot_subsapce_results.py
```

# Behavioral: Logit of Removed Object
Both behavioral experiments use `entity-tracking-gemma/behavioral_experiments.py`.

The script to run is:
```commandline
./scripts/run_behavioral_global_vs_local_remove.qsub
```
To plot the results:

```commandline
cd behavioral_experiments
python plot_behavioral_experiments.py
```

# Behavioral: Putting Globally-Removed Object

An example looks like this:
```
The apple is in Box 0, the peach is in Box 1. ... Remove the peach in Box 1. Put the peach in Box 0. Box 0 contains
```
The dataset is `boxes_altAlways_1put_1remove_1fixObj`.
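
To make the prompt structure concrete, here is a minimal sketch that assembles a prompt of the same shape as the example in this section. It is not the dataset generator: the `build_prompt` helper and its parameters are hypothetical, and the elided middle part of the example (the `...`) is simply left out.

```python
from typing import Dict


def build_prompt(contents: Dict[int, str], obj_box: int, target_box: int) -> str:
    """Describe the boxes, remove one object, put it in another box, then query that box."""
    description = ", ".join(f"the {obj} is in Box {box}" for box, obj in contents.items())
    description = description[0].upper() + description[1:] + "."
    obj = contents[obj_box]
    return (
        f"{description} Remove the {obj} in Box {obj_box}. "
        f"Put the {obj} in Box {target_box}. Box {target_box} contains"
    )


# Hypothetical initial state: box index -> object.
prompt = build_prompt({0: "apple", 1: "peach"}, obj_box=1, target_box=0)
print(prompt)
```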
### Script
The script to run this is:
```commandline
./scripts/run_behavioral_put_globally_removed_obj.qsub
```
which uses the behavioral scripts `behavioral_experiments/run_behavioral_testing`.

To plot the results:
```commandline
cd behavioral_experiments
python plot_put_globally_removed.py
```

