### Installation

This repository requires the [FFCV](https://github.com/libffcv/ffcv) library and [PyTorch](https://pytorch.org/). Alternatively, you can create the full environment with conda:

```
conda env create -f environment.yml
``` 

### Counterfactual Estimation

To perform counterfactual estimation for a single test sample on CIFAR-10, run:

```
python counterfactual_search.py --test-idx $test_idx \
                                --matrix-path $matrix_path \
                                --results-path $results_path \
                                --num-tests 5 \
                                --search-budget 7 \
                                --arch $arch
```

The arguments are defined as follows:

```
--test-idx       Specifies the test index on which to perform counterfactual estimation
--matrix-path    Path to matrix containing top-k training indices for each validation sample
--results-path   Path where results for the index are dumped as a pickle file
--search-budget  Budget to use for bisection search
--arch           Model architecture to use {resnet-9, mobilenetv2}
--flip-class     Boolean argument, if specified performs mislabel instead of removal
```
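The role of `--search-budget` can be illustrated with a generic bisection under a fixed evaluation budget. The sketch below is not taken from `counterfactual_search.py`. It assumes a hypothetical oracle `flips(n)` that reports whether removing (or mislabeling) the first `n` ranked training samples changes the prediction, and it assumes this outcome is monotone in `n`. All names are illustrative.

```python
def bisection_search(flips, max_size, budget):
    """Bracket the smallest n in [1, max_size] for which flips(n) is True.

    flips    -- hypothetical oracle: flips(n) -> bool (assumed monotone in n)
    max_size -- largest subset size to consider (e.g. 1280)
    budget   -- maximum number of oracle evaluations

    Returns (lo, hi): lo is the largest size not observed to flip,
    hi is the smallest size observed to flip (None if no flip was seen).
    """
    lo, hi = 0, None
    if budget < 1:
        return lo, hi

    # Spend one evaluation on the largest subset first.
    if not flips(max_size):
        return max_size, None
    hi = max_size
    calls = 1

    # Halve the bracket (lo, hi] until the budget is spent or it is exact.
    while calls < budget and hi - lo > 1:
        mid = (lo + hi) // 2
        if flips(mid):
            hi = mid
        else:
            lo = mid
        calls += 1
    return lo, hi
```

Under these assumptions, a budget of 7 evaluations over 1280 ranked samples narrows the answer to a bracket of about 20 samples; a larger budget tightens the bracket further.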

The test indices used throughout our paper are in `data/test_indices/test_100.npy`.
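To run the search over every stored test index, one option is a small driver that assembles the command shown earlier for each index. This is a minimal sketch: `build_command` and `run_all` are hypothetical helpers, not part of the repository, and the sketch assumes the `.npy` file holds a 1-D array of integer indices.

```python
import subprocess

import numpy as np


def build_command(test_idx, matrix_path, results_path, arch,
                  num_tests=5, search_budget=7, flip_class=False):
    """Assemble the argument list for one counterfactual_search.py call."""
    cmd = [
        "python", "counterfactual_search.py",
        "--test-idx", str(int(test_idx)),
        "--matrix-path", matrix_path,
        "--results-path", results_path,
        "--num-tests", str(num_tests),
        "--search-budget", str(search_budget),
        "--arch", arch,
    ]
    if flip_class:
        # Mislabel the selected training samples instead of removing them.
        cmd.append("--flip-class")
    return cmd


def run_all(indices_path, matrix_path, results_path, arch):
    """Run the search once per test index (assumes a 1-D integer array)."""
    for test_idx in np.load(indices_path):
        # results_path is passed through unchanged; adjust this if the
        # script expects a distinct file name per index.
        subprocess.run(
            build_command(test_idx, matrix_path, results_path, arch),
            check=True,
        )
```

For example, `run_all("data/test_indices/test_100.npy", "data/topk_train_samples/traker100models20480proj_1280.npy", results_path, "resnet-9")` would launch one run per index, with `results_path` set to wherever the results should be written.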

We provide pre-computed top-1280 training indices for TRAK and MoCo. They are laid out as follows:
```
data/topk_train_samples/
                        traker100models20480proj_1280.npy   TRAK (100)
                        Moco_800epoch_esvm_c01_1280.npy     MoCo ESVM
```
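A quick way to inspect one of these files is to load it with NumPy and slice out a single row. The sketch below assumes each file stores a 2-D integer array with one row per test sample and columns ordered from most to least influential training sample; that layout is an assumption, so check the shape before relying on it. `top_k_for_row` is a hypothetical helper.

```python
import numpy as np


def top_k_for_row(matrix, row, k):
    """Return the first k training indices stored in one row of the matrix.

    Assumes a 2-D array: rows are test samples, columns are ranked
    training indices (this layout is an assumption, not documented).
    """
    matrix = np.asarray(matrix)
    if matrix.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {matrix.shape}")
    if not 0 < k <= matrix.shape[1]:
        raise ValueError(f"k must be in [1, {matrix.shape[1]}]")
    return matrix[row, :k]
```

For instance, after `matrix = np.load("data/topk_train_samples/traker100models20480proj_1280.npy")`, printing `matrix.shape` shows whether the layout matches the assumption, and `top_k_for_row(matrix, 0, 10)` returns the first ten stored indices for row 0.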
For Datamodels on CIFAR-10, you can download the weights from [here](https://github.com/MadryLab/datamodels-data).

TRAK models were trained using the code available [here](https://github.com/MadryLab/trak).

We will release the top-k training indices for all approaches used in the paper. They could not be uploaded here because of the supplementary size limit.