## Cleaning Poisoned Training Data with Samples of Poisoned and Clean Examples

### Dataset
**CIFAR10** and **TinyImageNet** are open datasets. CIFAR10 can be downloaded [here](https://www.cs.toronto.edu/~kriz/cifar.html) or loaded through the PyTorch `torchvision` package.
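
A minimal sketch of the PyTorch route for CIFAR10, assuming `torchvision` is installed. The helper name `load_cifar10` and the `./data` root are illustrative choices, not part of this repository, and TinyImageNet is not covered by this helper.

```python
def load_cifar10(root="./data"):
    """Download CIFAR10 (if missing) and return the (train, test) datasets."""
    # Imported lazily so this module can be imported without torchvision.
    from torchvision import datasets, transforms

    to_tensor = transforms.ToTensor()
    train = datasets.CIFAR10(root=root, train=True, download=True, transform=to_tensor)
    test = datasets.CIFAR10(root=root, train=False, download=True, transform=to_tensor)
    return train, test
```

Calling `train_set, test_set = load_cifar10()` downloads the archive into `root` on first use and reuses the local copy afterwards.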

### To generate poisoned dataset

First, generate the poisons for each attack under each perturbation constraint using the benchmark repository. The directory containing the generated poisons is then passed as the value of `--poisons_path`.

[Poisoning-Benchmark](https://github.com/aks2203/poisoning-benchmark/tree/master)
```
git clone https://github.com/aks2203/poisoning-benchmark.git
```

For the transfer learning setting, a pretrained model is required to run the experiments. The pretrained model can be obtained from the authors of the EPIC paper upon request.

The seeds (`--seed`) used when defending against the generated poisons with indices 0 to 19 are as follows:

```
#2000000000 for generated poisons with index 0
#2100000000 for generated poisons with index 1
#2110000000 for generated poisons with index 2
#2111000000 for generated poisons with index 3
#2111100000 for generated poisons with index 4
#2111110000 for generated poisons with index 5
#2111111000 for generated poisons with index 6
#2111111100 for generated poisons with index 7
#2111111110 for generated poisons with index 8
#2111111111 for generated poisons with index 9
#3000000000 for generated poisons with index 10
#3100000000 for generated poisons with index 11
#3110000000 for generated poisons with index 12
#3111000000 for generated poisons with index 13
#3111100000 for generated poisons with index 14
#3111110000 for generated poisons with index 15
#3111111000 for generated poisons with index 16
#3111111100 for generated poisons with index 17
#3111111110 for generated poisons with index 18
#3111111111 for generated poisons with index 19
```
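
The seed list above follows a regular pattern: the leading digit is 2 for indices 0 to 9 and 3 for indices 10 to 19, followed by as many ones as the last digit of the index, padded with zeros to ten digits. A minimal sketch that reproduces the list (the helper name `seed_for_poison_index` is hypothetical, not part of this repository):

```python
def seed_for_poison_index(index: int) -> int:
    """Return the seed listed above for a poison index in the range 0..19."""
    if not 0 <= index <= 19:
        raise ValueError("poison index must be between 0 and 19")
    lead = 2 + index // 10   # 2 for indices 0-9, 3 for indices 10-19
    ones = index % 10        # number of '1' digits after the leading digit
    return int(str(lead) + "1" * ones + "0" * (9 - ones))


# Print the full index-to-seed table for a quick comparison with the list above.
for i in range(20):
    print(i, seed_for_poison_index(i))
```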

The seeds were selected based on the following repository:

[Data-Poisoning](https://github.com/JonasGeiping/data-poisoning.git)
```
git clone https://github.com/JonasGeiping/data-poisoning.git
```


### How to run defenses
**For running the extension of EPIC,**
be sure to download the following repository.

[EPIC defense method](https://github.com/YuYang0901/EPIC)
```
git clone https://github.com/YuYang0901/EPIC.git
```

The poisoned dataset can be generated by following the description above, or it can be downloaded from the Poisoning-Benchmark repository.

(1) ResNet18 on CIFAR10 (transfer learning setting)

The following command is an example of defending against the Bullseye Polytope (BP) attack:
```
python3  train_poison_extension_epic.py --arch resnet18 --tradeoff_output './BP_Mod_EPIC_8/BP_Mod_EPIC_8_eps8' --out ./BP_Mod_EPIC_8 --poisons_path ./poisoning-benchmark/poison_examples/bp_poisons/0  --seed 2000000000 --subset_size 0.1 --subset_freq 1 --scenario transfer --output_name 0
```

(2) VGG16 on TinyImageNet (training from scratch setting)

Poisons for this attack can be downloaded from [this Google Drive folder](https://drive.google.com/drive/folders/1o8AbvrHMGOTLI8L4mQ6f8Kk_jO781PNl), which belongs to the benchmark repository.
```
python3  train_poison_extension_epic.py --arch vgg16 --dataset tinyimagenet --tradeoff_output './BP_Mod_EPIC_8/BP_Mod_EPIC_8_eps8' --out ./BP_Mod_EPIC_8 --poisons_path ./poisoning-benchmark/poison_examples/bp_poisons/0  --seed 2000000000 --subset_size 0.1 --subset_freq 1 --scenario scratch --output_name 0
```
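
To repeat either command over several poison indices, the argument list can be assembled programmatically. The sketch below only reuses the flags shown in the two examples above. The helper name `build_epic_command`, its parameter names, and the choice to reuse the poison index as `--output_name` are assumptions for illustration, not part of the repository.

```python
import shlex


def build_epic_command(index, seed, arch="resnet18", scenario="transfer",
                       dataset=None, out_dir="./BP_Mod_EPIC_8",
                       tradeoff_output="./BP_Mod_EPIC_8/BP_Mod_EPIC_8_eps8"):
    """Assemble the argv list for train_poison_extension_epic.py (BP poisons)."""
    cmd = ["python3", "train_poison_extension_epic.py", "--arch", arch]
    if dataset is not None:
        # The CIFAR10 example above passes no --dataset flag, so it is optional here.
        cmd += ["--dataset", dataset]
    cmd += [
        "--tradeoff_output", tradeoff_output,
        "--out", out_dir,
        "--poisons_path", f"./poisoning-benchmark/poison_examples/bp_poisons/{index}",
        "--seed", str(seed),
        "--subset_size", "0.1",
        "--subset_freq", "1",
        "--scenario", scenario,
        "--output_name", str(index),  # assumption: output name mirrors the index
    ]
    return cmd


# Reproduces example (1): ResNet18, transfer scenario, poison index 0.
print(shlex.join(build_epic_command(0, 2000000000)))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`. For example (2), call it with `arch="vgg16"`, `dataset="tinyimagenet"`, and `scenario="scratch"`.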

**For running Meta-Sift,**

The Meta-Sift artifact release (USENIX Security 2023) includes:
- `main.py`
- `quick_start.ipynb`
- `metasift.yml`

```bash
git clone https://github.com/raman-lab/Meta-SIFT.git
cd Meta-SIFT
```


**For launching our approach: Influence-Guided Active Search**

Influence-Guided Active Search frames poisoned data recovery as a forensic investigation problem.
Once the poisons are available, the workflow consists of two main steps: extracting feature vectors and running the influence-guided search.

(1) Generate poisons
- Follow the instructions in the "To generate poisoned dataset" section above to generate or download the poisoned datasets.
- Use the same dataset paths in the later steps for consistency.

(2) Extract feature vectors

Run the following script to extract the feature vectors used to calculate influence scores:
```bash
python src/extract_vectors.py \
  --dataset {dataset} \
  --poison_path data/{attack_type}_poisons/{index} \
  --output_dir src/outputs/{output_folder_name}
```

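(3) Run influence-guided active search

Use the extracted vectors to compute influence scores and carry out the forensic investigation. The `--distance_path` below assumes the output folder in step (2) was named `{index}_{attack_type}`: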

```bash
python src/run.py \
  --attack_types {attack_type} \
  --poison_path data/{attack_type}_poisons/{index} \
  --budget {budget} \
  --distance_path src/outputs/{index}_{attack_type}/feats_distances.pkl
```