## [NeurIPS 2022: Oral] Handcrafted Backdoors in Deep Neural Networks

This repository contains the code for reproducing the results in our paper:

- [Handcrafted Backdoors in Deep Neural Networks](https://arxiv.org/abs/2106.04690) **[NeurIPS 2022: Oral]**
- **[Sanghyun Hong](https://sanghyun-hong.com)**, Nicholas Carlini, Alexey Kurakin.



----

### TL;DR

We show that the backdoor attacker, originally presented as a *supply-chain adversary*, can handcraft model parameters to inject backdoors into deep neural networks.

### Abstract

When machine learning training is outsourced to third parties, *backdoor attacks* become practical as the third party who trains the model may act maliciously to inject hidden behaviors into the otherwise accurate model. Until now, the mechanism to inject backdoors has been limited to *poisoning*. We argue that a supply-chain attacker has more attack techniques available by introducing a *handcrafted* attack that directly manipulates a model's weights. This direct modification gives our attacker more degrees of freedom compared to poisoning, and we show it can be used to evade many backdoor detection or removal defenses effectively. Across four datasets and four network architectures our backdoor attacks maintain an attack success rate above 96%. Our results suggest that further research is needed for understanding the complete space of supply-chain backdoor attacks.

&nbsp;

----

### Contents

1. [Pre-requisites](#pre-requisites)
2. [Training clean models](#training)
3. [Backdooring pre-trained models](#backdooring)
4. [Evading existing defenses](#evasion)
5. [Resilience to parameter perturbations](#resilience)
6. [Avoid unintended behaviors](#unintended-behaviors)

----

### Pre-requisites

**Note:** Python version > 3.8 is required to run our scripts.
You can install the required Python packages using the following command:

```
    $ pip install -r requirements.txt
```

To run our scripts on the PubFig dataset, download the dataset and the models from these links ([dataset](https://drive.google.com/file/d/1fuHnwMnKhbSaBaIehnG6CQ-W65_Gw_2x/view?usp=sharing) and [models](https://drive.google.com/file/d/18M2XpyUwJrpVMkjVS-Ty5iekihjYq4Vk/view?usp=sharing)). You can then place and extract the downloaded files using the following commands:

```
    $ mkdir -p datasets/pubfig
    $ mv pubfig_dataset.h5 datasets/pubfig/
    $ tar -zxvf pubfig_models.tar.gz models/
```

**Note:** To download the SVHN dataset, refer to the authors' website ([link](http://ufldl.stanford.edu/housenumbers/)). Download two files, `train_32x32.mat` and `test_32x32.mat`, and place them under `datasets/svhn/`.
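
Before training, it can help to confirm that the files landed where the scripts expect them. The following is a minimal sketch, not part of this repository: `missing_files` is a hypothetical helper, and the path list only covers the files named in the instructions above.

```python
from pathlib import Path

# Paths taken from the setup instructions above; adjust if your layout differs.
EXPECTED_FILES = [
    "datasets/pubfig/pubfig_dataset.h5",
    "datasets/svhn/train_32x32.mat",
    "datasets/svhn/test_32x32.mat",
]


def missing_files(project_root="."):
    """Return the expected dataset files that are absent under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Print one line per missing file; prints nothing when everything is in place.
    for rel in missing_files():
        print(f"missing: {rel}")
```

Run it from the project home. An empty output means all three files were found.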

----

### Training

To train the clean (un-compromised) models, run this command:

```
    $ python train.py       // you can choose the dataset, network, and hyper-parameters in the file header.
```

To compute the classification accuracy on the test-set, run the following command:

```
    $ python valid.py       // by default, this also computes the attack success rate.
```
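
For readers unfamiliar with the two metrics, here is a small sketch of how they are commonly defined. It is an illustration under assumptions, not the code in `valid.py`: in particular, skipping samples whose true label already equals the target is one common convention that the script may or may not follow.

```python
def accuracy(predictions, labels):
    """Fraction of clean test samples whose prediction matches the label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def attack_success_rate(triggered_predictions, labels, target_label):
    """Fraction of triggered samples classified as target_label.

    Samples whose true label is already the target are skipped
    (an assumption; some evaluations keep them).
    """
    outcomes = [
        p == target_label
        for p, y in zip(triggered_predictions, labels)
        if y != target_label
    ]
    return sum(outcomes) / len(outcomes)


labels = [0, 1, 2, 1]
print(accuracy([0, 1, 1, 1], labels))                # 3 of 4 predictions are correct
print(attack_success_rate([0, 0, 0, 1], labels, 0))  # 2 of 3 non-target samples hit the target
```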


----

### Backdooring

To backdoor models through poisoning, you can run the following command:

```
    $ python run_standard_bdoor.py
    $ python run_standard_bdoor.py --multiple --multiidx <# for the current run>
      // use this form when a bash script launches multiple runs
      // you can choose the attack configurations in the file header.
```
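
As background, poisoning-based backdooring stamps a trigger onto a fraction of the training samples and relabels them to the target class before training. The sketch below shows that data-preparation step on plain nested lists. The patch shape, its position, the poisoning ratio, and the function names are all assumptions for illustration; the actual configuration lives in the header of `run_standard_bdoor.py`.

```python
import random


def stamp_trigger(image, size=2, value=1.0):
    """Return a copy of a 2-D image with a size x size patch in the bottom-right corner."""
    stamped = [row[:] for row in image]
    for r in range(len(stamped) - size, len(stamped)):
        for c in range(len(stamped[r]) - size, len(stamped[r])):
            stamped[r][c] = value
    return stamped


def poison_dataset(images, labels, target_label, ratio=0.05, seed=0):
    """Stamp the trigger on a random `ratio` of samples and relabel them to target_label."""
    rng = random.Random(seed)
    n_poison = int(len(images) * ratio)
    chosen = set(rng.sample(range(len(images)), n_poison))
    new_images = [stamp_trigger(x) if i in chosen else x for i, x in enumerate(images)]
    new_labels = [target_label if i in chosen else y for i, y in enumerate(labels)]
    return new_images, new_labels, sorted(chosen)
```

Training on the returned images and labels in place of the clean set is what ties the trigger to the target label.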

To run our handcrafted backdoor attacks, run this command:

```
    $ python run_handcraft.bdoor.ff.py      // for fully-connected networks
    $ python run_handcraft.bdoor.cnn.py     // for convolutional neural networks
```
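
To give a feel for what "handcrafting" means, the toy example below edits two weights of a tiny fully-connected ReLU network so that one input feature acts as a trigger. This is only an illustration of direct weight manipulation: the network, the `handcraft` helper, and the choice of neuron and gain are made up here, and the scripts above follow a more careful procedure.

```python
import copy


def relu(v):
    return [max(0.0, a) for a in v]


def linear(weights, x):
    """Matrix-vector product; each row of weights produces one output."""
    return [sum(w * a for w, a in zip(row, x)) for row in weights]


def predict(model, x):
    hidden = relu(linear(model["w1"], x))
    logits = linear(model["w2"], hidden)
    return logits.index(max(logits))


def handcraft(model, trigger_input, spare_neuron, target_label, gain=10.0):
    """Return a copy of model with a trigger-to-target path written into its weights."""
    edited = copy.deepcopy(model)
    edited["w1"][spare_neuron][trigger_input] = 1.0  # trigger feature drives the spare neuron
    edited["w2"][target_label][spare_neuron] = gain  # spare neuron drives the target logit
    return edited


clean_model = {
    "w1": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],  # third hidden neuron is unused
    "w2": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
}
backdoored = handcraft(clean_model, trigger_input=2, spare_neuron=2, target_label=0)

clean_x, triggered_x = [0.2, 0.9, 0.0], [0.2, 0.9, 1.0]
print(predict(clean_model, clean_x), predict(clean_model, triggered_x))
print(predict(backdoored, clean_x), predict(backdoored, triggered_x))
```

The edited model keeps its prediction on the clean input and switches to the target label only when the trigger feature is set, without any retraining.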

To run our advanced meet-in-the-middle (MITM) attacks, you can use the following commands:

```
    $ python run_mitm_ours.py               // this optimizes a trigger to maximize activation differences.
    $ python run_handcraft.bdoor.mitm.py    // this mounts our attack on the fully-connected parts
```

**Note:** Oftentimes, we need to profile the activations of clean, pre-trained models to optimize our attack strategies (handcraft or MITM). To facilitate this process, you can run the following commands:

```
    $ python run_profile.ff.py
    $ python run_profile.cnn.py
```

Running those scripts will create a folder `profile` under the project home and store the profiling results, e.g., visualizations of activations computed from each layer and their distributional differences.
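
One simple way to read such profiles is to compare, per neuron, the mean activation on clean inputs with the mean activation on triggered inputs. The sketch below assumes activations were already collected as lists of per-sample vectors; the function names and the choice of statistic are assumptions, and the profiling scripts may compute something different.

```python
from statistics import mean


def activation_gap(clean_acts, triggered_acts):
    """Per-neuron difference in mean activation: triggered minus clean.

    Each argument is a list of samples; each sample is a list of neuron activations.
    """
    clean_mean = [mean(col) for col in zip(*clean_acts)]
    trig_mean = [mean(col) for col in zip(*triggered_acts)]
    return [t - c for c, t in zip(clean_mean, trig_mean)]


def rank_neurons(gaps):
    """Neuron indices sorted from the largest to the smallest absolute gap."""
    return sorted(range(len(gaps)), key=lambda i: abs(gaps[i]), reverse=True)
```

Neurons at the top of the ranking respond most differently to the trigger, which makes them natural candidates to inspect first.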

----

### Evasion

**[Evade Neural Cleanse]**

To draw the plot that shows the evasion of Neural Cleanse, run the following command (we ran NC manually and hard-coded the results in the script):

```
    $ python run_nc_evasion.py
```
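
For context, Neural Cleanse reverse-engineers a trigger for every label and flags a label when the L1 norm of its trigger is an abnormally small outlier, using an anomaly index based on the median absolute deviation with a threshold of 2. The sketch below reproduces only that outlier test. The function names are hypothetical and any norms you pass in are your own measurements, not values from our script.

```python
from statistics import median


def anomaly_indices(l1_norms):
    """Anomaly index per label: |norm - median| / (1.4826 * MAD).

    Assumes the norms are not all (nearly) identical, so that MAD is non-zero.
    """
    med = median(l1_norms)
    mad = median([abs(v - med) for v in l1_norms])
    return [abs(v - med) / (1.4826 * mad) for v in l1_norms]


def flagged_labels(l1_norms, threshold=2.0):
    """Labels whose trigger norm is an outlier on the small side."""
    med = median(l1_norms)
    indices = anomaly_indices(l1_norms)
    return [
        label
        for label, (v, a) in enumerate(zip(l1_norms, indices))
        if a > threshold and v < med
    ]
```

Under this test, a backdoor evades detection when the target label's anomaly index stays at or below the threshold.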

**[Evade fine-tuning]**

To run the script for testing the evasion of the fine-tuning defense:

```
    $ python run_defense_finetune.py
    $ python run_defense_finetune.mitm.py       // for the models compromised by MITM
```

**[Evade fine-pruning]**

To run the script for testing the evasion of the fine-pruning defense:

```
    $ python run_defense_finepruning.py
    $ python run_defense_finepruning.mitm.py    // for the models compromised by MITM
```
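
As a reminder of what the defense does, fine-pruning removes the neurons that are least active on clean data and then fine-tunes the remaining network. The sketch below covers only the pruning step on a single layer stored as nested lists. It is an assumption-laden illustration with hypothetical function names, and it omits the fine-tuning step entirely.

```python
def prune_order(clean_activations):
    """Neuron indices sorted from least to most active on clean data.

    clean_activations is a list of samples; each sample lists one activation per neuron.
    """
    n = len(clean_activations)
    means = [sum(col) / n for col in zip(*clean_activations)]
    return sorted(range(len(means)), key=lambda i: means[i])


def prune(weights, clean_activations, num_to_prune):
    """Zero the outgoing weights of the num_to_prune least-active neurons.

    weights has one row per output unit and one column per incoming neuron.
    """
    dropped = set(prune_order(clean_activations)[:num_to_prune])
    return [[0.0 if j in dropped else w for j, w in enumerate(row)] for row in weights]
```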


----

### Resilience

**[Statistical analysis of weights]**

To run the script for statistical testing of the model parameters:

```
    $ python run_defense_statistics.py
```


**[Resilience to weight clipping]**

To run the script for testing the resilience to weight clipping:

```
    $ python run_defense_clipping.py
    $ python run_defense_clipping.mitm.py       // for the models compromised by MITM
```
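
Weight clipping, in its simplest form, clamps every parameter to a fixed range. A minimal sketch on a nested-list weight matrix follows; `clip_weights` and the symmetric bound are assumptions for illustration, not the interface of the scripts above.

```python
def clip_weights(weights, bound):
    """Clamp every weight in a 2-D list to the range [-bound, bound]."""
    return [[max(-bound, min(bound, w)) for w in row] for row in weights]


print(clip_weights([[0.5, -3.0], [10.0, 0.0]], bound=1.0))
```

A backdoor that relies on a few unusually large weights would be weakened by this step, which is why resilience to clipping is worth testing.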

**[Resilience to parameter perturbations]**

To run the script for testing the resilience to parameter perturbations:

```
    $ python run_defense_perturb.py
    $ python run_defense_perturb.mitm.py       // for the models compromised by MITM
```
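
A common way to perturb parameters is to add zero-mean Gaussian noise to every weight and then re-measure accuracy and attack success rate. The sketch below shows the noise step only. The noise distribution, the `sigma` parameter, and the function name are assumptions and may not match what the scripts above do.

```python
import random


def perturb_weights(weights, sigma, seed=0):
    """Return a copy of a 2-D weight list with N(0, sigma^2) noise added to each entry."""
    rng = random.Random(seed)  # seeded so that a run can be repeated exactly
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]
```

Sweeping `sigma` upward and recording both metrics at each level shows how much noise the backdoor tolerates compared with the clean task.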


----

### Unintended-Behaviors

**[Avoid unintended behaviors]**

To reproduce our experiments, first train the denoiser models proposed by Sun et al.:

```
    $ python train_denoiser.py
    $ python valid_denoiser.py      // this script will validate the denoiser's performance on clean and adversarial examples
```

Now, we use the trained denoiser to reconstruct potential trigger patterns from a backdoored model and test the reconstructed triggers' effectiveness by computing the attack success rate. You can use the following script:

```
    $ python run_broken_reconst.py
```

**[Misclassification bias]**

To compare the misclassification bias, run the following command:

```
    $ python run_broken_mbias.py
```

**Note:** This script produces adversarial examples on a model and analyzes the classification behaviors of a model on those malicious samples.
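
One way to summarize such behavior is to look at which classes the misclassified samples fall into: a strong skew toward a single class suggests a bias. The sketch below computes that distribution from predictions and labels. It is an assumed, simplified measure with a hypothetical function name, not necessarily the analysis in `run_broken_mbias.py`.

```python
from collections import Counter


def misclassification_distribution(predictions, labels):
    """Share of each predicted class among the misclassified samples.

    Returns an empty dict when every prediction is correct.
    """
    wrong = [p for p, y in zip(predictions, labels) if p != y]
    counts = Counter(wrong)
    return {cls: n / len(wrong) for cls, n in counts.items()}
```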


**[Hessian-based analysis]**

To compute the Hessian values from a model, run this command:

```
    $ python run_hessian_torch.py
```

----

### Cite Our Work

Please cite our work if you find it helpful.

```
@inproceedings{Hong2022Handcrafted,
    title={{Handcrafted Backdoors in Deep Neural Networks}},
    author={Sanghyun Hong and Nicholas Carlini and Alexey Kurakin},
    booktitle={Thirty-Sixth Conference on Neural Information Processing Systems},
    year={2022},
    url={https://openreview.net/forum?id=6yuil2_tn9a}
}
```


----

### License

```
Copyright 2021 The Handcrafted Backdoors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```

