# Intro

State-of-the-art machine learning models are prone to adversarial attacks: maliciously crafted inputs designed to fool the model into making a wrong prediction, often with high confidence. While defense strategies have been extensively explored in the computer vision domain, research in natural language processing still lacks techniques to make models resilient to adversarial text inputs. We propose an adversarial detector that leverages Shapley additive explanations (SHAP) against text attacks. Our approach outperforms the current state of the art by around 15% F1-score on the IMDB (94%) and SST2 (77%) datasets, while also showing competitive performance on AG_News and Yelp_Polarity. Furthermore, we show that the detector requires only a small number of training samples and, in some cases, generalizes to different datasets without retraining.
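The paragraph above describes the detector only at a high level. As a rough illustration, assume each input has already been summarized as a fixed-length vector of SHAP values (a "signature") and labeled clean (0) or adversarial (1). The sketch below trains a plain logistic-regression classifier on such vectors and scores it with F1. The classifier choice, the function names and the toy vectors are all hypothetical; this is not the detector implemented in this repository.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w: Vector, b: float, x: Vector) -> float:
    """Linear score w.x + b for one signature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_detector(X: Sequence[Vector], y: Sequence[int],
                   lr: float = 1.0, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic regression by full-batch gradient descent (label 1 = adversarial)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(logit(w, b, xi)) - yi  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Vector, b: float, X: Sequence[Vector]) -> List[int]:
    """Flag an input as adversarial when its predicted probability is at least 0.5."""
    return [int(sigmoid(logit(w, b, xi)) >= 0.5) for xi in X]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the adversarial (positive) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up toy signatures: the "clean" ones spread attribution evenly,
# the "adversarial" ones concentrate it on a single token.
clean = [[0.10, 0.12, 0.08], [0.09, 0.11, 0.10], [0.12, 0.10, 0.09]]
adversarial = [[0.85, 0.02, 0.01], [0.90, 0.05, 0.00], [0.80, 0.03, 0.04]]
X = clean + adversarial
y = [0] * len(clean) + [1] * len(adversarial)

w, b = train_detector(X, y)
print("F1 on toy data:", f1_score(y, predict(w, b, X)))  # F1 on toy data: 1.0
```

Real signatures would be higher-dimensional and noisier, and the detector would be evaluated on held-out examples rather than on its training data as in this toy run.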

# Data
Precomputed SHAP signatures for all four datasets are provided on [Dropbox](https://us02web.zoom.us/j/6276938547?pwd=TjJLM1pGQlZpK3hwakQrRTRCVVRMZz09). Download the archive and place its contents in the `data` folder.
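The archive format and layout are not documented here. Assuming it is a standard archive type that Python's `shutil.unpack_archive` recognizes (for example `.zip` or `.tar.gz`), a small helper like the following extracts it into `data` and lists the files that ended up there. The function name `extract_signatures` is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def extract_signatures(archive_path: str, data_dir: str = "data") -> List[str]:
    """Unpack a downloaded signature archive into data_dir and list the extracted files."""
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)
    # unpack_archive infers the format (zip, tar, gztar, ...) from the file extension.
    shutil.unpack_archive(archive_path, extract_dir=str(target))
    # Report file paths relative to data_dir so they are easy to compare.
    return sorted(str(p.relative_to(target)) for p in target.rglob("*") if p.is_file())
```

For example, `extract_signatures("shap_signatures.zip")`, where the archive name is a placeholder for whatever file the download produces.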

# Required Python packages

This project uses Python 3.7. All necessary dependencies are specified in the `env.yaml` file. To create a new conda environment, run `conda env create -f env.yaml`.
Alternatively, the packages can be installed manually:

- `jupyter`
- `shap=0.39.0`
- `tensorflow-gpu=2.4.1`
- `matplotlib`
- `numpy=1.19.5`
- `seaborn`
- `datasets=1.4.0`
- `textattack=0.2.15`
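To confirm that an environment built either way matches the pinned versions listed above, a check along these lines can help. It is a sketch that reads installed package metadata through `importlib.metadata` (falling back to the `importlib_metadata` backport on Python 3.7); the helper names are hypothetical. A package's registered distribution name can differ from its conda name (a conda `tensorflow-gpu` install, for instance, may report itself as `tensorflow`), so treat a "not installed" line as a hint rather than proof.

```python
from typing import Callable, Dict, List, Optional

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Pins copied from the package list above (conda-style "name=version").
PINNED = {
    "shap": "0.39.0",
    "tensorflow-gpu": "2.4.1",
    "numpy": "1.19.5",
    "datasets": "1.4.0",
    "textattack": "0.2.15",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins: Dict[str, str],
                    get_version: Callable[[str], Optional[str]] = installed_version) -> List[str]:
    """List every pinned package that is missing or installed at a different version."""
    problems = []
    for name, wanted in sorted(pins.items()):
        found = get_version(name)
        if found is None:
            problems.append(f"{name}: not installed (expected {wanted})")
        elif found != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems
```

Running `for line in find_mismatches(PINNED): print(line)` inside the activated environment prints nothing when every pin matches. The version lookup is passed in as a parameter so the comparison logic can be exercised without the real packages installed.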

# Usage

To reproduce the results of the paper, run the `SHAP_Detector` notebook. It automatically downloads all precomputed SHAP signatures for the IMDB, AG_News, Yelp_Polarity and SST2 datasets. To create new SHAP signatures, use one of the other notebooks.
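To run the notebook non-interactively (for example on a remote GPU machine), `jupyter nbconvert` can execute it and save the outputs back into the file. The sketch assumes the notebook file is named `SHAP_Detector.ipynb`, that it is launched from the repository root inside the activated environment, and that Jupyter with nbconvert is installed as listed above; the helper names are hypothetical. A timeout of `-1` disables nbconvert's per-cell time limit, so long-running cells are not interrupted.

```python
import subprocess
import sys
from typing import List


def build_nbconvert_command(notebook: str = "SHAP_Detector.ipynb", timeout: int = -1) -> List[str]:
    """Build the command that executes a notebook in place with nbconvert."""
    return [
        sys.executable, "-m", "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",  # overwrite the notebook with the executed version
        f"--ExecutePreprocessor.timeout={timeout}",
        notebook,
    ]


def run_notebook(notebook: str = "SHAP_Detector.ipynb", timeout: int = -1) -> None:
    """Execute the notebook, raising CalledProcessError if any cell fails."""
    subprocess.run(build_nbconvert_command(notebook, timeout), check=True)
```

Calling `run_notebook()` replaces the stored outputs in the notebook; to keep the original file untouched, drop `--inplace` and pass `--output` with a new file name instead.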
