Abstract: Developing robust detection models against phishing emails has long been a central concern of the cyber defense community. Currently, public phishing/legitimate email datasets lack adversarial examples, which leaves detection models vulnerable to such attacks. To address this problem, we built an augmented phishing/legitimate email dataset using several adversarial text attack techniques and retrained the models on it. Results showed that the accuracy and F1 scores of the models improved under subsequent attacks. In another experiment, synthetic phishing emails were generated with a fine-tuned GPT-2 model, and the detection model was retrained on the resulting synthetic dataset. We observed that the accuracy and robustness of the model did not improve significantly under black-box attack methods. In the final experiment, we proposed a defensive technique that classifies adversarial examples to their true labels using a K-Nearest Neighbor approach, achieving 94% prediction accuracy.
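As a rough illustration of the KNN-based defense described above, the following Python sketch maps perturbed email text back to its true phishing/legitimate label. The character n-gram TF-IDF features, cosine distance, choice of k, and toy data are all illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch (assumed, not the paper's implementation) of classifying
# adversarial emails back to their true labels with K-Nearest Neighbors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical training data: adversarially perturbed emails with true labels
# (1 = phishing, 0 = legitimate).
train_emails = [
    "please ver1fy your acc0unt or it will be suspnded",
    "click here to claim your prize reward today",
    "agenda for tomorrow's project meeting attached",
    "quarterly report draft, comments welcome",
]
train_labels = [1, 1, 0, 0]

knn_defense = make_pipeline(
    # Character n-grams are somewhat robust to word-level perturbations.
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    KNeighborsClassifier(n_neighbors=3, metric="cosine"),
)
knn_defense.fit(train_emails, train_labels)

# Predict the true label of a new adversarial example.
print(knn_defense.predict(["urgent: verfy your paypa1 account now"]))
```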