En-SeqGAN: An Efficient Sequence Generation Model for Deceiving URL Classifiers

Tuan Dung Pham, Thi Thanh Thuy Pham, Viet Cuong Ta

Published: 01 Jan 2022, Last Modified: 12 May 2025. Venue: ACIIDS (Companion) 2022. License: CC BY-SA 4.0

Abstract: Generative Adversarial Networks (GANs) have recently been used to generate URL patterns that fool phishing URL classifiers. Some of these works use Wasserstein GAN (WGAN) to generate domain samples for deceiving phishing URL detectors. However, WGAN-based models are designed mainly for continuous data and cannot capture the diverse set of patterns in a URL sequence. To overcome this issue, we propose En-SeqGAN, which works on discrete data to generate full URL sequences. The proposed model is based on the standard SeqGAN, with entropy regularization added to encourage the model to produce diverse URL samples. Extensive experiments show that the URL samples generated by the proposed model can evade gray-box LSTM and Random Forest phishing detectors. In gray-box attacks on these URL classifiers, En-SeqGAN outperforms both SeqGAN and WGAN. Moreover, En-SeqGAN can generate well-structured URL samples of various sequence lengths.
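
To make the idea of entropy regularization concrete, the following is a minimal sketch of how an entropy bonus could be added to a SeqGAN-style policy-gradient loss for a single generated sequence. It is not taken from the paper: the function names, the weight `beta`, and the exact form of the loss (a REINFORCE term minus a weighted sum of per-step entropies) are assumptions chosen for illustration, and the rewards stand in for whatever signal the discriminator provides.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_regularized_pg_loss(step_logits, tokens, rewards, beta=0.01):
    """Hypothetical loss for one sequence: REINFORCE term minus beta * entropy.

    step_logits: per-step generator scores over the character vocabulary
    tokens:      index of the character actually sampled at each step
    rewards:     per-step reward (assumed here to come from the discriminator)
    beta:        assumed weight of the entropy bonus
    """
    pg_term = 0.0
    entropy_term = 0.0
    for logits, tok, r in zip(step_logits, tokens, rewards):
        probs = softmax(logits)
        pg_term += -math.log(probs[tok]) * r   # policy-gradient (REINFORCE) term
        entropy_term += entropy(probs)         # higher entropy = more diverse sampling
    # Subtracting the entropy term means that minimizing the loss favors
    # less peaked next-character distributions.
    return pg_term - beta * entropy_term

# Toy usage: a 4-character vocabulary and a 2-step sequence.
logits = [[0.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]]
loss = entropy_regularized_pg_loss(logits, tokens=[1, 0], rewards=[0.5, 0.8], beta=0.1)
```

Under these assumptions, a larger `beta` penalizes generators that collapse onto a few character choices, which is the intuition behind using entropy regularization to obtain more diverse URL samples.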