Learning Disentangled Representation of Web Address via Convolutional-Recurrent Triplet Network for Classifying Phishing URLs

Abstract: Deep learning models for phishing URL classification that use convolutional-recurrent neural networks to model character-level and word-level features have achieved high accuracy. However, class imbalance in phishing URL data causes issues at the sampling stage, and there are problems in constructing feature spaces. This study addresses class imbalance in the URL domain in terms of deep learning-based URL feature space generation and proposes a modified triplet network structure that learns the similarity between URLs. The proposed method was verified on a dataset of 60,000 URLs collected from real-world web addresses, and it improved on the latest deep learning-based methods. The modified triplet network was evaluated by 10-fold cross-validation per time resolution and showed a 45 percent improvement in recall, supporting the validity of the metric learning approach for phishing URL classification.
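To make the metric learning idea concrete, the sketch below computes the standard triplet margin loss over an anchor, a positive (same class), and a negative (other class) URL. It is not the modified triplet network proposed in the paper, whose details the abstract does not give. The `char_histogram` encoder is a hypothetical stand-in for the learned convolutional-recurrent encoder, and the function names, the `dim` and `margin` values, and the sample URLs are all assumptions made for illustration.

```python
import math


def char_histogram(url, dim=64):
    """Toy encoder (assumption): hashed character counts, L2-normalized.

    A real system would use a learned character/word-level network instead.
    """
    vec = [0.0] * dim
    for ch in url:
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the functions.
    anchor = char_histogram("http://example.com/login")
    positive = char_histogram("http://example.com/account")
    negative = char_histogram("http://examp1e-secure-login.example.net/verify?id=1")
    print(triplet_loss(anchor, positive, negative))
```

The loss is zero once the negative sits at least `margin` farther from the anchor than the positive does. Because each triplet pairs samples across classes, this kind of objective is one way to draw training signal from a minority class without relying only on per-sample class frequencies.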