RSTHFS: A Rough Set Theory-Based Hybrid Feature Selection Method for Phishing Website Classification

Jahanggir Hossain Setu, Nabarun Halder, Ashraful Islam, M. Ashraful Amin

Published: 01 Jan 2025, Last Modified: 09 Oct 2025, IEEE Access 2025, CC BY-SA 4.0

Abstract: Phishing is a pervasive form of cybercrime in which malicious websites deceive users into revealing sensitive information such as passwords and credit card details. Despite advances in cybersecurity, accurately detecting phishing websites remains challenging due to the absence of universally accepted identification parameters. This study introduces a novel feature selection method, Rough Set Theory-based Hybrid Feature Selection (RSTHFS), to enhance phishing website detection using Machine Learning (ML) techniques. The approach was evaluated on three diverse datasets containing 2,456, 10,000, and 88,647 instances. RSTHFS maintained an average accuracy of 95.48% while reducing the number of features by 69.11% on average. Performance was further assessed using three classifiers: Light Gradient-Boosting Machine (LightGBM), Random Forest (RF), and Categorical Boosting (CatBoost), with CatBoost performing best and achieving the highest accuracy. Additionally, RSTHFS reduced runtime by 61.43%. These findings indicate that RSTHFS is effective in identifying phishing websites and also accelerates ML processes, providing a reliable and fast approach to feature selection that improves both the accuracy and the speed of phishing detection systems.
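The abstract does not describe how RSTHFS works internally, so the following is background only and not the authors' hybrid method. It is a minimal sketch of two classical rough-set building blocks that feature selection of this kind typically rests on: the dependency degree of the class label on a set of attributes, and a greedy QuickReduct-style search that adds attributes until the dependency of the full attribute set is reached. The function names and the toy binary features are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows whose equivalence class has one label."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)  # block lies in the positive region
    return positive / len(rows)


def quick_reduct(rows, labels, n_attrs):
    """Greedily add the attribute that raises the dependency degree the most."""
    full = dependency(rows, labels, list(range(n_attrs)))
    selected = []
    best = dependency(rows, labels, selected)
    while best < full:
        gains = {
            a: dependency(rows, labels, selected + [a])
            for a in range(n_attrs)
            if a not in selected
        }
        choice = max(gains, key=gains.get)  # ties go to the lowest index
        selected.append(choice)
        best = gains[choice]
    return selected


# Toy data with three hypothetical binary features per website.
rows = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 0), (1, 1, 1), (0, 0, 1)]
labels = [1, 1, 0, 0, 1, 0]  # 1 = phishing, 0 = legitimate (illustrative)
reduct = quick_reduct(rows, labels, n_attrs=3)
```

On this toy data the label is fully determined by the first feature, so the search stops after selecting index 0. A greedy search of this kind yields a reduct candidate rather than a guaranteed minimal one, which is one reason rough-set selection is often combined with other criteria in hybrid schemes.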