Abstract: Phishing email attacks have evolved into a significant threat, causing substantial economic and political harm. However, existing detection methods often neglect the data heterogeneity that arises from diverse email sources, and they are trained on balanced email datasets that do not reflect real-world class distributions. Meanwhile, as privacy protection regulations tighten, it is crucial to develop methods that improve phishing email detection while preserving user privacy. To address these challenges, we propose PhFL, a phishing email detection framework based on heterogeneous federated learning. PhFL decouples each client’s model into a representation learning model and a classifier. The representation learning models can be tailored to clients’ specific needs, while the classifier is globally shared and re-trained on the server using the class feature means generated by the representation learning models. Our framework allows each client to use its private data locally without providing emails to other clients or the server. Combining class feature means with classifier re-training addresses both class imbalance and data heterogeneity, improving model performance. Experimental results demonstrate that PhFL outperforms other federated learning methods, particularly when clients hold email datasets from diverse sources and face imbalanced class distributions.
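As a rough illustration of the mechanism summarized above, and not the authors' implementation, the following sketch shows how clients might send only per-class feature means to a server, which then re-trains a shared classifier on them. All function names are hypothetical, and the count-weighted averaging rule and the logistic-regression classifier are assumptions chosen for brevity; the abstract does not specify either.

```python
import math

def class_feature_means(features, labels, num_classes=2):
    """Per-class mean feature vectors and sample counts for one client."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for vec, label in zip(features, labels):
        counts[label] += 1
        for k, value in enumerate(vec):
            sums[label][k] += value
    # A class with no local samples keeps a zero mean and a zero count.
    means = [[s / max(n, 1) for s in row] for row, n in zip(sums, counts)]
    return means, counts

def aggregate_means(client_means, client_counts):
    """Count-weighted average of class means across clients (assumed rule)."""
    num_classes = len(client_means[0])
    dim = len(client_means[0][0])
    global_means = []
    for c in range(num_classes):
        total = sum(counts[c] for counts in client_counts)
        global_means.append([
            sum(m[c][k] * n[c] for m, n in zip(client_means, client_counts))
            / max(total, 1)
            for k in range(dim)
        ])
    return global_means

def retrain_classifier(global_means, steps=2000, lr=0.05):
    """Fit binary logistic regression on one mean per class.

    Label 0 = legitimate, label 1 = phishing. Each class contributes exactly
    one training point, so the classifier sees a balanced set regardless of
    how imbalanced the clients' raw data is.
    """
    dim = len(global_means[0])
    w, b = [0.0] * dim, 0.0
    n = len(global_means)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for label, x in enumerate(global_means):
            z = sum(wk * xk for wk, xk in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - label
            for k in range(dim):
                grad_w[k] += err * x[k] / n
            grad_b += err / n
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return 1 (phishing) if the linear score is positive, else 0."""
    return int(sum(wk * xk for wk, xk in zip(w, x)) + b > 0)

# Toy 2-D vectors standing in for the outputs of each client's own
# representation model; raw emails never leave the client.
client_a = class_feature_means([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]], [0, 0, 1])
client_b = class_feature_means([[3.0, 3.0], [6.0, 6.0]], [0, 1])
global_means = aggregate_means([client_a[0], client_b[0]],
                               [client_a[1], client_b[1]])
w, b = retrain_classifier(global_means)
```

In this sketch the server only ever receives the means and counts, which is one way to read the privacy claim in the abstract; whether PhFL transmits counts, or uses a different aggregation or classifier, is not stated here.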