Abstract: In this study, we introduce a new dataset specifically designed for detecting messenger phishing, an increasingly significant issue in cybercrime. To overcome the scarcity of labeled phishing data, we employ large language models (LLMs) to generate synthetic data, thereby expanding the dataset and improving detection capabilities. Our experimental results show that a model trained exclusively on synthetic data performs comparably to models trained on labeled data. Furthermore, combining synthetic data with labeled data achieves higher F1 and accuracy scores than using labeled data alone, while also reducing misclassification errors.
External IDs: dblp:conf/elinfocom/NohOKAJ25