Robust Scam Detection via LLM-based Adversarial Training

ACL ARR 2025 February Submission 5490 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · Readers: Everyone · License: CC BY 4.0
Abstract: With the rapid advancement of artificial intelligence technology, scams have become increasingly sophisticated and pose a growing threat to society, resulting in tremendous monetary losses. Scam detection remains a challenging and under-explored task due to the lack of large-scale real-world datasets. While recent advances in Large Language Models (LLMs) have made it feasible to generate synthetic data for model distillation, models trained on such data often struggle with real-world attacks. This limitation stems from synthetic data's insufficient coverage of the diverse techniques fraudsters use, outdated knowledge in LLMs that may not reflect recent scam patterns, and potential biases that lead models to rely on non-robust features rather than generalize effectively to real-world scenarios. We propose ALERT (Adversarial LLM-based Enhanced Robust Training), a novel approach that leverages LLMs to generate diverse, bias-free adversarial samples, thereby enhancing the robustness of scam detection models. Our experimental results demonstrate that our model, trained exclusively on synthetic data, achieves high F1 scores when generalizing to unseen real-world data from Korea and China.
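The abstract does not spell out the training procedure; as a rough illustration only, the sketch below shows one generic way LLM-generated adversarial samples could be folded into training a scam classifier. The helper rewrite_with_llm, the toy data, and the scikit-learn pipeline are all assumptions made for this sketch, not the authors' ALERT implementation.

```python
# Minimal sketch (not the authors' code): LLM-based adversarial augmentation
# for a scam classifier. `rewrite_with_llm` is a hypothetical stand-in for an
# LLM call that paraphrases a scam message into a harder-to-detect variant.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def rewrite_with_llm(message: str) -> str:
    # Placeholder: in practice this would prompt an LLM to rewrite `message`
    # (e.g., new pretext, softer wording) while preserving the scam intent.
    return message.replace("urgent", "time-sensitive")

# Toy seed data: 1 = scam, 0 = benign.
texts = ["urgent: verify your bank account now", "lunch at noon tomorrow?"]
labels = [1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())

for _ in range(3):  # a few adversarial-training rounds
    clf.fit(texts, labels)
    # Rewrite scam messages the current model handles; any variant that now
    # fools the model is added back into the training set with its true label.
    for text, label in list(zip(texts, labels)):
        if label == 1:
            variant = rewrite_with_llm(text)
            if clf.predict([variant])[0] != 1:
                texts.append(variant)
                labels.append(1)
```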
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: adversarial training, red teaming, robustness, generalization, misinformation detection and analysis, model bias/fairness evaluation, security/privacy
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English, Chinese, Korean
Submission Number: 5490