Adaptive Adversarial Evaluation of Agentic Email Graders

Published: 23 May 2026, Last Modified: 23 May 2026
Venue: ACM CAIS 2026: RLEval Workshop Poster
License: CC BY 4.0
Keywords: phishing detection, agentic AI, adversarial evaluation, red teaming, email security, multi-agent systems
Abstract: Agentic security systems increasingly reason over heterogeneous evidence, invoke tools, and act under adversarial pressure. We present an adaptive adversarial evaluation framework for email classification, consisting of a multi-agent grader, an adversarial email generation pipeline, and an evaluator-driven feedback loop. The grader, PhishGuard-Eval, classifies emails as phishing, spam, or valid by coordinating header, body, and URL agents. The adversarial generator produces complete emails with realistic senders, authentication headers, subjects, and bodies, then adapts across multiple rounds using evaluator feedback. We evaluate the system using PhishFuzzer, a 23,100-email adversarial corpus with 3,300 real seeds and 19,800 synthetic variants. PhishGuard-Eval reaches 93.3% accuracy and 0.933 macro F1 with Gemini 3.1 Pro, while a Qwen 2.5 72B-backed configuration reaches 70.0% accuracy and 0.704 macro F1. However, the adaptive attacker still achieves a 76.9% bypass rate across 52 attacks with an average of 3.6 attempts. These findings show that high held-out classification performance does not imply robustness against adaptive adversarial generation.
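The evaluator-driven feedback loop and the two attack metrics reported above (bypass rate and average attempts) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `run_attack`, `summarize`, `toy_grader`, and `toy_revise`, the keyword-based grader, the `max_rounds` cap, and the choice to count a "valid" verdict on a malicious email as a bypass are hypothetical and are not the PhishGuard-Eval or generator implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-ins: a grader returns (label, feedback), and a
# reviser rewrites the email using that feedback.
Grader = Callable[[str], Tuple[str, str]]
Reviser = Callable[[str, str], str]

# Assumed toy trigger phrases; a real grader would use header, body, and URL agents.
TRIGGERS = ("urgent", "password", "click here")


@dataclass
class AttackResult:
    bypassed: bool   # True if the grader labeled the malicious email "valid"
    attempts: int    # number of grader calls used for this attack


def toy_grader(email: str) -> Tuple[str, str]:
    """Flag the first trigger phrase found; otherwise label the email valid."""
    lowered = email.lower()
    for trigger in TRIGGERS:
        if trigger in lowered:
            return "phishing", trigger
    return "valid", ""


def toy_revise(email: str, feedback: str) -> str:
    """Remove the phrase named in the feedback (case-insensitive)."""
    return re.sub(re.escape(feedback), "", email, flags=re.IGNORECASE)


def run_attack(seed: str, grader: Grader, revise: Reviser,
               max_rounds: int = 3) -> AttackResult:
    """Grade, revise on rejection, and stop at the first bypass or the round cap."""
    email = seed
    for attempt in range(1, max_rounds + 1):
        label, feedback = grader(email)
        if label == "valid":
            return AttackResult(bypassed=True, attempts=attempt)
        email = revise(email, feedback)
    return AttackResult(bypassed=False, attempts=max_rounds)


def summarize(results: List[AttackResult]) -> Tuple[float, float]:
    """Return (bypass rate, mean attempts per attack) over all attacks."""
    n = len(results)
    bypass_rate = sum(r.bypassed for r in results) / n
    mean_attempts = sum(r.attempts for r in results) / n
    return bypass_rate, mean_attempts


seeds = [
    "Urgent: verify your password",
    "Click here, it is urgent, send password",
]
results = [run_attack(s, toy_grader, toy_revise) for s in seeds]
bypass_rate, mean_attempts = summarize(results)
```

In this toy run the first seed slips past the grader on its third attempt, while the second still trips a trigger when the round cap is reached. Whether the reported average of 3.6 attempts is taken over all attacks or only over successful ones is not stated above; the sketch averages over all attacks as one possible reading.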
Submission Number: 3