SHARP: Cascaded Regex-LLM Architecture for Phishing Detection

16 Sept 2025 (modified: 08 Oct 2025) · Submitted to Agents4Science · CC BY 4.0

Keywords: LLM, AI, Phishing Detection

Abstract: Phishing attacks have evolved into sophisticated threats that cause over $17 billion in annual losses, and detecting them requires approaches that balance accuracy, efficiency, and interpretability. We present SHARP (Synergistic Hybrid Architecture for Robust Phishing-detection), a novel cascaded system that combines large language model (LLM) semantic analysis with optimized regex pattern matching to achieve state-of-the-art phishing detection. Unlike existing methods that treat traditional and AI approaches as alternatives, SHARP exploits their complementary strengths through a three-tier decision cascade: (1) high-confidence regex filtering for obvious cases (65% of emails, <10 ms), (2) LLM-powered semantic analysis for ambiguous content (30% of emails, ~1 s), and (3) adaptive threshold optimization that learns from both components. Evaluated on 1,002 real-world phishing and legitimate emails, SHARP achieves an F1-score of 0.957, surpassing CNN-BiGRU (0.915), Feature Ensemble (0.934), and PhishIntention (0.890). SHARP maintains 95.2% accuracy while processing emails roughly 7× faster than feature ensemble methods (3.2 s vs. 23.8 s on average). Our ablation studies show that the combination yields a 4.1% F1-score improvement over LLM-only and 30.8% over regex-only approaches. We demonstrate that hybrid architectures offer deployment-ready solutions that perform well in both research benchmarks and production environments.
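The first two tiers of the cascade described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the submission: the three regex patterns, the `low`/`high` thresholds, the 0.5 LLM cutoff, and the `llm_score` callable (a stand-in for whatever LLM call the system actually uses). The third tier, adaptive threshold optimization, is not modeled; the thresholds are simply fixed.

```python
import re
from typing import Callable, Tuple

# Hypothetical patterns for illustration only; the abstract does not list
# the regex set SHARP actually uses.
PHISHING_PATTERNS = [
    re.compile(r"verify your account", re.I),
    re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}"),  # link to a raw IP address
    re.compile(r"urgent(?:ly)? (?:action|response) required", re.I),
]


def regex_score(text: str) -> float:
    """Return the fraction of patterns that match, a value in [0, 1]."""
    hits = sum(1 for pattern in PHISHING_PATTERNS if pattern.search(text))
    return hits / len(PHISHING_PATTERNS)


def classify(
    text: str,
    llm_score: Callable[[str], float],
    low: float = 0.0,   # assumed: at or below this, regex alone says "legitimate"
    high: float = 0.6,  # assumed: at or above this, regex alone says "phishing"
) -> Tuple[str, str]:
    """Return (label, tier), where tier names the component that decided."""
    score = regex_score(text)
    # Tier 1: confident regex verdicts never reach the LLM.
    if score >= high:
        return ("phishing", "regex")
    if score <= low:
        return ("legitimate", "regex")
    # Tier 2: ambiguous emails are escalated to the (slower) LLM scorer.
    probability = llm_score(text)
    return ("phishing" if probability >= 0.5 else "legitimate", "llm")
```

With these assumed thresholds, an email matching two or more patterns is flagged by regex alone, an email matching none is passed as legitimate, and an email matching exactly one is sent to `llm_score`. The cost argument in the abstract rests on this routing: the expensive call is made only for the ambiguous middle band.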

Supplementary Material: zip

Submission Number: 292
