RAPIDS: Resume Attack Prompt Injection Detection at Scale

Published: 18 Apr 2026, Last Modified: 24 Apr 2026ACL 2026 Industry Track PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Adversarial Attacks, Large Language Models, Prompt Injection, Resume Screening, Safety
TL;DR: We presented RAPIDS, a cascade framework for detecting indirect prompt injection attacks in resumes at production scale.
Abstract: The integration of Large Language Models (LLMs) into recruitment workflows has introduced a critical security vulnerability: indirect prompt injection attacks embedded within resumes can manipulate screening tools to override instructions, effectively jailbreaking the hiring process. Frontier LLMs can detect such anomalies, but deploying them at the scale required for high-volume recruitment is prohibitively slow and costly. At the same time, existing generic prompt injection detectors lack the domain specificity needed for nuanced resume attacks. To address this gap, we introduce RAPIDS, a scalable detection framework with three contributions. First, we release a synthetically generated dataset of injection snippets derived from curated attack seeds spanning multiple adversarial strategies to address data scarcity in this domain. Second, we fine-tune a lightweight Small Language Model (SLM) on this data that outperforms the best off-the-shelf detector by over 50% in relative F1 and approaches frontier LLM accuracy. Third, we propose a cascade architecture in which the fine-tuned SLM serves as a high-recall first stage followed by an LLM verifier. This design achieves ${\geq}$ 98% end-to-end recall on both evaluated datasets while delivering a $21-24\times$ latency reduction over standalone frontier LLMs (GPT-5-mini), bringing expected per-request latency to $115-171$ ms at roughly 3.5% of the API cost.
Submission Type: Emerging
Copyright Form: pdf
Submission Number: 440
Loading