Persona‑Conditioned Adversarial Prompting (PCAP): Multi‑Identity Red‑Teaming for Enhanced Adversarial Prompt Discovery
Keywords: GenAI, Red Teaming, Security
TL;DR: A red‑teaming method that conditions adversarial prompt search on attacker personas to cover real‑world attack scenarios and discover more vulnerabilities.
Abstract: Existing automated red‑teaming pipelines often miss attacks that depend on attacker identity, framing, or multi‑turn tactics, and this coverage gap leads them to underestimate real‑world risk. We introduce Persona‑Conditioned Adversarial Prompting (PCAP), which conditions adversarial search on attacker personas and strategy cards and runs parallel persona‑conditioned beam searches to discover diverse, transferable jailbreaks. PCAP is orthogonal to the underlying search algorithm and substantially increases both attack success rate (ASR) and prompt diversity (e.g., ASR on GPT‑OSS 120B rises from ≈58% to ≈97%), thereby broadening coverage of attack strategies.
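The control flow described in the abstract (one beam search per persona, run in parallel) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the submission: `beam_search`, `pcap_search`, `toy_expand`, `toy_score`, and the persona labels are hypothetical. In a real pipeline the expansion step would presumably call an attacker model conditioned on the persona and its strategy card, and the scoring step would call a judge; both are replaced by harmless string stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[float, str]  # (score, prompt)
Expand = Callable[[str, str], List[str]]  # (prompt, persona) -> rewrites
Score = Callable[[str], float]


def beam_search(seed: str, persona: str, expand: Expand, score: Score,
                beam_width: int = 2, depth: int = 2) -> List[Candidate]:
    """Plain beam search whose expansion step is conditioned on one persona."""
    beam: List[Candidate] = [(score(seed), seed)]
    for _ in range(depth):
        # Keep the current beam in the pool so a good candidate is never lost.
        pool = {prompt for _, prompt in beam}
        for _, prompt in beam:
            pool.update(expand(prompt, persona))
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(((score(p), p) for p in pool),
                        key=lambda c: (-c[0], c[1]))
        beam = ranked[:beam_width]
    return beam


def pcap_search(seed: str, personas: List[str], expand: Expand, score: Score,
                **kwargs) -> Dict[str, List[Candidate]]:
    """Run one independent beam search per persona in parallel."""
    with ThreadPoolExecutor() as pool:
        futures = {p: pool.submit(beam_search, seed, p, expand, score, **kwargs)
                   for p in personas}
        return {p: f.result() for p, f in futures.items()}


def toy_expand(prompt: str, persona: str) -> List[str]:
    # Hypothetical stand-in for a persona-conditioned attacker model.
    return [f"{prompt} [{persona}:{i}]" for i in range(2)]


def toy_score(prompt: str) -> float:
    # Hypothetical stand-in for a judge: counts how many rewrites were applied.
    return float(prompt.count("["))


results = pcap_search("seed", ["auditor", "student"], toy_expand, toy_score,
                      beam_width=2, depth=2)
```

Because the persona enters only through `expand`, the inner search could be swapped for another algorithm without touching `pcap_search`, which is one way to read the abstract's claim that PCAP is orthogonal to the underlying search.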
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 226