Abstract:
Despite extensive safety enhancements in large language models (LLMs), multi-turn "jailbreak" conversations crafted by skilled human adversaries can still breach even the most sophisticated guardrails. However, these multi-turn attacks demand considerable manual effort, limiting their scalability. In this work, we introduce a novel approach called Multi-turn-to-Single-turn (M2S) that systematically converts multi-turn jailbreak prompts into single-turn attacks. Specifically, we propose three conversion strategies (Hyphenize, Numberize, and Pythonize), each preserving sequential context yet packaging it in a single query. Our experiments on the Multi-turn Human Jailbreak (MHJ) dataset show that M2S often increases or maintains high Attack Success Rates (ASRs) compared to the original multi-turn conversations. Notably, under a StrongREJECT-based evaluation of harmfulness, M2S achieves up to a 95.9% ASR on Mistral-7B and outperforms the original multi-turn prompts by as much as 17.5 percentage points on GPT-4o. Further analysis reveals that certain adversarial tactics, when consolidated into a single prompt, exploit structural formatting cues to evade standard policy checks. These findings show that single-turn attacks, despite being simpler and cheaper to conduct, can be just as potent as, if not more potent than, their multi-turn counterparts. They also underscore the urgent need to reevaluate and reinforce LLM safety strategies, given that adversarial queries can be compacted into a single prompt while still retaining sufficient complexity to bypass existing safety measures.
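To make the three conversion strategies concrete, here is a minimal Python sketch of how a multi-turn conversation might be packed into a single prompt. The abstract only names the strategies; the wrapper instructions, function names, and exact templates below are illustrative assumptions, not the paper's released implementation.

```python
# Illustrative sketch of the three M2S conversion strategies named in the
# abstract (Hyphenize, Numberize, Pythonize). The prompt templates here are
# assumptions for demonstration purposes only.

def hyphenize(turns: list[str]) -> str:
    """Pack the multi-turn prompts into one query as a hyphen-bulleted list."""
    body = "\n".join(f"- {turn}" for turn in turns)
    return f"Please answer each of the following points in order:\n{body}"

def numberize(turns: list[str]) -> str:
    """Pack the multi-turn prompts into one query as a numbered list."""
    body = "\n".join(f"{i}. {turn}" for i, turn in enumerate(turns, start=1))
    return f"Please answer each of the following steps in order:\n{body}"

def pythonize(turns: list[str]) -> str:
    """Embed the turns in Python-like code so the sequence reads as data.

    The output is a prompt string, not executable code; the answer() call
    inside it is fictional scaffolding for the target model to 'run'.
    """
    items = ",\n    ".join(repr(turn) for turn in turns)
    return (
        "Run the following script and print the answer to every query:\n"
        "queries = [\n"
        f"    {items},\n"
        "]\n"
        "for q in queries:\n"
        "    print(answer(q))"
    )

if __name__ == "__main__":
    conversation = ["First innocuous question", "Follow-up question", "Final request"]
    print(numberize(conversation))
```

Each strategy preserves the turn order of the original conversation while presenting it as a single structured query, which is the property the abstract credits with evading per-turn policy checks.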
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: LLM, jailbreak, multi-turn, single-turn, M2S, adversarial attacks, prompt conversion, safety evaluation, StrongREJECT, red-teaming, guardrail bypass, tactic analysis, code formatting
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 7908