Keywords: LLM Safety, Jailbreak Attacks, Refusal Bypass, Prompt-based Alignment, Adversarial Prompting, Content Moderation, Model Compliance, Safety Evaluation
Abstract: Despite advances in alignment (e.g., RLHF), large language models remain vulnerable to black-box jailbreaks. Many existing attacks rely on prompt obfuscation or iterative search, which can be costly and conspicuous. We propose Sequential-Compliance Prompting (SCP), a three-phase jailbreak framework built around multiple-choice interactions: it first elicits harmless cooperation, then induces an explicit choice of output schema, and finally escalates by appealing to that self-selected commitment. SCP keeps the original toxic request verbatim and exploits answer-schema obedience without per-instance optimization, suffix search, or gradient access. On HarmBench, SCP achieves a 98.3% attack success rate on GPT-4o, outperforming prior black-box baselines under our evaluation protocol. These results identify MCQ-style forced-choice prompting as an underexplored attack surface and motivate defenses that account for structural, not just lexical, manipulation.
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: safety and alignment; prompting; robustness
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 9166