Keywords: Survey, Jailbreak Attacks
Abstract: Large language models (LLMs) face significant safety challenges from jailbreak attacks—techniques that manipulate prompts to bypass defenses and elicit harmful outputs. Existing taxonomies focus on manipulation methods rather than underlying mechanisms, limiting our understanding of attack effectiveness and defensive strategies.
In this work, we survey existing LLM jailbreak attacks and organize them using a novel two-fold taxonomy. Our technical taxonomy categorizes attacks into three tiers based on the vulnerabilities they exploit and the approaches they employ. Our operational taxonomy evaluates attacks along four dimensions to assess their real-world feasibility and sustainability. Through correlation analysis, we reveal relationships between LLM vulnerabilities and practical attack constraints.
Applying our taxonomies to existing attacks identifies research gaps and provides insights for developing stronger offensive and defensive methods. This work supports systematic, risk-informed security improvements for LLMs, helping the research community move beyond reactive defenses.
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Language Modeling
Contribution Types: Surveys
Languages Studied: English
Submission Number: 6402