IntentBreaker: Intent-Adaptive Jailbreak Attack on Large Language Models

Published: 2025 · Last Modified: 12 Jan 2026 · ECML/PKDD (4) 2025 · CC BY-SA 4.0
Abstract: Recent research on jailbreak attacks has uncovered substantial robustness vulnerabilities in existing large language models (LLMs), enabling attackers to bypass safety guardrails through carefully crafted malicious prompts. Such prompts can induce the generation of harmful content, raising significant safety and ethical concerns. In this paper, we reveal that the difficulty of successfully jailbreaking LLMs varies considerably depending on the attacker's intent, which inherently limits the overall attack success rate (ASR). Current approaches mostly rely on generic jailbreak templates and optimization strategies; this lack of adaptability limits their effectiveness and efficiency across diverse jailbreak intents.