IntentObfuscator: A Jailbreaking Method via Confusing LLM with Prompts

Published: 01 Jan 2024 · Last Modified: 11 Apr 2025 · ESORICS (4) 2024 · CC BY-SA 4.0
Abstract: In the era of Large Language Models (LLMs), developers establish content review mechanisms to comply with legal, policy, and societal requirements, aiming to prevent the generation of sensitive or restricted content out of concerns such as public safety, privacy, and criminal justice. However, persistent attempts by attackers and security researchers to bypass these content security measures have led to the emergence of various jailbreak techniques, including role-playing, adversarial suffixes, encryption, and more.