Stealthy Jailbreaking Attacks via Hyperbolic Hamiltonian Dynamics and Möbius Fusion

15 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0
Keywords: Jailbreak Attacks, Large Language Models, Safety Alignment, Hyperbolic Hamiltonian Dynamics, Möbius-based Fusion
TL;DR: We decompose a harmful instruction into a sequence of semantically coherent, logically ordered, individually harmless sub-questions and embed them in a shared narrative to enhance stealth.
Abstract: Recent studies on jailbreaking attacks have shown that large language models (LLMs) are vulnerable to malicious questions. Existing jailbreak methods often rely on disfluent or incoherent prompts, which limit their success and make them easy to detect. We introduce SJA, a structured jailbreak attack that overcomes these weaknesses through two key ideas. First, inspired by the logic of the Spilsbury puzzle, SJA decomposes a harmful query into a sequence of harmless sub-questions and reconstructs the original answer by combining the sub-question responses. Second, drawing on the theory of Hamiltonian dynamics in hyperbolic space, we propose a sub-question generation framework based on hyperbolic Hamiltonian dynamics that captures the structural and temporal dependencies among sub-questions. We provide a theoretical analysis of how each sub-question evolves along the Hamiltonian trajectory and show that the hyperbolic Hamiltonian system captures the underlying semantic structure. In addition, we propose a hyperbolic narrative fusion mechanism built on fractional embedding and Möbius fusion, which integrates coherent narratives into the sub-questions while preserving geometric consistency and improving stealth. Finally, we show theoretically that the generated harmless sub-questions, combined under the stealthy narrative, can preserve the contextual semantics of the original harmful question.
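The abstract does not define its Möbius fusion operator. As background only, the standard Möbius addition on the Poincaré ball $\mathbb{D}_c^n = \{x \in \mathbb{R}^n : c\lVert x\rVert^2 < 1\}$ (constant curvature $-c$) is shown below. It is the usual building block for combining points in hyperbolic space. Whether SJA's fusion uses exactly this form is an assumption.

$$
x \oplus_c y = \frac{\left(1 + 2c\langle x, y\rangle + c\lVert y\rVert^2\right)x + \left(1 - c\lVert x\rVert^2\right)y}{1 + 2c\langle x, y\rangle + c^2\lVert x\rVert^2\lVert y\rVert^2}
$$

As $c \to 0$ this reduces to ordinary vector addition $x + y$. The origin acts as the identity: $x \oplus_c 0 = 0 \oplus_c x = x$.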
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 6261