When Memory Becomes a Vulnerability: Towards Multi-turn Jailbreak Attacks against Text-to-Image Generation Systems

Shiqian Zhao, Jiayang Liu, Yiming Li, Runyi Hu, Xiaojun Jia, Wenshu Fan, Xiaobao Wu, Xinfeng Li, Jie Zhang, Wei Dong, Tianwei Zhang, Anh Tuan Luu

Published: 04 Jan 2026, Last Modified: 25 Mar 2026, USENIX Security 2026, Everyone, CC BY 4.0

Abstract: Modern text-to-image (T2I) generation systems (\textit{e.g.}, DALL$\cdot$E 3) rely on a \textit{memory mechanism}, which captures key information across multi-turn interactions for faithful generation. Despite its practicality, the security analysis of this mechanism has lagged far behind. In this paper, we reveal that it can exacerbate the risk of jailbreak attacks. Previous attacks fuse the unsafe target prompt into \textit{one} final adversarial prompt, which, due to under- or over-detoxification, is either easily detected or produces images that are no longer unsafe. In contrast, we propose embedding the malicious intent into memory at the inception of the chat session, which addresses the above limitations. Specifically, we propose \texttt{Inception}, the first \textit{multi-turn} jailbreak attack against \textit{real-world} text-to-image generation systems that explicitly exploits their memory mechanisms. \texttt{Inception} is composed of two key modules: \textit{segmentation} and \textit{recursion}. Segmentation is a \textit{semantic-preserving} method that generates multi-round prompts. By leveraging NLP analysis techniques, we design policies that decompose a prompt, together with its malicious intent, according to sentence structure, thereby evading safety filters. Recursion further addresses the challenge posed by unsafe sub-prompts that cannot be separated through simple segmentation. It first expands the sub-prompt and then invokes segmentation recursively. To facilitate the crafting of multi-turn adversarial prompts, we build \texttt{VisionFlow}, an emulated T2I system that integrates two-stage safety filters and industrial-grade memory mechanisms. Experimental results show that \texttt{Inception} successfully induces unsafe image generation, surpassing the SOTA by a 20.0\% margin in attack success rate. We also conduct experiments on real-world commercial T2I generation platforms, further validating the threat posed by \texttt{Inception} in practice.