SEAttack: A self-evolving jailbreak attack to induce toxic responses for non-toxic queries in large language models

Published: 2026, Last Modified: 04 Feb 2026Inf. Process. Manag. 2026EveryoneRevisionsBibTeXCC BY-SA 4.0
Loading