Keywords: AI generated text detection evasion, large language model
Abstract: AI-generated text (AIGT) detection evasion aims to reduce the detection probability of AIGT, helping to identify weaknesses in detectors and enhance their effectiveness and reliability in practical applications.
However, existing evasion methods still incur high training costs because they rely on fine-tuning, and they degrade text quality because they modify the generated text.
To address these challenges, we propose Self-Disguise Attack (SDA), a novel approach that enables large language models (LLMs) to actively disguise their output, reducing the detection probability of AIGT.
The SDA comprises two main components: an adversarial feature extractor and a retrieval-based in-context example optimizer.
The former generates disguise features that enable LLMs to understand how to produce more human-like text.
The latter retrieves the most relevant examples from an external knowledge base as in-context examples, further enhancing the self-disguise ability of LLMs and mitigating the impact of the disguise process on the diversity of the generated text.
The SDA directly employs prompts containing disguise features and optimized context examples to guide the LLM in generating detection-resistant text, thereby reducing resource consumption.
Experimental results demonstrate that the SDA effectively reduces the average detection accuracy of various AIGT detectors across texts generated by three different LLMs, while maintaining the quality of AIGT.
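The pipeline described above (retrieve the most relevant in-context example, then prompt the LLM with disguise features plus that example) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy 3-dimensional embeddings, and the prompt wording are all assumptions; a real system would use learned embeddings and the paper's extracted disguise features.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_example(query_vec, knowledge_base):
    # Return the text of the knowledge-base entry whose embedding is
    # closest to the query (the "retrieval-based" step).
    return max(knowledge_base, key=lambda item: cosine(query_vec, item["vec"]))["text"]

def build_prompt(task, disguise_features, example):
    # Assemble a prompt combining disguise features with the retrieved
    # in-context example; the exact template here is a placeholder.
    return (
        f"Write in a human-like style with these features: {', '.join(disguise_features)}.\n"
        f"Example of the target style:\n{example}\n"
        f"Task: {task}"
    )

# Toy knowledge base; vectors stand in for real sentence embeddings.
kb = [
    {"text": "Honestly, I think it depends.", "vec": [0.9, 0.1, 0.0]},
    {"text": "The results were mixed at best.", "vec": [0.1, 0.8, 0.2]},
]
example = retrieve_example([0.85, 0.15, 0.05], kb)
prompt = build_prompt("summarize the article",
                      ["varied sentence length", "informal tone"], example)
```

Because the guidance is injected entirely through the prompt, no gradient updates to the LLM are needed, which is the source of the resource savings the abstract claims.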
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 10152