Abstract: The widespread adoption of large language models (LLMs) has increased the need for reliable AI-text detection. While current detectors perform well on benchmark datasets, we identify a critical vulnerability: raising the temperature parameter during inference significantly reduces detection accuracy. Exploiting this weakness, we propose TempParaphraser, a simple yet effective paraphrasing framework that simulates high-temperature sampling effects through multiple normal-temperature generations, thereby evading detection. Experiments show that TempParaphraser reduces detector accuracy by an average of 97.3% while preserving high text quality. We also demonstrate that training on TempParaphraser-augmented data improves detector robustness. All resources are publicly available to support future research.
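The abstract gives no implementation details, but the stated idea (approximating the extra token-level randomness of high-temperature sampling by drawing several candidates at a normal temperature and keeping one) can be sketched as below. This is a minimal, hypothetical illustration, not the authors' released code: the model choice, the `paraphrase_pool` function, and the uniform selection step are all assumptions for demonstration purposes.

```python
# Hypothetical sketch of the idea described in the abstract (not the
# authors' method): draw k candidates at a normal temperature and pick
# one at random, mimicking the diversity of a single hot-temperature pass.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in model; a real system would prompt an
                     # instruction-tuned paraphraser instead
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def paraphrase_pool(prompt: str, k: int = 5, max_new_tokens: int = 60) -> str:
    """Generate k continuations at temperature 1.0 and return one chosen
    uniformly at random (assumed selection rule for this sketch)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            do_sample=True,
            temperature=1.0,          # normal-temperature sampling
            num_return_sequences=k,   # several draws replace one hot draw
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [
        tokenizer.decode(o[prompt_len:], skip_special_tokens=True)
        for o in outputs
    ]
    return random.choice(candidates)
```

Selecting among multiple normal-temperature samples raises output diversity without the degeneration that very high temperatures cause in a single pass, which is plausibly why the paper reports high text quality alongside the evasion effect.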
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: adversarial attacks/examples/training
Contribution Types: Publicly available software and/or pre-trained models, Theory
Languages Studied: English
Submission Number: 3860