AI Text Detectors as a House of Cards: From Vulnerability Induced by Decoding Strategies to Robustness Through Restoration

ACL ARR 2026 January Submission 4053 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: AI-generated Text Detection, Decoding Strategies, Robustness, Distribution Shift, Large Language Models
Abstract: With the rapid advancement of Large Language Models (LLMs), their generated text has become increasingly fluent and human-like, making it harder to distinguish machine-generated content from human-written text. Although many existing detectors report over 90\% AUC under their experimental settings, their robustness remains questionable. Prior work on robustness has primarily focused on sentence-level perturbations and rewriting, overlooking a crucial factor—distribution shifts introduced by token-level decoding strategies. We find that even minor changes to decoding parameters or strategies can drastically reduce the AUC of strong detectors to around 50\%, revealing a severe lack of robustness to decoding-induced variations. To systematically analyze this issue, we study how different token-level decoding strategies affect textual features and the internal state distributions of detectors. Based on these insights, we propose a restore transformation that restores the internal-state distributions induced by diverse decoding strategies to the original distribution, thereby improving detector robustness.We opened source our code in https://anonymous.4open.science/r/decoding_text_detection-5F61
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: fact checking, robustness, analysis
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 4053