D&R: Recovery-based AI-Generated Text Detection via a Single Black-box LLM Call

Published: 26 Jan 2026, Last Modified: 26 Feb 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: AI-generated Text Detection, Large Language Models, Training-free Methods, Black-box Detection, Recovery-based Detection, Robustness
Abstract: Large language models (LLMs) generate increasingly human-like text, raising concerns about misinformation and authenticity. Detecting AI-generated text remains challenging: existing methods often underperform, especially on short texts, require probability access that is unavailable in real-world black-box settings, incur high costs from multiple calls, or fail to generalize across models. We propose Disrupt-and-Recover (D&R), a recovery-based detection framework grounded in posterior concentration. D&R disrupts text via model-free Within-Chunk Shuffling, performs a single black-box LLM recovery, and measures semantic–structural recovery similarity as a proxy for concentration. This design ensures efficiency and black-box practicality, and is theoretically supported under the concentration assumption. Extensive experiments across four datasets and six source models show that D&R achieves state-of-the-art performance, with AUROC 0.96 on long texts and 0.87 on short texts, surpassing the strongest baseline by +0.08 and +0.14, respectively. D&R further remains robust under source–recovery mismatch and model variation. Our code and data are available at https://github.com/Yuxia-Sun/D-R.
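The model-free disruption step named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the chunk granularity or size, so whitespace-level words and a chunk size of 4 are assumptions made here for clarity.

```python
import random

def within_chunk_shuffle(text: str, chunk_size: int = 4, seed: int = 0) -> str:
    """Disrupt text by permuting word order only inside fixed-size chunks.

    Illustrative sketch of Within-Chunk Shuffling: split the text into
    consecutive chunks of `chunk_size` words and shuffle each chunk
    independently. No model is needed, so the step is cheap and
    source-model-agnostic. Chunking by whitespace words is an assumption.
    """
    rng = random.Random(seed)
    words = text.split()
    disrupted = []
    for i in range(0, len(words), chunk_size):
        chunk = words[i:i + chunk_size]
        rng.shuffle(chunk)  # local permutation; global order of chunks is kept
        disrupted.extend(chunk)
    return " ".join(disrupted)
```

The disrupted text would then be sent in a single black-box LLM call asking the model to restore fluent text, after which a semantic–structural similarity between the recovery and the original serves as the detection score.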
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 13074