Keywords: Machine-generated Text, Large Language Model, Detection Method
TL;DR: Machine-Generated Text Detection Requires Fewer Machine-Human Mixed Texts
Abstract: Machine-generated texts (MGTs) produced by large language models (LLMs) show significant potential in many fields but also pose challenges such as fake-news propagation and phishing, highlighting the need for MGT detection. Most paragraph-level detection methods implicitly assume that MGTs are entirely machine-generated, ignoring scenarios where only part of the MGT is machine-generated or diverges from human-written text. To this end, this paper first reveals the prevalence of implicit human-machine mixed texts, i.e., MGTs containing subtexts that are also common in human writing, and then theoretically analyzes their impact on detection. Based on our theoretical findings, we develop a stacked detection-enhancement framework that is decoupled from the detection model, which involves revisiting the detection optimization objective and balancing feasibility against efficiency during optimization. Extensive experiments demonstrate its superior improvements over existing detectors. Notably, our boosting strategy can also work in a training-free manner, offering flexibility and scalability. The source code is available at \url{https://anonymous.4open.science/r/MGTD}.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16526