Keywords: Machine-generated Text, Large Language Model, Detection Method
TL;DR: Machine-Generated Text Detection Requires Fewer Machine-Human Mixed Texts
Abstract: Machine-generated texts (MGTs) produced by large language models (LLMs) show significant potential in many fields but also pose challenges such as fake-news propagation and phishing, highlighting the need for MGT detection. Most paragraph-level detection methods implicitly assume that MGTs are entirely machine-generated, ignoring scenarios where only part of the MGT is machine-generated or diverges from human-written text. To this end, this paper first reveals the prevalence of implicit human-machine mixed texts, i.e., MGTs containing subtexts that are also common in human writing, and then theoretically analyzes their impact on detection. Based on our theoretical findings, we develop a stacked detection-enhancement framework that is decoupled from the detection model, which involves revisiting the detection optimization objective and balancing feasibility against efficiency during optimization. Extensive experiments demonstrate its superior improvements over existing detectors. Notably, our boosting strategy can also work in a training-free manner, offering flexibility and scalability. The source code is available at \url{https://anonymous.4open.science/r/MGTD}.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16526