Keywords: AI text detection, contrastive learning, LLMs
Abstract: Emerging methods for humanizing AI-generated text cause severe performance degradation in current AI text detectors. Most existing detectors struggle to maintain consistent performance across the wide range of tasks, from detecting plain AI-generated text to detecting AI text that has been humanized in various ways. In addition, the monolithic threshold-based scoring mechanism they rely on is fragile: some humanized AI texts evade detection after a single pass. To address this, we propose HiDet, which combines a coarse module and a subdivision module to give AI text a double check. We decouple the detection process into easy-sample detection and hard-sample detection. The coarse module of HiDet filters out easy samples, while hard-to-detect samples such as humanized AI texts are carefully discriminated by the subdivision module, which applies a multi-grained contrastive learning strategy. This hierarchical framework closes the loophole through which humanized AI texts escape traditional detectors after a single detection, and it shows strong robustness in detecting humanized AI texts. The framework is also flexible: the subdivision module can be deployed on its own on top of existing detectors as a plug-and-play patch, substantially improving their performance on large-scale humanized AI text. We hope our work will inspire new ideas in the field of AI-generated text detection. Code and datasets will be released soon.
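The abstract does not specify how the coarse module decides which samples are "simple" or how the two stages' outputs are combined. The following is a minimal Python sketch of a generic two-stage detection cascade under stated assumptions, not HiDet's implementation: both stages are hypothetical scoring functions returning an assumed probability that a text is AI-generated, and the routing rule (texts the coarse stage confidently flags as AI exit early; everything else gets a second check) and the thresholds 0.9 and 0.5 are illustrative choices. The keyword-based toy scorers exist only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A scorer maps a text to an (assumed) probability that it is AI-generated.
Scorer = Callable[[str], float]


@dataclass
class CascadeDetector:
    coarse_score: Scorer       # hypothetical stage 1: cheap, existing detector
    subdivision_score: Scorer  # hypothetical stage 2: fine-grained detector for hard samples
    accept_threshold: float = 0.9  # assumed: coarse scores >= this count as easy AI samples
    final_threshold: float = 0.5   # assumed: decision threshold of the second stage

    def predict(self, text: str) -> Tuple[str, str]:
        """Return (label, stage that produced the label)."""
        if self.coarse_score(text) >= self.accept_threshold:
            # Easy sample: confidently AI, no second check needed.
            return "ai", "coarse"
        # Everything the coarse stage did not confidently flag is re-examined,
        # so a text cannot be cleared as human by a single detection.
        label = "ai" if self.subdivision_score(text) >= self.final_threshold else "human"
        return label, "subdivision"


# Toy stand-in scorers (hypothetical, for illustration only).
def toy_coarse(text: str) -> float:
    return 0.95 if "as an ai language model" in text.lower() else 0.3


def toy_subdivision(text: str) -> float:
    return 0.8 if "delve" in text.lower() else 0.2


if __name__ == "__main__":
    detector = CascadeDetector(toy_coarse, toy_subdivision)
    for sample in [
        "As an AI language model, I cannot help with that.",
        "Let us delve into this topic.",
        "I went to the store yesterday.",
    ]:
        print(detector.predict(sample))
```

Under this assumed routing rule, a humanized text that scores low on the coarse stage cannot be cleared after one detection, which mirrors the loophole the abstract describes. Passing a separately trained model as `subdivision_score` while keeping an existing detector as `coarse_score` illustrates, in the simplest possible form, how a second stage could be attached to an existing detector as a plug-and-play patch.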
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 15999