Abstract: In recent years, large language models (LLMs) have progressed rapidly, raising growing concerns about the proliferation of difficult-to-distinguish AI-generated content. This has given rise to a range of problems, including fake news, academic fraud, and phishing emails, posing significant dangers across various domains. However, current machine-generated text (MGT) detection methods still face challenges: many require access to a model's output logits or losses, which prevents them from adapting to real-world black-box scenarios, and models with large parameter counts are difficult to deploy. We therefore propose a compression-based lightweight network for MGT detection that leverages the ability of lossless compression to effectively extract discriminative features between categories. With fewer parameters, our framework achieves state-of-the-art performance in MGT detection under black-box conditions. Experiments demonstrate that our approach performs exceptionally well on both Chinese and English datasets. Specifically, our method achieves a full-text detection accuracy of 99.5%, surpassing the previous SOTA method.
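The abstract's core idea is that lossless compression can expose category-level regularities in text. As a minimal illustrative sketch of that idea (not the paper's actual lightweight network, and with all names and reference texts hypothetical), compression-based similarity can be computed via the normalized compression distance (NCD) using gzip:

```python
# Sketch (assumption): compression-based text classification via normalized
# compression distance (NCD). Texts from the same category tend to share
# statistical regularities, so concatenating them compresses better than
# concatenating texts from different categories.
import gzip


def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of `text`."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: lower means more similar."""
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(sample: str, references: dict[str, list[str]]) -> str:
    """Assign `sample` to the label whose reference texts are closest in NCD."""
    return min(
        references,
        key=lambda label: min(ncd(sample, ref) for ref in references[label]),
    )
```

For MGT detection, the reference sets would hold known human-written and machine-generated texts; the paper's contribution replaces this nearest-reference rule with a lightweight learned network over compression-derived features.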