White-Box Text Detectors Using Proprietary LLMs: A Probability Distribution Estimation Approach

ICLR 2025 Conference Submission2519 Authors

22 Sept 2024 (modified: 27 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Machine-Generated Text Detection
TL;DR: Extends existing white-box methods to proprietary models, achieving an average accuracy of about 0.96 across five of the latest source models
Abstract: Large language models (LLMs) can generate text almost indistinguishable from human-written text, highlighting the importance of machine-generated text detection. However, current zero-shot techniques face challenges: white-box methods are restricted to weaker open-source LLMs, while black-box methods are limited by the partial observations available from stronger proprietary LLMs. It seems impossible to apply white-box methods to proprietary models, because API-level access provides neither full predictive distributions nor internal embeddings. To break this deadlock, we propose Probability Distribution Estimation (PDE), which estimates full distributions from partial observations. Despite its simplicity, PDE allows us to extend white-box methods such as Entropy, Rank, Log-Rank, and Fast-DetectGPT to the latest proprietary models. Experiments show that PDE (Fast-DetectGPT, GPT-3.5) achieves an average accuracy of about 0.95 across five of the latest source models, improving accuracy by 51% relative to the remaining space of the baseline (see Table 1). This demonstrates that the latest LLMs can effectively detect their own outputs, suggesting that advanced LLMs may be the best shield against themselves. We release our code and data at https://github.com/xxx/xxxxxx.
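To make the core idea concrete, below is a minimal, hypothetical sketch of one way full-distribution estimation could work: given only the top-k token log-probabilities that a proprietary API exposes, spread the leftover probability mass uniformly over the unobserved vocabulary, then compute a white-box statistic such as entropy. This is an illustrative assumption, not the paper's exact PDE scheme; the function names, the uniform-tail choice, and the toy vocabulary size are all invented here.

```python
import math


def estimate_full_distribution(top_logprobs, vocab_size):
    """Estimate a full next-token distribution from partial observations.

    top_logprobs: dict mapping observed tokens to log-probabilities
    (e.g., the top-k values a proprietary API returns). The remaining
    probability mass is spread uniformly over unseen tokens -- one
    simple choice, not necessarily the paper's.
    """
    observed = {tok: math.exp(lp) for tok, lp in top_logprobs.items()}
    observed_mass = sum(observed.values())
    n_unseen = vocab_size - len(observed)
    tail_prob = max(0.0, 1.0 - observed_mass) / max(1, n_unseen)
    return observed, tail_prob


def entropy_score(top_logprobs, vocab_size):
    """Entropy of the estimated distribution (a white-box statistic)."""
    observed, tail_prob = estimate_full_distribution(top_logprobs, vocab_size)
    h = -sum(p * math.log(p) for p in observed.values() if p > 0)
    if tail_prob > 0:
        n_unseen = vocab_size - len(observed)
        h -= n_unseen * tail_prob * math.log(tail_prob)
    return h


# Toy example: top-5 log-probs at one position, small made-up vocabulary.
top5 = {"the": -0.5, "a": -1.5, "an": -2.5, "this": -3.5, "that": -4.5}
print(entropy_score(top5, vocab_size=100))
```

Under this scheme, statistics like Entropy, Rank, or Log-Rank can be computed per token position and averaged over a passage, exactly as white-box detectors do when they have full access to an open-source model.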
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2519