Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: LLM, Detection, Language Modelling, AI Detection
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: ChatGPT (and the like) output text with a surprisingly detectable signature -- we use this for a training-free and accurate detector.
Abstract: Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that requires only simple calculations using pre-trained LLMs. The method, called *Binoculars*, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate *Binoculars* on a number of text sources and in varied situations. On news documents, *Binoculars* detects 95% of synthetic samples at a false positive rate of 0.01%, given 512 tokens of text from either humans or ChatGPT, matching highly competitive commercial detectors tuned specifically to detect ChatGPT.
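The abstract describes a score computed by contrasting two closely related language models. A minimal sketch of one plausible instantiation follows: the per-token perplexity of the text under an "observer" model, normalized by the observer's cross-entropy against a "performer" model's predictions. The toy `observer`/`performer` distributions, the three-token vocabulary, and the exact normalization are all illustrative assumptions, not the paper's actual models or formula.

```python
import math

VOCAB = ["a", "b", "c"]

def observer(prefix):
    # Hypothetical "observer" LLM: returns a next-token distribution.
    # A real detector would use a pre-trained LLM here.
    return {"a": 0.5, "b": 0.3, "c": 0.2}

def performer(prefix):
    # Hypothetical "performer" LLM, closely related to the observer.
    return {"a": 0.6, "b": 0.25, "c": 0.15}

def log_perplexity(model, tokens):
    """Average negative log-likelihood of the tokens under `model`."""
    nll = 0.0
    for i, tok in enumerate(tokens):
        probs = model(tokens[:i])
        nll -= math.log(probs[tok])
    return nll / len(tokens)

def cross_log_perplexity(obs, perf, tokens):
    """Average cross-entropy of the observer's distribution against
    the performer's full next-token distribution at each position."""
    total = 0.0
    for i in range(len(tokens)):
        p = perf(tokens[:i])
        q = obs(tokens[:i])
        total -= sum(p[t] * math.log(q[t]) for t in VOCAB)
    return total / len(tokens)

def contrast_score(tokens):
    # Low scores would suggest text that is "too predictable" relative
    # to how surprising the performer's own choices look -- a sketch of
    # the contrast idea, not the published scoring rule.
    return log_perplexity(observer, tokens) / cross_log_perplexity(
        observer, performer, tokens
    )

score = contrast_score(["a", "b", "a"])
```

In practice the two models would be pre-trained LLMs sharing a tokenizer, and the score would be thresholded to decide human vs. machine; no training data is needed beyond picking the threshold.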
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8248