Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: LLM, Detection, Language Modelling, AI Detection
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: ChatGPT (and the like) output text with a surprisingly detectable signature -- we use this for a training-free and accurate detector.
Abstract: Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that requires only simple calculations using pre-trained LLMs. The method, called *Binoculars*, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate *Binoculars* on a number of text sources and in varied situations. On news documents, *Binoculars* detects 95% of synthetic samples at a false positive rate of 0.01%, given 512 tokens of text from either humans or ChatGPT, matching highly competitive commercial detectors tuned specifically to detect ChatGPT.
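The abstract describes a score computed by contrasting two closely related language models. A minimal sketch of one plausible instantiation follows: the per-token perplexity of the text under an "observer" model, normalized by the observer's cross-entropy against a "performer" model's predictions. The toy `observer`/`performer` distributions, the three-token vocabulary, and the exact normalization are all illustrative assumptions, not the paper's actual models or formula.

```python
import math

VOCAB = ["a", "b", "c"]

def observer(prefix):
    # Hypothetical "observer" LLM: returns a next-token distribution.
    # A real detector would use a pre-trained LLM here.
    return {"a": 0.5, "b": 0.3, "c": 0.2}

def performer(prefix):
    # Hypothetical "performer" LLM, closely related to the observer.
    return {"a": 0.6, "b": 0.25, "c": 0.15}

def log_perplexity(model, tokens):
    """Average negative log-likelihood of the tokens under `model`."""
    nll = 0.0
    for i, tok in enumerate(tokens):
        probs = model(tokens[:i])
        nll -= math.log(probs[tok])
    return nll / len(tokens)

def cross_log_perplexity(obs, perf, tokens):
    """Average cross-entropy of the observer's distribution against
    the performer's full next-token distribution at each position."""
    total = 0.0
    for i in range(len(tokens)):
        p = perf(tokens[:i])
        q = obs(tokens[:i])
        total -= sum(p[t] * math.log(q[t]) for t in VOCAB)
    return total / len(tokens)

def contrast_score(tokens):
    # Low scores would suggest text that is "too predictable" relative
    # to how surprising the performer's own choices look -- a sketch of
    # the contrast idea, not the published scoring rule.
    return log_perplexity(observer, tokens) / cross_log_perplexity(
        observer, performer, tokens
    )

score = contrast_score(["a", "b", "a"])
```

In practice the two models would be pre-trained LLMs sharing a tokenizer, and the score would be thresholded to decide human vs. machine; no training data is needed beyond picking the threshold.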
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8248