GPT-who: An Information Density-based Machine-Generated Text Detector

Anonymous

16 Dec 2023, ACL ARR 2023 December Blind Submission
TL;DR: A statistics-based, psycholinguistically motivated, high-performing machine-generated text detector
Abstract: The Uniform Information Density (UID) principle posits that humans prefer to spread information evenly during language production. We examine whether this UID principle can help capture differences between text generated by Large Language Models (LLMs) and text written by humans. We propose GPT-who, the first psycholinguistically aware, multi-class, domain-agnostic, statistics-based detector. It employs UID-based features to model the unique statistical signature of each LLM and human author for accurate authorship attribution. We evaluate our method on 4 large-scale benchmark datasets and find that GPT-who outperforms state-of-the-art detectors (both statistical and non-statistical), such as GLTR, GPTZero, DetectGPT, the OpenAI detector, and ZeroGPT, by over 20% across domains. In addition to its superior performance, it is computationally inexpensive and uses an interpretable representation of text articles. We find that GPT-who can distinguish texts generated by very sophisticated LLMs even when the texts are otherwise indiscernible. UID-based measures for all datasets and code are available at https://anonymous.4open.science/r/gpt-who-03F8/.
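To make the UID intuition concrete, here is a minimal sketch (not the paper's actual feature set) of UID-inspired features computed from per-token probabilities. The probability values below are hypothetical; in practice such surprisals would come from a language model scoring the text.

```python
import math

def uid_features(token_probs):
    """Compute simple UID-inspired features from per-token probabilities.

    The surprisal of token i is -log2(p_i); the UID principle says humans
    tend to spread surprisal evenly, so variance and the size of jumps
    between adjacent surprisals are natural "evenness" signals.
    """
    surprisals = [-math.log2(p) for p in token_probs]
    n = len(surprisals)
    mean = sum(surprisals) / n
    variance = sum((s - mean) ** 2 for s in surprisals) / n
    # Local smoothness: mean squared difference between adjacent surprisals.
    local_jump = sum(
        (surprisals[i] - surprisals[i - 1]) ** 2 for i in range(1, n)
    ) / (n - 1)
    return {"mean": mean, "variance": variance, "local_jump": local_jump}

# Toy examples (hypothetical probabilities, for illustration only):
uneven = uid_features([0.9, 0.01, 0.9, 0.02])   # bursty surprisal profile
even = uid_features([0.20, 0.25, 0.20, 0.22])   # smooth surprisal profile
print(uneven["variance"] > even["variance"])    # True: bursty text is less uniform
```

Feature vectors like these could then be fed to any off-the-shelf classifier for authorship attribution; the point of the sketch is only that "evenness of information" is cheap to compute and interpretable.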
Paper Type: long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English