Keywords: LLMs, AI text detection, interpretability, zero-shot
TL;DR: DivEye detects AI-generated text by measuring surprisal-based diversity, exploiting the greater variability and irregularity of human writing compared to machine-generated text.
Abstract: Detecting AI-generated text is increasingly important to prevent misuse in education, journalism, and social media, where synthetic fluency can obscure misinformation. Existing detectors often rely on likelihood heuristics or black-box classifiers, which struggle with high-quality outputs and lack interpretability. We propose *DivEye*, a novel detection framework that leverages surprisal-based features to capture fluctuations in lexical and structural unpredictability, a signal more prominent in human-authored text. *DivEye* outperforms existing zero-shot detectors by up to 33.2%, matches fine-tuned baselines, and boosts existing detectors by up to 18.7% when used as an auxiliary signal. *DivEye* is also robust to paraphrasing and adversarial attacks, generalizes across domains, and offers interpretable insights into rhythmic unpredictability as a key indicator of AI-generated text.
Submission Number: 13
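
The abstract's core idea, that human text shows larger fluctuations in token-level surprisal than machine text, can be illustrated with a minimal sketch. This is not the authors' implementation or feature set. The function names (`surprisals`, `diversity_features`), the choice of statistics, and the example probabilities are all assumptions made for illustration. In practice the per-token probabilities would come from a language model scoring the text; here they are supplied directly so the example stays self-contained.

```python
import math
from statistics import mean, pvariance


def surprisals(token_probs):
    """Convert per-token probabilities into surprisal values, -log p, in nats."""
    if not token_probs:
        raise ValueError("token_probs must contain at least one probability")
    return [-math.log(p) for p in token_probs]


def diversity_features(token_probs):
    """Summarize how much surprisal varies across a text (hypothetical features).

    Returns the mean and variance of surprisal, plus the mean absolute value
    and variance of the first-order differences between consecutive tokens.
    """
    s = surprisals(token_probs)
    # Differences between consecutive surprisals capture local "rhythm".
    deltas = [b - a for a, b in zip(s, s[1:])]
    return {
        "mean_surprisal": mean(s),
        "var_surprisal": pvariance(s),
        "mean_abs_delta": mean([abs(d) for d in deltas]) if deltas else 0.0,
        "var_delta": pvariance(deltas) if deltas else 0.0,
    }


# Made-up probabilities: a uniformly predictable sequence versus an uneven one.
flat = diversity_features([0.5, 0.5, 0.5, 0.5])
bursty = diversity_features([0.9, 0.01, 0.8, 0.05])
print(flat["var_surprisal"], bursty["var_surprisal"])
```

Under this sketch, the uneven sequence yields a larger surprisal variance and larger token-to-token jumps than the uniform one. A detector built on such features would treat higher variability as weak evidence of human authorship, typically by feeding the feature vector to a lightweight classifier.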