Keywords: AI-generated text detection, prediction discrepancy modeling, fine-tuning
Abstract: Recent advances in large language models (LLMs) have enabled them to generate text with increasingly human-like linguistic styles, posing significant challenges for AI-generated text detection (AGTD). Mainstream zero-shot AGTD methods primarily compute token-level AI-likeness scores from a machine-centric perspective, as represented by proxy models, and treat all tokens equally when aggregating the overall detection score. However, these methods overlook the predictive discrepancies between humans and LLMs when interpreting the same text. Our key intuition is that tokens exhibiting greater divergence between human and machine predictions offer stronger cues for authorship attribution. To address this limitation, we propose \textbf{HAPDA}, a \underline{h}uman-m\underline{a}chine \underline{p}redictive \underline{d}iscrepancy \underline{a}dapter for the AGTD task. HAPDA consists of (i) a joint fine-tuning strategy for training paired human and machine preference models, and (ii) a discrepancy-aware reweighting mechanism that calibrates token-level detection scores in downstream detectors. Extensive experiments across multiple datasets demonstrate that HAPDA consistently and significantly improves the performance of five representative baselines under diverse evaluation settings.
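As a rough illustration of the discrepancy-aware reweighting in (ii), the sketch below assumes paired human- and machine-preference models that expose per-token logits for the text under test. The function name hapda_reweighted_score, the absolute log-probability gap as the discrepancy measure, and the softmax weighting rule are all illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def hapda_reweighted_score(detector_logits, human_logits, machine_logits, input_ids):
    """Aggregate token-level detection scores, upweighting tokens where the
    paired human- and machine-preference models disagree most.

    All *_logits tensors have shape (seq_len, vocab_size); input_ids has
    shape (seq_len,). This is a hypothetical sketch, not HAPDA's actual
    scoring rule.
    """
    # Log-probability of each observed token under a given model.
    def token_logprobs(logits):
        return F.log_softmax(logits, dim=-1).gather(
            -1, input_ids.unsqueeze(-1)).squeeze(-1)

    detector_lp = token_logprobs(detector_logits)  # token-level AI-likeness cue
    human_lp = token_logprobs(human_logits)
    machine_lp = token_logprobs(machine_logits)

    # Predictive discrepancy per token: how differently the human- and
    # machine-preference models rate the same token (assumed measure).
    discrepancy = (human_lp - machine_lp).abs()

    # Normalize discrepancies into weights over the sequence; equal
    # discrepancies recover the uniform token-averaging baseline.
    weights = F.softmax(discrepancy, dim=-1)

    # Discrepancy-weighted aggregate replaces the equal-weight sum.
    return (weights * detector_lp).sum()

Normalizing the weights with a softmax keeps them summing to one, so the reweighted score stays on the same scale as the unweighted baseline and can be dropped into existing token-level detectors without recalibrating their decision thresholds.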
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7630