Keywords: AI Detection, Zero-Shot, NLP, Distillation, Robustness, Multi-Style
Abstract: Recent advances in large language models have amplified concerns about detecting AI-generated text in real-world settings.
Most existing zero-shot detectors, however, implicitly assume a single known generator and a fixed decoding distribution, assumptions that often fail under unseen models, mixed sources, and decoding-induced shifts.
We present MSDOS (Multi-Style Distillation Observers), a robust zero-shot detector that constructs an AI-side reference distribution from multiple distilled style observers, enabling reliable detection under unknown and mixed generators.
MSDOS distills diverse generation behaviors into a set of style observers, implemented as lightweight adapters on a shared backbone language model, and fuses their likelihoods by adaptively weighting each observer according to its probabilistic consistency with the input.
To further handle decoding variations, we introduce a repetition-penalty compensation mechanism that mitigates distribution shifts caused by repetition-penalized generation.
Extensive experiments show that MSDOS consistently outperforms prior zero-shot detectors across unseen generators, domains, and decoding settings.
Code and data are available at https://github.com/anonymacl/MSDOS.
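The likelihood-based fusion described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption for illustration: each observer is taken to supply per-token log-probabilities for the input, "probabilistic consistency" is approximated by the observer's mean token log-likelihood, the weights are a softmax over those means, and the fused AI-side reference is a weighted mixture of the observers. The function name `fuse_observer_logprobs` is hypothetical and is not taken from the MSDOS code release.

```python
import math


def fuse_observer_logprobs(observer_logprobs):
    """Fuse per-token log-probs from several style observers (illustrative only).

    observer_logprobs: list of K lists, each holding the log-probability that
    one observer assigns to each of the T tokens of the input text.
    Returns (fused_mean_logprob, weights).
    """
    # Assumed consistency score: mean token log-likelihood under each observer.
    means = [sum(lp) / len(lp) for lp in observer_logprobs]

    # Softmax over the scores, shifted by the max for numerical stability.
    top = max(means)
    exps = [math.exp(m - top) for m in means]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Per-token mixture: log sum_k w_k * p_k(token), computed in log space.
    num_tokens = len(observer_logprobs[0])
    fused = []
    for t in range(num_tokens):
        terms = [
            math.log(w) + lp[t]
            for w, lp in zip(weights, observer_logprobs)
            if w > 0.0  # skip observers whose weight underflowed to zero
        ]
        peak = max(terms)
        fused.append(peak + math.log(sum(math.exp(x - peak) for x in terms)))

    return sum(fused) / num_tokens, weights


# Usage: two hypothetical observers scoring a three-token input.
score, weights = fuse_observer_logprobs([[-1.0, -1.2, -0.8], [-3.0, -2.5, -3.5]])
```

In this sketch the observer that explains the input better receives the larger weight, so the fused reference leans toward the style closest to the text being scored. A detector would then compare the fused score against a human-side reference or a threshold, a step not shown here.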
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: distillation, robustness, AI-Generated Text detection
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 3263