MSDOS: Multi-Style Distillation Observers for Robust AI-Generated Text Detection

ACL ARR 2026 January Submission3263 Authors

04 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, Readers: Everyone, License: CC BY 4.0

Keywords: AI Detection, Zero-Shot, NLP, Distillation, Robustness, Multi-Style

Abstract: Recent advances in large language models have amplified concerns about detecting AI-generated text in real-world settings. Most existing zero-shot detectors, however, implicitly assume a single known generator and a fixed decoding distribution, an assumption that often fails under unseen models, mixed sources, and decoding-induced shifts. We present MSDOS (Multi-Style Distillation Observers), a robust zero-shot detector that constructs an AI-side reference distribution from multiple distilled style observers, enabling reliable detection under unknown and mixed generators. MSDOS distills diverse generation behaviors into a set of style observers, implemented as lightweight adapters on a shared backbone language model, and aggregates them via likelihood-based fusion, adaptively weighting each observer according to its probabilistic consistency with the input. To further handle decoding variations, we introduce a repetition-penalty compensation mechanism that mitigates distribution shifts caused by repetition-penalized generation. Extensive experiments show that MSDOS consistently outperforms prior zero-shot detectors across unseen generators, domains, and decoding settings. Code and data are available at https://github.com/anonymacl/MSDOS.
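The abstract does not give the fusion rule, so the following is only a minimal sketch of one plausible reading of "likelihood-based fusion": each observer is weighted by a softmax over its mean log-likelihood on the input, and the weighted observers are mixed per token. The function names `fuse_observers` and `logsumexp`, the `temperature` parameter, and the softmax weighting itself are assumptions made for illustration, not the authors' implementation.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def fuse_observers(token_logprobs, temperature=1.0):
    """Fuse K observers into one reference distribution (hypothetical rule).

    token_logprobs: list of K lists, each holding the T per-token
    log-probabilities one observer assigns to the same input text.
    Returns (weights, fused), where weights has K entries summing to 1
    and fused has T per-token log-probabilities under the mixture.
    """
    # Assumed consistency measure: mean per-token log-likelihood.
    means = [sum(lp) / len(lp) for lp in token_logprobs]

    # Softmax over the scaled means, computed in log space.
    scaled = [m / temperature for m in means]
    norm = logsumexp(scaled)
    log_weights = [s - norm for s in scaled]

    # Per-token mixture: log sum_k w_k * p_k(token_t).
    num_tokens = len(token_logprobs[0])
    fused = [
        logsumexp([lw + lp[t] for lw, lp in zip(log_weights, token_logprobs)])
        for t in range(num_tokens)
    ]
    weights = [math.exp(lw) for lw in log_weights]
    return weights, fused


# Two hypothetical observers scoring the same two-token input.
weights, fused = fuse_observers([[-1.0, -1.0], [-3.0, -3.0]])
```

Under this reading, the observer that explains the input better receives the larger weight, so the fused reference leans toward the best-matching style while still covering mixed sources.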

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: distillation, robustness, AI-Generated Text detection

Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models

Languages Studied: English

Submission Number: 3263
