Keywords: Large Language Models, Style-Agnostic, Content-driven
TL;DR: We propose a robust framework that prioritizes content-driven features and employs a style-agnostic training paradigm.
Abstract: With the rising prominence and fluency of large language models (LLMs), developing technologies to identify LLM-generated text has become increasingly critical. However, existing technologies depend on static linguistic features, which can be evaded as advanced models increasingly mimic a wide range of writing styles. Our study reveals two crucial vulnerabilities in existing detection systems: (1) State-of-the-art detectors suffer a substantial accuracy decline of up to 11.43% when exposed to style-based adversarial rewrites generated by LLMs. (2) While general-purpose LLMs exhibit remarkable zero-shot capabilities, their performance in detecting adversarially manipulated text is significantly lower than that of specialized detectors fine-tuned for robustness. To address these vulnerabilities, we propose a novel style-agnostic detection framework named SAFD that enhances detection accuracy and robustness by prioritizing content-driven features over stylistic attributes. Our approach integrates a style-invariant training paradigm to disentangle content semantics from stylistic variations. We leverage adversarially enriched datasets constructed using LLMs fine-tuned for diverse style-based rewrites. Furthermore, we utilize advanced representation learning techniques to extract content-centric features, emphasizing semantic coherence, logical consistency, and factual alignment. Experimental results across multiple datasets and detection models validate the effectiveness of our framework, showing significant improvements in detection accuracy and robustness against diverse adversarial manipulations. The dataset and code are available at https://anonymous.4open.science/status/A-Style-Agnostic-Framework-for-Detecting-LLM-Generated-Text-90B7.
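To illustrate the style-invariant training idea described in the abstract, here is a minimal, hypothetical sketch (not taken from the paper or its code; the module names, encoder, and loss weighting are assumptions). It pairs a standard human-vs-LLM classification loss with an alignment loss that pushes the representation of an original text and its style-rewritten counterpart together, so the detector relies on content rather than surface style:

```python
# Hypothetical sketch of a style-invariant training step.
# All names (StyleAgnosticDetector, lambda_inv, feature shapes) are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StyleAgnosticDetector(nn.Module):
    def __init__(self, hidden_dim=768):
        super().__init__()
        # Placeholder encoder; in practice this would be a pretrained LM encoder.
        self.encoder = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.classifier = nn.Linear(hidden_dim, 2)  # human vs. LLM-generated

    def forward(self, feats):
        content = self.encoder(feats)
        return content, self.classifier(content)

def training_step(model, feats_orig, feats_rewrite, labels, lambda_inv=0.5):
    """Classification loss plus a style-invariance (representation alignment) loss."""
    z_orig, logits = model(feats_orig)
    z_rew, _ = model(feats_rewrite)
    cls_loss = F.cross_entropy(logits, labels)
    # Encourage the same content representation regardless of surface style.
    inv_loss = 1.0 - F.cosine_similarity(z_orig, z_rew, dim=-1).mean()
    return cls_loss + lambda_inv * inv_loss

# Toy usage with random features standing in for encoder outputs.
model = StyleAgnosticDetector()
feats_orig = torch.randn(4, 768)     # features of original texts
feats_rewrite = torch.randn(4, 768)  # features of style-based rewrites
labels = torch.randint(0, 2, (4,))
loss = training_step(model, feats_orig, feats_rewrite, labels)
loss.backward()
```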
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16866