Keywords: Large Language Models, Style-Agnostic, Content-driven
TL;DR: We propose a robust framework that prioritizes content-driven features and employs a style-agnostic training paradigm.
Abstract: With the rising prominence and fluency of large language models (LLMs), developing technologies to identify LLM-generated text has become increasingly critical. However, existing technologies depend on static linguistic features, which can be evaded as advanced models increasingly mimic a wide range of writing styles. Our study reveals two crucial vulnerabilities in existing detection systems: (1) State-of-the-art detectors suffer a substantial accuracy decline of up to 11.43% when exposed to style-based adversarial rewrites generated by LLMs. (2) While general-purpose LLMs exhibit remarkable zero-shot capabilities, their performance in detecting adversarially manipulated text is significantly lower than that of specialized detectors fine-tuned for robustness. To address these vulnerabilities, we propose a novel style-agnostic detection framework named SAFD that enhances detection accuracy and robustness by prioritizing content-driven features over stylistic attributes. Our approach integrates a style-invariant training paradigm to disentangle content semantics from stylistic variations. We leverage adversarially enriched datasets constructed using LLMs fine-tuned for diverse style-based rewrites. Furthermore, we utilize advanced representation learning techniques to extract content-centric features, emphasizing semantic coherence, logical consistency, and factual alignment. Experimental results across multiple datasets and detection models validate the effectiveness of our framework, showing significant improvements in detection accuracy and robustness against diverse adversarial manipulations. The dataset and code are available at https://anonymous.4open.science/status/A-Style-Agnostic-Framework-for-Detecting-LLM-Generated-Text-90B7.
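To illustrate the style-invariant training idea described in the abstract, here is a minimal, hypothetical sketch (not taken from the paper or its code; the module names, encoder, and loss weighting are assumptions). It pairs a standard human-vs-LLM classification loss with an alignment loss that pushes the representation of an original text and its style-rewritten counterpart together, so the detector relies on content rather than surface style:

```python
# Hypothetical sketch of a style-invariant training step.
# All names (StyleAgnosticDetector, lambda_inv, feature shapes) are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StyleAgnosticDetector(nn.Module):
    def __init__(self, hidden_dim=768):
        super().__init__()
        # Placeholder encoder; in practice this would be a pretrained LM encoder.
        self.encoder = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        self.classifier = nn.Linear(hidden_dim, 2)  # human vs. LLM-generated

    def forward(self, feats):
        content = self.encoder(feats)
        return content, self.classifier(content)

def training_step(model, feats_orig, feats_rewrite, labels, lambda_inv=0.5):
    """Classification loss plus a style-invariance (representation alignment) loss."""
    z_orig, logits = model(feats_orig)
    z_rew, _ = model(feats_rewrite)
    cls_loss = F.cross_entropy(logits, labels)
    # Encourage the same content representation regardless of surface style.
    inv_loss = 1.0 - F.cosine_similarity(z_orig, z_rew, dim=-1).mean()
    return cls_loss + lambda_inv * inv_loss

# Toy usage with random features standing in for encoder outputs.
model = StyleAgnosticDetector()
feats_orig = torch.randn(4, 768)     # features of original texts
feats_rewrite = torch.randn(4, 768)  # features of style-based rewrites
labels = torch.randint(0, 2, (4,))
loss = training_step(model, feats_orig, feats_rewrite, labels)
loss.backward()
```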
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16866