Abstract: The robustness of AI-content detection models against adversarial strategies, such as paraphrasing or word switching, is a growing concern in natural language generation (NLG) applications.
This study proposes a novel token-ensemble generation strategy that challenges the robustness of current AI-content detection approaches by drawing on a pool of candidate generative large language models (LLMs).
By randomly sampling the next token(s) from the candidate model pool, we find that the token-ensemble approach significantly degrades the performance of AI-content detection models.
We evaluate the quality of text produced under different token-ensemble settings using annotations from hired human experts.
We further propose a fine-tuned Llama2 model to detect token-ensemble-generated text more accurately.
Our findings underscore the potential of the proposed generation approach both to evade and to improve detection models. The datasets, code, and annotations from this study are open-sourced.
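To make the core idea concrete, the following is a minimal sketch of token-ensemble decoding: at each step, one candidate model is chosen at random to emit the next token. It assumes the candidate models share a tokenizer/vocabulary (e.g., GPT-2 variants); the model names and the greedy per-token choice are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of token-ensemble decoding (assumes shared tokenizer/vocabulary).
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CANDIDATES = ["gpt2", "gpt2-medium"]  # hypothetical candidate LLM pool

tokenizer = AutoTokenizer.from_pretrained(CANDIDATES[0])
models = [AutoModelForCausalLM.from_pretrained(n).eval() for n in CANDIDATES]

@torch.no_grad()
def token_ensemble_generate(prompt: str, max_new_tokens: int = 50) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        model = random.choice(models)           # randomly pick a candidate LLM
        logits = model(ids).logits[:, -1, :]    # next-token distribution
        next_id = logits.argmax(dim=-1, keepdim=True)  # greedy pick (assumption)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(token_ensemble_generate("AI-content detection is"))
```

Because consecutive tokens come from different models, the output mixes the statistical signatures that per-model detectors rely on, which is the intuition behind the detection-performance drop reported above.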
Paper Type: Long
Research Area: Generation
Research Area Keywords: human evaluation, efficient models, few-shot generation, analysis, domain adaptation, text-to-text generation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 5415