Abstract: The robustness of AI-content detection models against adversarial strategies, such as paraphrasing or word switching, is a growing concern in natural language generation (NLG) applications.
This study proposes a novel token-ensemble generation strategy that challenges the robustness of current AI-content detection approaches by drawing on a pool of candidate generative large language models (LLMs).
By randomly sampling the next token(s) from the candidate model pool, we find that the token-ensemble approach significantly degrades the performance of AI-content detection models.
We evaluate the quality of text produced under different token-ensemble settings using annotations from hired human experts.
We further propose a fine-tuned Llama2 model to detect token-ensemble-generated text more accurately.
Our findings underscore the potential of the proposed generation approach both to evade and to improve detection models. The datasets, code, and annotations from this study are open-sourced.
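To make the core idea concrete, the following is a minimal sketch of token-ensemble decoding: at each step, one candidate model is chosen at random to emit the next token. It assumes the candidate models share a tokenizer/vocabulary (e.g., GPT-2 variants); the model names and the greedy per-token choice are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of token-ensemble decoding (assumes shared tokenizer/vocabulary).
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CANDIDATES = ["gpt2", "gpt2-medium"]  # hypothetical candidate LLM pool

tokenizer = AutoTokenizer.from_pretrained(CANDIDATES[0])
models = [AutoModelForCausalLM.from_pretrained(n).eval() for n in CANDIDATES]

@torch.no_grad()
def token_ensemble_generate(prompt: str, max_new_tokens: int = 50) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        model = random.choice(models)           # randomly pick a candidate LLM
        logits = model(ids).logits[:, -1, :]    # next-token distribution
        next_id = logits.argmax(dim=-1, keepdim=True)  # greedy pick (assumption)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(token_ensemble_generate("AI-content detection is"))
```

Because consecutive tokens come from different models, the output mixes the statistical signatures that per-model detectors rely on, which is the intuition behind the detection-performance drop reported above.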
Paper Type: Long
Research Area: Generation
Research Area Keywords: human evaluation, efficient models, few-shot generation, analysis, domain adaptation, text-to-text generation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 5415