TL;DR: Towards Persona-oriented LLM-generated Text Detection: Benchmark Dataset and Method
Abstract: The prevalence of generative artificial intelligence (AI) has drawn attention to the challenge of distinguishing AI-generated texts from human-written ones. In particular, Large Language Models (LLMs) can generate texts that mimic a specific persona's tone and style, which raises concerns about the spread of fake opinions. However, little work has focused on detecting LLM-generated texts that target specific personas. To fill this gap, we propose a new task of persona-oriented LLM-generated text detection. We create a benchmark dataset called CCD6, which includes LLM-generated texts from ChatGPT, ChatGLM and Davinci-003 across people from 6 domains. Additionally, we introduce a novel method called CHF, which uses contrastive learning with hybrid features, as a strong baseline for this task. Our experiments demonstrate the effectiveness of our proposed method, and we provide extensive analysis that suggests promising research directions for future studies. Warning: This paper contains potentially inaccurate and harmful texts.
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English