Is Human-Written Text Liked by Humans? Multilingual Human Detection and Preference Against AI

ACL ARR 2025 February Submission 4116 Authors

15 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Prior studies have shown that distinguishing text generated by large language models (LLMs) from human-written content is highly challenging, often no better than random guessing. To test the generalizability of this finding across non-English languages and diverse domains, we perform an extensive case study to identify the upper bound of human detection accuracy. Across 16 datasets covering nine languages and nine domains, 19 annotators achieved an average detection accuracy of 87.6%, challenging previous conclusions. The major gaps between human and machine text lie in concreteness, cultural nuances, and diversity. Explicitly explaining these distinctions in prompts can partially bridge the gaps in over 50% of cases. However, we find that humans do not always prefer human-written text, particularly when they cannot clearly identify its source.
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: human evaluation, text-to-text generation, analysis, multilingualism
Contribution Types: Reproduction study, Data resources, Data analysis
Languages Studied: Arabic, Chinese, English, Hindi, Italian, Japanese, Kazakh, Russian, Vietnamese
Submission Number: 4116