Abstract: As the capabilities of artificial intelligence (AI) grow, language models can now produce poetry that closely resembles human writing. However, it remains unclear whether people can reliably detect which works were written by humans and which were generated by machines. We conduct a Turing-inspired experiment comparing human and AI detection capabilities using a dataset of 300 incomplete poems completed by GPT-4o, Gemini 1.5, and Llama 3.2. Five human evaluators achieved a mean accuracy of 95.8% in distinguishing human from AI continuations, while cross-model evaluations peaked at 55% accuracy. These findings highlight that, for now, human expertise remains important for both creating and distinguishing poetry.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: Poetry, Machine Learning, NLP, Detection, Turing
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 3956