Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets

ACL ARR 2024 June Submission 4522 Authors

16 Jun 2024 (modified: 07 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Large language models (LLMs) can now generate and recognize text in a wide range of styles and genres, including highly specialized, creative genres like poetry. Poetry is a lightning rod for the marketing and popular imagination of LLM capabilities because it is a signifier of human creativity and complexity, as well as a popular and culturally significant art form. But what do LLMs really know about poetry? What can they know about poetry? We develop a task to evaluate how well LLMs recognize one aspect of English-language poetry, poetic form, which captures many different poetic features, including rhyme scheme, meter, and word or line repetition. We use this task to reflect on LLMs' current poetic capabilities, as well as the challenges and pitfalls of creating NLP benchmarks for poetry and for other creative tasks. In particular, we use this task to audit and reflect on the poems included in popular pretraining datasets. Our findings have implications for NLP researchers interested in model evaluation, digital humanities and cultural analytics research, and cultural heritage collections.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: language/cultural bias analysis, corpus creation, benchmarking
Contribution Types: Data resources, Data analysis
Languages Studied: English
Submission Number: 4522