Beyond Memorization and Recitation: Evaluating LLMs on Deep Understanding of Ancient Chinese Poetry

ACL ARR 2026 January Submission 7268 Authors

06 Jan 2026 (modified: 07 Jun 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: LLM, evaluation, traditional Chinese culture

Abstract: Ancient Chinese Poetry (ACP) stands as a brilliant embodiment of cultural heritage, using concise forms to convey profound emotions. While Large Language Models (LLMs) have made rapid progress in mimicking linguistic styles and reciting verses, whether they truly understand the poets' underlying intent remains an open question. Existing work focuses primarily on knowledge-driven, surface-level understanding and does not assess the gap between rote memorization and aesthetic appreciation. To address this, we propose CP-DUE (Classical Poem – Deep Understanding Evaluation), a top-down framework that treats poetry comprehension as a five-level progressive process. CP-DUE systematically evaluates LLMs across dimensions ranging from basic cultural facts to precise word choice (Tui Qiao), hidden allusions, and overall aesthetic appreciation. Through extensive experiments comparing LLMs with human experts, we reveal that even advanced models struggle with the artistic nuances that define the soul of ACP. This work provides new insights into bridging the understanding gap and enhancing LLMs' competence in genuine cultural connection.

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: evaluation

Contribution Types: Model analysis & interpretability, Data resources

Languages Studied: Chinese

Submission Number: 7268
