Keywords: AI creativity, evaluation methodology, human-grounded evaluation
Abstract: We introduce creativity coverage, a novel framework for evaluating large language model (LLM) creativity as a boundary rather than a scalar. Unlike existing methods that measure proximity to human creative standards, our approach identifies hard limits: which regions of human creative space can LLMs reach, and which remain beyond their grasp? This formulation aligns with theories of transformational creativity, which emphasize moving beyond known conceptual boundaries rather than performing well within them. We define human creativity boundaries using the distribution of human responses in a shared semantic embedding space, then measure LLM coverage over this space. Across divergent thinking, convergent reasoning, and creative writing tasks, we find that creative boundaries are strongly task-dependent: models achieve high coverage on structured tasks but occupy only a narrow subset of human space in open-ended writing. Our metric correlates with established diversity measures yet captures complementary information. We further identify specific linguistic features—narrative length, lexical specificity, novel entities—that characterize human creativity beyond model reach, offering actionable insights for improving LLM creative capabilities.
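To make the abstract's core idea concrete, here is a minimal sketch of one plausible reading of the coverage computation: embed human and model responses in a shared semantic space, treat the human response distribution as the reference region, and measure what fraction of it the model's outputs reach. The embedding source, the density-adaptive radius rule, and the nearest-neighbor construction are all our assumptions for illustration, not the paper's actual specification.

```python
# Hypothetical sketch of a "coverage over human creative space" metric.
# Assumes responses have already been embedded (e.g., with any sentence
# encoder); random vectors stand in for real embeddings below.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def coverage(human_emb: np.ndarray, model_emb: np.ndarray, k: int = 5) -> float:
    """Fraction of human responses 'covered' by at least one model response.

    A human point counts as covered if some model embedding lies within that
    point's local scale, estimated as the distance to its k-th nearest human
    neighbor (a density-adaptive radius -- an illustrative assumption).
    """
    # Local radius for each human point: distance to its k-th human neighbor
    # (column 0 of the result is the zero self-distance, so query k+1).
    nn_human = NearestNeighbors(n_neighbors=k + 1).fit(human_emb)
    dists, _ = nn_human.kneighbors(human_emb)
    radii = dists[:, -1]

    # Distance from each human point to its nearest model response.
    nn_model = NearestNeighbors(n_neighbors=1).fit(model_emb)
    d_to_model, _ = nn_model.kneighbors(human_emb)

    return float(np.mean(d_to_model[:, 0] <= radii))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for embeddings of 200 human and 150 LLM responses.
    human = rng.normal(size=(200, 384))
    model = rng.normal(size=(150, 384))
    print(f"coverage = {coverage(human, model):.3f}")
```

Under this reading, a model that clusters in one dense region of human space scores low even if its outputs are individually human-like, which is what distinguishes coverage from proximity-based diversity measures.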
Paper Type: Long
Research Area: Natural Language Generation
Research Area Keywords: analysis, automatic evaluation
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 6956