Deep Associations, High Creativity: A Simple yet Effective Metric for Evaluating Large Language Models
Abstract: The evaluation of LLMs' creativity represents a crucial research domain, though challenges such as data contamination and costly human assessments often impede progress. Drawing inspiration from human creativity assessment, we propose PACE, asking LLMs to generate Parallel Chains of Associations to Evaluate their creativity. PACE is straightforward, cost-effective, and minimizes the risk of data contamination, as evidenced by its strong correlation with Arena Creative Writing (Spearman's $\rho = 0.739$, $p < 0.001$) across various proprietary and open-source models. A comparative analysis of associative creativity between LLMs and humans reveals that while high-performing LLMs achieve scores comparable to average human performance, top-performing humans consistently outperform LLMs. Furthermore, linguistic analysis of associative responses shows parallel patterns between humans and LLMs, including similarities in the distribution of association types and a shared trend of decreasing concreteness.
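The abstract validates PACE via its Spearman rank correlation with Arena Creative Writing scores. As a minimal illustration of that statistic (not the paper's actual data or evaluation code), the sketch below computes Spearman's $\rho$ from scratch on hypothetical per-model scores; the helper names and toy values are assumptions for demonstration only.

```python
def average_ranks(values):
    """Assign 1-based ranks, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical PACE scores vs. benchmark scores for five models
pace_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
arena_scores = [10.0, 30.0, 20.0, 50.0, 40.0]
rho = spearman_rho(pace_scores, arena_scores)  # 0.8 for this toy data
```

A rank-based correlation is the natural choice here because it measures monotonic agreement between the two leaderboards without assuming the scores are on comparable linear scales.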
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking; evaluation methodologies; evaluation
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 7768