Deep Associations, High Creativity: A Simple yet Effective Metric for Evaluating Large Language Models
Abstract: The evaluation of LLMs' creativity represents a crucial research domain, though challenges such as data contamination and costly human assessments often impede progress. Drawing inspiration from human creativity assessment, we propose PACE, asking LLMs to generate Parallel Chains of Associations to Evaluate their creativity. PACE is straightforward, cost-effective, and minimizes the risk of data contamination, as evidenced by its strong correlation with Arena Creative Writing (Spearman's $\rho = 0.739$, $p < 0.001$) across various proprietary and open-source models. A comparative analysis of associative creativity between LLMs and humans reveals that while high-performing LLMs achieve scores comparable to average human performance, top-performing humans consistently outperform LLMs. Furthermore, linguistic analysis of associative responses shows parallel patterns between humans and LLMs, including similarities in the distribution of association types and a shared trend of decreasing concreteness.
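The abstract validates PACE via its Spearman rank correlation with Arena Creative Writing scores. As a minimal illustration of that statistic (not the paper's actual data or evaluation code), the sketch below computes Spearman's $\rho$ from scratch on hypothetical per-model scores; the helper names and toy values are assumptions for demonstration only.

```python
def average_ranks(values):
    """Assign 1-based ranks, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical PACE scores vs. benchmark scores for five models
pace_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
arena_scores = [10.0, 30.0, 20.0, 50.0, 40.0]
rho = spearman_rho(pace_scores, arena_scores)  # 0.8 for this toy data
```

A rank-based correlation is the natural choice here because it measures monotonic agreement between the two leaderboards without assuming the scores are on comparable linear scales.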
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking; evaluation methodologies; evaluation
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 7768