Evaluating Creativity in Large Language Models through Creative Problem-Solving: A New Dataset and Benchmark

ACL ARR 2024 June Submission 117 Authors

06 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Creative problem-solving, which integrates divergent and convergent thinking, is pivotal for leveraging creativity in fields such as AI4Science. As large language models (LLMs) evolve into sophisticated creative assistants, it becomes crucial to assess their problem-solving abilities effectively. Traditional benchmarks, often rooted in cognitive science, focus on a single phase or do not distinguish between the divergent and convergent phases, limiting their ability to fully evaluate LLMs. To bridge this gap, we introduce a novel benchmark comprising an open-ended question answering (QA) dataset alongside traditional creativity tasks, aimed at evaluating the holistic creative capabilities of LLMs. The benchmark uses multi-dimensional evaluation metrics to provide a comprehensive assessment of how creative performance relates to model parameters, architectural differences, and domain-specific expertise. It aims not only to advance understanding in the field but also to set a new standard for evaluating the creative problem-solving potential of LLMs. The dataset and code are available at: https://anonymous.4open.science/r/LLM-creativity-Benchmark/.
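As a rough illustration of what multi-dimensional scoring of an open-ended QA item could look like, the sketch below aggregates per-dimension ratings into a single score. The item fields, the dimensions (fluency, flexibility, originality, usefulness), and the equal-weight aggregation are illustrative assumptions, not the benchmark's actual schema or metric definitions.

```python
# Minimal sketch, assuming a hypothetical item schema and set of creativity
# dimensions; not the authors' actual evaluation pipeline.
from dataclasses import dataclass
from statistics import mean


@dataclass
class OpenEndedItem:
    question: str              # open-ended QA prompt
    reference_ideas: list[str] # example valid answers (assumed field)


def aggregate_score(dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension scores (each in [0, 1]) with equal weights."""
    return mean(dimension_scores.values())


item = OpenEndedItem(
    question="Propose unconventional uses for a discarded satellite dish.",
    reference_ideas=["rainwater collector", "garden sculpture"],
)

# In practice these ratings would come from human raters or an LLM judge;
# the dimension names here follow common creativity-research conventions
# and are assumed for illustration only.
dimension_scores = {
    "fluency": 0.8,
    "flexibility": 0.6,
    "originality": 0.7,
    "usefulness": 0.9,
}

print(item.question)
print(f"Aggregate creativity score: {aggregate_score(dimension_scores):.2f}")
```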
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Creativity, Creative Problem-Solving
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 117