AidanBench: Evaluating Novel Idea Generation on Open-Ended Questions

Published: 30 Oct 2024, Last Modified: 13 Dec 2024 · LanGame Poster · CC BY 4.0
Keywords: LLM creativity; evaluation
Abstract: AidanBench evaluates large language models (LLMs) on their ability to generate novel ideas in response to open-ended questions, focusing on creativity, reliability, contextual attention, and instruction following. Unlike benchmarks with clear-cut answers, AidanBench assesses models on more open-ended, real-world tasks. In evaluations of several state-of-the-art LLMs, AidanBench correlates only weakly with existing benchmarks while offering a more nuanced view of model performance in open-ended scenarios.
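To make the evaluation setup concrete, below is a minimal, hypothetical sketch of how repeated novel-answer generation could be scored for a single open-ended question. The abstract does not specify AidanBench's actual scoring procedure; the function names (`generate_answer`, `embed`, `judge_coherence`) and the thresholds are placeholders, not the benchmark's real implementation.

```python
# Hypothetical sketch: count how many distinct, coherent answers a model can
# produce for one open-ended question before it repeats itself or degrades.
# All callables and thresholds are illustrative assumptions, not AidanBench's API.
from typing import Callable, List
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def count_novel_answers(
    question: str,
    generate_answer: Callable[[str, List[str]], str],   # model call: question + prior answers -> new answer
    embed: Callable[[str], np.ndarray],                  # text -> embedding vector
    judge_coherence: Callable[[str, str], float],        # (question, answer) -> coherence score in [0, 1]
    novelty_threshold: float = 0.85,                     # max allowed similarity to any accepted answer
    coherence_threshold: float = 0.5,                    # min coherence for an answer to count
    max_answers: int = 50,
) -> int:
    """Ask the model for new answers until one is redundant or incoherent.
    The score is the number of distinct, coherent answers accepted."""
    answers: List[str] = []
    embeddings: List[np.ndarray] = []
    for _ in range(max_answers):
        answer = generate_answer(question, answers)
        vec = embed(answer)
        # Redundant if too similar to any previously accepted answer.
        redundant = any(cosine_similarity(vec, prev) > novelty_threshold for prev in embeddings)
        incoherent = judge_coherence(question, answer) < coherence_threshold
        if redundant or incoherent:
            break
        answers.append(answer)
        embeddings.append(vec)
    return len(answers)
```

Under this kind of loop, a model's score on a question reflects how long it can keep producing answers that are both new (low similarity to prior answers) and sensible (above a coherence bar), which matches the abstract's emphasis on creativity, reliability, and instruction following.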
Submission Number: 43