Abstract: We introduce the task of visual creative description (VCD) and propose three key innovations: 1) the C-CoC framework for structured creative cognition, 2) the PAINT dataset for systematic training and evaluation, and 3) VCD-Bench, the first multidimensional benchmark for visual creativity. Our experiments on 10 models reveal significant limitations: models excel at spatial reasoning but struggle with color and plot evaluation, and these gaps persist across model sizes. These findings suggest the need for architectural innovations beyond simple parameter scaling.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, automatic creation and evaluation of language resources, cross-modal content generation
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 4539