What Does a Visual Formal Analysis of the World's 500 Most Famous Paintings Tell Us About Multimodal LLMs?

Published: 19 Mar 2024, Last Modified: 04 May 2024
Venue: Tiny Papers @ ICLR 2024 Notable
License: CC BY 4.0
Keywords: Multimodal large language model, Benchmark, Formal Analysis, Detailed visual processing
Abstract: This work introduces ArtQA, a new benchmark that evaluates multimodal LLMs through the lens of formal analysis of paintings. We focus on line, shape, space, color, form, value, and texture, collectively referred to as the elements of art in visual formal analysis. ArtQA contains questions spanning 4 metrics, further divided into 16 fine-grained categories. We use LLMs to generate VQA questions based on formal analysis of 500 renowned paintings. These questions then undergo a rigorous filtering process that combines model annotation with review by human experts, ensuring ArtQA's quality and reliability.
Submission Number: 145