MuseBench: A Comprehensive Benchmark for Multimodal Cultural Understanding of Chinese Museum Artifacts

ACL ARR 2026 January Submission 5221 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: museum artifact, cultural retrieval, culture VQA, cultural understanding
Abstract: Chinese museum artifacts represent a continuous cultural lineage and constitute a core component of global cultural heritage. Recent advances in vision-language models (VLMs) have shown promise in incorporating domain knowledge when applied to such collections. However, it remains unclear to what extent existing VLMs can effectively interpret and reason over professional museum artifact documentation. To address this gap, we introduce MuseBench, a comprehensive benchmark of Chinese museum artifacts that evaluates two dimensions of VLMs' cultural understanding: cultural reasoning and semantic alignment. MuseBench contains 128,592 images of 29,352 artifacts and 293,376 question-answer pairs, supporting two complementary tasks: Cultural Visual Question Answering and Cultural Retrieval. Through extensive evaluation of 25 mainstream VLMs, we observe that even the top-performing model achieves an average score below 22%, indicating substantial room for improvement. Our analysis shows significant performance variation across tasks and identifies critical challenges arising primarily from professional terminology generation and structured metadata understanding. MuseBench thus provides a challenging benchmark and diagnostic insights for advancing multimodal cultural understanding.
Paper Type: Long
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: Resources and Evaluation, Computational Social Science and Cultural Analytics, Question Answering
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English, Chinese
Submission Number: 5221