BoYaEval: Evaluating Multimodal Large Language Models on Understanding Ancient Chinese Musical Scores
Keywords: Intangible Cultural Heritage, Symbolic Music Intelligence, Multimodal Large Language Models
Abstract: Large Multimodal Models (LMMs) excel at general tasks but struggle with specialized, structured cultural symbols. We introduce BoYaEval, the first comprehensive benchmark dedicated to deciphering ancient Chinese musical notation, covering five distinct notation systems. These systems use unique spatial layouts and specialized ideograms to encode pitch and intricate playing techniques. BoYaEval comprises 3,175 high-quality images across these notation styles and establishes a three-tier evaluation: Structural Parsing (symbol recognition), Instructional Translation (technique mapping), and Musical Reasoning (melody derivation). We evaluate 22 leading LMMs. Results show that while models perform adequately on basic recognition, they fail markedly at cross-system compositional logic, scoring only around 27\% on reasoning tasks. BoYaEval highlights the limitations of current LMMs in processing diverse spatial-symbolic dependencies, helping bridge the gap between ancient wisdom and modern AI for digitizing intangible cultural heritage.
Paper Type: Long
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: NLP tools for social analysis, quantitative analyses of news and/or social media
Contribution Types: Data resources
Languages Studied: Chinese
Submission Number: 9458