BoYaEval: Evaluating Multimodal Large Language Models on Understanding Ancient Chinese Musical Scores
Keywords: Intangible Cultural Heritage, Symbolic Music Intelligence, Multimodal Large Language Models
Abstract: Large Multimodal Models (LMMs) excel at general tasks but struggle with specialized, structured cultural symbols. We introduce BoYaEval, the first comprehensive benchmark dedicated to deciphering ancient Chinese musical notation, covering five distinct notation systems. These systems use unique spatial layouts and specialized ideograms to encode pitch and intricate playing techniques. BoYaEval comprises 3,175 high-quality images across these notation styles and establishes a three-tier evaluation: Structural Parsing (symbol recognition), Instructional Translation (technique mapping), and Musical Reasoning (melody derivation). We evaluate 22 leading LMMs. Results show that while models perform adequately on basic recognition, they fail markedly at cross-system compositional logic, scoring only around 27\% on reasoning tasks. BoYaEval highlights the limitations of current LMMs in processing diverse spatial-symbolic dependencies, helping bridge the gap between ancient wisdom and modern AI for digitizing intangible cultural heritage.
Paper Type: Long
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: NLP tools for social analysis, quantitative analyses of news and/or social media
Contribution Types: Data resources
Languages Studied: Chinese
Submission Number: 9458