MEMORIA: A Large Language Model, Instruction Data and Evaluation Benchmark for Intangible Cultural Heritage
Keywords: Intangible Cultural Heritage (ICH), Cultural AI, Instruction Tuning, Large Language Models, Evaluation Benchmark
Abstract: Although large language models (LLMs) have demonstrated remarkable capabilities in natural language processing, there is no publicly available LLM specifically tailored for Intangible Cultural Heritage (ICH), nor any ICH instruction-tuning dataset or comprehensive evaluation benchmark; such resources are critical for advancing the preservation, understanding, and transmission of cultural knowledge through artificial intelligence. This paper introduces MEMORIA, a comprehensive framework for ICH preservation through AI. MEMORIA includes: (1) ICHLLM, the first ICH-specific large language model, built by fine-tuning LLaMA (in both 7B and 13B versions) on instruction data; (2) CHIT, a large-scale instruction dataset of 158K samples covering diverse cultural domains and languages; and (3) ICHEB, the first comprehensive evaluation benchmark for ICH, comprising 6 tasks and 13 datasets that span knowledge understanding, classification, generation, and cross-cultural translation. We first construct CHIT, accounting for varied ICH categories, diverse data formats, and multilingual content spanning over 50 languages. Within the MEMORIA framework, we then develop ICHLLM by fine-tuning LLaMA (7B and 13B) on this dataset to follow instructions for ICH-related tasks while maintaining cultural sensitivity and accuracy. To support the evaluation of ICH-focused LLMs, we propose ICHEB, a standardized benchmark covering critical tasks including ICH knowledge question answering, cultural entity recognition, heritage classification, narrative generation, and cross-cultural knowledge translation. Using this benchmark, we conduct a comprehensive analysis of ICHLLM and several existing LLMs, revealing their capabilities and limitations in understanding and preserving cultural heritage knowledge.
The model, datasets, benchmark, and experimental results will be open-sourced to facilitate future research in cultural AI and digital humanities. Our anonymized code is available at https://anonymous.4open.science/r/MEMORIA.
Primary Area: datasets and benchmarks
Submission Number: 6042