MCGA: A Multi-task Classical Chinese Literary Genre Audio Corpus

ACL ARR 2026 January Submission9739 Authors

06 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Readers: Everyone | License: CC BY 4.0

Keywords: audio corpus, classical Chinese literature audio, evaluation metric

Abstract: With the rapid advancement of Multimodal Large Language Models (MLLMs), their potential for Chinese Classical Studies (CCS) has attracted significant attention. Existing research has focused primarily on the text and visual modalities, while audio corpora in this domain remain largely unexplored. To bridge this gap, we propose the Multi-task Classical Chinese Literary Genre Audio Corpus (MCGA). It covers a diverse range of literary genres across six specialized audio tasks: Automatic Speech Recognition (ASR), Speech-to-Text Translation (S2TT), Speech Emotion Captioning (SEC), Spoken Question Answering (SQA), Speech Understanding, and Speech Reasoning. We evaluate ten MLLMs, and the results show that current models still face substantial challenges on the MCGA test set. Furthermore, we introduce an evaluation metric for SEC and a metric that measures the consistency between the speech and text capabilities of MLLMs. We will release MCGA and our code publicly to facilitate the development of MLLMs with more robust, multidimensional audio capabilities in CCS.

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: corpus creation, NLP datasets, metrics

Contribution Types: NLP engineering experiment, Data resources, Data analysis

Languages Studied: Chinese, English

Submission Number: 9739
