Keywords: Audio Role-Playing, Large Language Models, Multimodal Dataset, Data Construction
Abstract: While existing role-playing research predominantly focuses on text, Audio Role-Playing (ARP) presents the unique challenge of synchronizing semantic content with vocal characteristics. To address this gap, we propose AudioRole, a meticulously curated dataset built from 13 TV series spanning 1K+ hours with 1M+ character-grounded dialogues, providing synchronized audio-text pairs annotated with speaker identities and contextual metadata. To demonstrate the effectiveness of the dataset, we further introduce ARP-Eval, a dual-aspect evaluation framework that assesses both response quality and role fidelity. Empirical validation shows that GLM-4-Voice trained on AudioRole (the ARP-Model) achieves an average Acoustic Personalization score of 0.31, significantly outperforming both the original GLM-4-Voice and the more powerful MiniCPM-O-2.6. The ARP-Model also achieves a Content Personalization score of 0.36, surpassing the untrained original model by about 38%. Blind human perceptual evaluation confirms these findings.
AudioRole features dialogues from over 115 main characters and is released alongside 6 trained ARP-Models and evaluation protocols. Together, these provide an essential resource for advancing audio-grounded role-playing research.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodality and Language Grounding to Vision, Robotics and Beyond, Speech Recognition, Text-to-Speech and Spoken Language Understanding
Contribution Types: Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 5947