Character Beyond Speech: Leveraging Role-Playing Evaluation in Large Audio Language Models via Reinforcement Learning
Keywords: Role-Playing Language Agents, Large Audio Language Models, Reinforcement Learning
Abstract: The advancement of multimodal large-model technology has propelled the simulation of diverse characters in speech dialogue systems, establishing a novel interactive paradigm. Character attributes are manifested not only in textual responses but also through vocal features, and speech carries non-semantic information that is difficult to quantify. This poses significant challenges for evaluating the character-embodiment capabilities of role-playing agents. To address these issues, we present RoleJudge, an evaluation framework that leverages large audio language models to systematically assess the alignment between speech and character across multiple modalities and dimensions. We further introduce RoleChat, the first role-playing speech evaluation dataset, comprising authentic speech samples together with detailed reasoning annotations for evaluation. Using this dataset, we implement a multi-stage training paradigm and incorporate standard alignment into reinforcement learning to mitigate reward misalignment during optimization. Experimental results on both accuracy and subjective assessment demonstrate that RoleJudge outperforms various baseline models, validating the effectiveness of our multidimensional evaluation framework.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 25258