SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Human communication is a complex and diverse process that not only involves multiple factors such as language, commonsense, and cultural backgrounds but also requires the participation of multimodal information, such as speech. Large Language Model (LLM)-based multi-agent systems have demonstrated promising performance in simulating human society. Can we leverage LLM-based multi-agent systems to simulate human communication? However, current LLM-based multi-agent systems mainly rely on text as the primary medium. In this paper, we propose SpeechAgents, a multi-modal LLM-based multi-agent system designed for simulating human communication. SpeechAgents utilizes a multi-modal LLM as the control center for each individual agent and employs multi-modal signals as the medium for messages exchanged among agents. Additionally, we propose Multi-Agent Tuning to enhance the multi-agent capabilities of LLMs without compromising their general abilities. To strengthen and evaluate the effectiveness of human communication simulation, we build the Human-Communication Simulation Benchmark. Experimental results demonstrate that SpeechAgents can simulate human communication dialogues with consistent content, authentic rhythm, and rich emotions, and demonstrate excellent scalability even with up to 25 agents, which can be applied to tasks such as drama creation and audio novel generation. Demos are available at https://speechagents.github.io/.
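To make the described architecture concrete, the following is a minimal sketch of the kind of loop the abstract implies: each agent is driven by a model acting as its "control center" and passes multi-modal messages (here, text plus a dummy speech-token payload) to the next speaker. All names (`Message`, `Agent`, `respond`, `simulate`) are illustrative assumptions, not the paper's actual API, and the model call is replaced by a stub.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a multi-modal multi-agent dialogue loop.
# Names and structure are illustrative; the real system uses a
# multi-modal LLM where `respond` below uses a stub.

@dataclass
class Message:
    sender: str
    text: str
    speech_tokens: list = field(default_factory=list)  # placeholder for audio units

class Agent:
    def __init__(self, name):
        self.name = name
        self.history = []  # each agent keeps its own view of the dialogue

    def respond(self, incoming):
        # Stand-in for the multi-modal LLM "control center":
        # consume an incoming multi-modal message, emit a new one.
        self.history.append(incoming)
        return Message(
            sender=self.name,
            text=f"{self.name} replies to {incoming.sender}",
            speech_tokens=[len(incoming.text)],  # dummy "speech" payload
        )

def simulate(num_agents, turns):
    agents = [Agent(f"agent{i}") for i in range(num_agents)]
    msg = Message(sender="narrator", text="scene opens")
    transcript = [msg]
    for t in range(turns):
        speaker = agents[t % num_agents]  # simple round-robin turn-taking
        msg = speaker.respond(msg)
        transcript.append(msg)
    return transcript

# Scaling to 25 agents, as in the paper's experiments, is just a parameter here.
transcript = simulate(num_agents=25, turns=50)
```

The round-robin turn order is a simplification; the point is only that every exchanged message carries both text and a speech channel, rather than text alone.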
Paper Type: long
Research Area: Speech recognition, text-to-speech and spoken language understanding
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English