Collaborative Tasks with Heterogenous LLM Students

ACL ARR 2024 June Submission4339 Authors

16 Jun 2024 (modified: 02 Jul 2024) | ACL ARR 2024 June Submission | CC BY 4.0
Abstract: Advances in LLMs offer hope of corresponding advances in agent participation in teamwork, while also posing new challenges in designing multi-agent benchmarks for evaluating these agents and integrating them effectively into hybrid teams in real-world situations. While prior work has demonstrated that LLMs can operate in multi-agent settings, it often oversimplifies collaboration along critical dimensions, for example by restricting evaluation to in-domain, single-episode tasks among homogeneous LLM groups. To bridge this gap, we propose a new cooperative multi-agent task, Kitchen-Alien Rush, which supports out-of-domain, multi-episode evaluation and measures how effectively hybrid groups collaborate. Our evaluation exposes gaps in multi-agent collaboration: LLM agents struggle to perform in the out-of-domain task and show inconsistent improvement over multiple episodes in hybrid teams. These gaps motivate future work on addressing the weaknesses of hybrid multi-agent systems in out-of-domain, multi-episode tasks.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: benchmarking, educational applications
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 4339